|Name:||AI-Based Privacy-Preserving Big Data Sharing for Market Research (Anonymous Big Data)|
|Call:||ICT of the Future, 6th Call 2017|
|Duration:||26 months from 01.10.2019 to 30.11.2021.|
|Total Funding:||671 433 €|
The ANITA (ANonymous bIg daTA) project is a research project funded by the Austrian Research Promotion Agency (ICT of the Future, 6th Call 2017).
The goal of this research project is, for the first time, to train deep generative model architectures to sequential personal data while providing differential privacy guarantees, in order to systematically validate the feasibility of using synthetic, privacy-preserving sequential data for third party market research. ANITA aims to prepare the ground for developing general-purpose anonymization solutions that also work for high-dimensional data.
In ANITA we are going to:
- collect and analyze use cases for privacy-sensitive sequential data;
- conduct a literature review of generative deep neural network architectures and of privacy guarantees for deep learning;
- design and create a virtual data lab that allows to systematically investigate the conditions under which a variety of deep generative models are able to derive synthetic replicas which capture structure and correlations, while protecting individual-level privacy;
- implement and test the selected model architectures; and
- report the results of the simulation study and of the empirical use case validations.
The consortium partners are the Institute for Service Marketing at the Vienna University of Economics and Business, the Mostly AI Solutions MP GmbH, the George Labs GmbH and the Statistics Austria.
WP1 – Project Management
The general objective of this work package is to support the scientific and technical progress in the best way, so that the main concentration stays on the substantive contribution to the project and the efficient and timely implementation of the work package tasks.
WP2 – Requirements Analysis
The goal of the WP2 is to systematically collect use cases for sharing privacy-sensitive sequential data with third parties as well as to capture requirements with respect to accuracy and privacy.
The use cases will be documented in terms of
- number of subjects;
- number and characteristics of shared attributes;
- frequency / latency for data sharing;
- accuracy & privacy requirements;
- technical requirements; and
- expected (business) impact if the data are anonymized.
WP3 – Generative Deep Neural Network Architectures
WP3’s goal is to gain an overview of generative deep neural network architectures that are deemed capable of preserving individual-level as well as population-level information within sequential personal data, and thus could be used for data anonymization.
In addition to the literature review on generative deep neural network architectures, a literature review will be conducted on the subject of differential privacy guarantees for the training of deep learning models.
WP4 – Virtual Data Lab
WP4’s goal is to setup and run a virtual data lab environment, that can be used for experimentation by generating artificial datasets to be used for validating accuracy and privacy.
A virtual data lab environment will include:
- a flexible data factory for generating a variety of artificial sequential datasets;
- a GPU cloud compute setup; and
- validation tests, measures and visual reports for assessing retained information; and
- validation tests and measures for assessing privacy guarantees.
The design of the data lab will be closely aligned with the results of the use case and requirement analysis. Any developed source code relating to the virtual data lab will be open-sourced.
Once the WP5 models have been developed, the data lab will be continuously used to:
- generate artificial dataset given model assumptions and parameters;
- train models with given hyper parameters to fit the data;
- use the models to generate synthetic datasets;
- assess the retained information within the synthetic datasets;
- assess any disclosed individual-level information within the synthetic datasets; and
- record runtime and used compute resources.
WP5 – Model Development
WP5’s goal is to provide reference implementations of existing generative deep neural network architectures, including privacy preserving techniques, and as well refine existing architectures for meeting the captured requirements towards synthetic data.
Based on the outcome of WP3, a list of already published deep neural network architectures will be implemented on top of an established deep learning framework. If available, the correctness of the implementation will be established against published reference results. All implementations will be coupled with various types of privacy-preserving mechanisms, to be able to limit the amount of information that is retained about individual subjects. The software will be designed to be compatible with the virtual data lab, and in particular allow easy tuning of various parameters, in order to quickly iterate on different choices hyper parameters and network settings. Further, existing model architectures could be refined in order to meet the requirements established by WP2.
WP6 – Empirical Validation
WP6’s goal is to validate the feasibility of using synthetic data in lieu of actual data for market research purposes with actual empirical use cases provided by the consortium partners.
George Labs and Statistics Austria will each provide an actual use case, that will be used for validating the feasibility of the approach with actual data. The appropriate model architectures will be selected based on the findings of the simulation study, and then, these models will be trained on the actual data. Subsequently, these models will used for generating synthetic, privacy-preserving datasets that will be assessed by the data providers.
WP7 – Exploitation & Dissemination
WP7’s goal is to assist in the discussion and documentation the developed technology as well as to disseminate the findings of this project within the relevant academic and industry audience.
To help disseminate the findings, the consortium partners will initiate and participate in a critical discourse on the project findings with all relevant academic and industry audiences. This includes a continuous updating of working papers, the documentation and distribution of findings via a project webpage and the participation in relevant (academic and industry) conferences and workshops.
Vienna University of Economics and Business (WU) is one of Europe’s largest, most modern universities of business and economics. The high quality of its research and teaching is confirmed by three of the most renowned international accreditations – EQUIS, AACSB and AMBA.
With nearly 23,000 students from about 110 countries, WU is one of the largest educational institutions for business, economics, and the social sciences in Europe.
MOSTLY AI is a Vienna, Austria based high-tech startup that has developed game-changing AI technology for synthetic data.
Mostly AI’s solutions enable organizations across the world and across industries to safely share big data assets, internally as well as externally, while keeping the privacy of their customers fully protected.
This breakthrough in data protection is made possible by leveraging generative deep neural networks that extract patterns, structures and variations from existing data to generate highly realistic & highly accurate synthetic customers.
Mostly AI’s international team of data enthusiasts takes pride in offering the world’s most advanced synthetic data solutions, and thus enabling a big data ecosystem where privacy is truly respected.
The George Labs of Erste Group is an international and cross-disciplinary squad of people who are passionate about a great banking experience. George Labs consists of designers and developers, analysts and engineers, product owners and product managers, architects, writers, coaches, testers and, bankers.
Statistics Austria is an independent and non-profit-making federal institution under public law. It is the role of Statistics Austria to provide reliably collected and expertly analysed political, social and economic information. With EU entry, another function of Statistics Austria has gained in importance from the users’ point of view: its function to mediate between Austria and EU data as well as to co-ordinate the pan-European harmonisation process.
Please contact Olha Drozd for further information, collaborations, and feedback.