Synthetic Data Generation and Evaluation

Date:

Data privacy has recently become a hot topic in the news, driven by security failures and by concerns about how companies use the personal data they collect about their customers and users. Facebook, for instance, faced scrutiny over its handling of consumer data in both the U.S. and the U.K. Given these issues, generating synthetic data is becoming a fundamental task in the day-to-day operations of organizations. Synthetic data is a separate dataset generated directly from the original data. The generated data should be realistic in certain respects, such as format, the distribution of attributes, and the relationships among attributes, and it should yield similar results when the same data analytics are performed on both datasets. In this presentation, we will first present recent research on generating synthetic data, and then empirical methods for evaluating how similar the generated data is to the original.
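The abstract does not say which generators or evaluation metrics the talk covers. As a minimal sketch, assuming purely numeric attributes, the Python code below uses a deliberately simple, hypothetical baseline generator (`gaussian_synthesizer`: fit a multivariate normal distribution to the original data and sample from it) and then checks two of the properties named above. It scores the distribution of each attribute with the two-sample Kolmogorov-Smirnov statistic, and the relationships among attributes with the largest gap between the real and synthetic Pearson correlations. All function names and the toy age/income data are illustrative assumptions, not the method presented in the talk.

```python
import math
import random
from bisect import bisect_right


def mean(xs):
    return sum(xs) / len(xs)


def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length numeric rows."""
    d, n = len(rows[0]), len(rows)
    cols = [[r[j] for r in rows] for j in range(d)]
    mus = [mean(c) for c in cols]
    return [[sum((cols[i][k] - mus[i]) * (cols[j][k] - mus[j]) for k in range(n)) / (n - 1)
             for j in range(d)] for i in range(d)]


def cholesky(a):
    """Lower-triangular L with L @ L.T == a (a must be symmetric positive definite)."""
    d = len(a)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / L[j][j]
    return L


def gaussian_synthesizer(rows, n_samples, seed=0):
    """Hypothetical baseline generator: fit a multivariate normal, then sample from it."""
    rng = random.Random(seed)
    d = len(rows[0])
    mus = [mean([r[j] for r in rows]) for j in range(d)]
    L = cholesky(covariance_matrix(rows))
    synth = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # x = mu + L z gives a sample with the fitted mean and covariance.
        synth.append(tuple(mus[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(d)))
    return synth


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b)) for x in a + b)


def pearson(xs, ys):
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def similarity_report(real, synth):
    """Per-attribute KS statistics and the largest absolute gap between pairwise correlations."""
    d = len(real[0])
    rcol = [[r[j] for r in real] for j in range(d)]
    scol = [[s[j] for s in synth] for j in range(d)]
    ks = [ks_statistic(rcol[j], scol[j]) for j in range(d)]
    corr_gap = max((abs(pearson(rcol[i], rcol[j]) - pearson(scol[i], scol[j]))
                    for i in range(d) for j in range(i + 1, d)), default=0.0)
    return ks, corr_gap


# Illustrative data only: two numeric attributes where "income" depends on "age".
data_rng = random.Random(42)
real = []
for _ in range(500):
    age = data_rng.gauss(40, 10)
    real.append((age, 1000 * age + data_rng.gauss(0, 5000)))

synth = gaussian_synthesizer(real, n_samples=500, seed=1)
ks, corr_gap = similarity_report(real, synth)
print("KS per attribute:", [round(k, 3) for k in ks])
print("max |corr(real) - corr(synth)|:", round(corr_gap, 3))
```

Both scores are 0 for identical data, and smaller means more similar. The two checks are complementary: shuffling one column of `real` independently leaves every marginal distribution (and hence every KS statistic) unchanged, yet destroys the age/income relationship, which only the correlation gap detects.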

Recommended citation: Le, Duc-Phong. (2020). “Synthetic Data Generation and Evaluation.” The 2020 SERENE-RISC Workshop on the State of Canadian Cybersecurity.