Data

The promise of synthetic data

In an algorithm-driven world where data is king, one misstep can lead to a royal mess. Netflix discovered this after it released anonymised movie ratings from its subscribers in 2006. By cross-matching those ratings with reviews on another website, data sleuths showed they could identify individual subscribers and what they had been watching. A gay customer sued for breach of privacy in 2009; Netflix settled.

That episode is still cited today by academics seeking ways of sifting useful information from data without outing the individuals who provide it. Where anonymisation failed, synthetic data might yet succeed.

It is, as its name suggests, artificially generated. It is most often created by funnelling real-world data through a noise-adding algorithm to construct a new data set. The resulting data set captures the statistical features of the original information without being a giveaway replica. Its usefulness hinges on a principle known as differential privacy: that anybody mining synthetic data could make the same statistical inferences as they would from the true data — without being able to identify individual contributions.
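To make the noise-adding idea concrete, here is a minimal sketch in Python, assuming the simplest possible case: a single categorical column (say, each subscriber's favourite genre) and a fixed, publicly known list of categories. It counts the real records, blurs each count with Laplace noise, then draws brand-new records from the blurred counts. The function names `laplace_noise` and `synthesize`, the genre labels and the value of `epsilon` are all hypothetical choices made for illustration; real synthetic-data generators are considerably more elaborate.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centred on zero with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def synthesize(records, categories, epsilon, n_synthetic, seed=None):
    """Draw synthetic records from a noisy histogram of the real ones.

    `categories` must be fixed in advance rather than read off the data,
    otherwise the list of categories itself could leak information.
    """
    rng = random.Random(seed)
    counts = Counter(records)
    # Adding or removing one person changes exactly one count by 1,
    # so Laplace noise with scale 1/epsilon is applied to every count.
    noisy = [
        max(counts[c] + laplace_noise(1.0 / epsilon, rng), 0.0)
        for c in categories
    ]
    if sum(noisy) == 0:
        # Every noisy count was clipped to zero: fall back to uniform weights.
        noisy = [1.0] * len(categories)
    # Sampling from the noisy counts is post-processing; it touches the
    # real records no further.
    return rng.choices(categories, weights=noisy, k=n_synthetic)


# Hypothetical example data, not taken from any real service.
genres = ["drama", "comedy", "documentary"]
real = ["drama"] * 600 + ["comedy"] * 300 + ["documentary"] * 100
synthetic = synthesize(real, genres, epsilon=1.0, n_synthetic=1000, seed=0)
print(Counter(synthetic))
```

A smaller `epsilon` means more noise and stronger privacy, at the cost of a synthetic data set that tracks the original less faithfully. That trade-off is the crux of the approach.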
