Ernesto Damiani
Bio:
Ernesto Damiani is the Director of the Information Security Research Center at Khalifa University, Abu Dhabi, and the leader of the Big Data Initiative at the Etisalat British Telecom Innovation Center (EBTIC). Ernesto is on extended leave from the Department of Computer Science, Università degli Studi di Milano, Italy, where he leads the SESAR research lab and coordinates several large-scale research projects funded by the European Commission, by the Italian Ministry of Research, and by private companies such as British Telecom, Cisco Systems, SAP, Telecom Italia and many others. Ernesto’s research interests include business process analysis and privacy-preserving Big Data analytics. Ernesto is the Principal Investigator of the TOREADOR H2020 project on models and tools for Big Data-as-a-Service.
Website: http://olaf.crema.unimi.it/
Controlling Leakage and Disclosure Risk in Semantic Big Data pipelines
In many Big Data environments, information is made available as huge data streams, collected and analyzed at different locations, asynchronously and under the responsibility of different authorities. It has become common for data analysts to have a mandate for computing Big Data analytics without holding the rights to access the individual data points in the input, as they may contain sensitive information or personal data protected by privacy regulations.
This talk discusses the idea that techniques used for semantic enrichment of Big Data can act as non-linear boosters of leakage and privacy risk. Examples of such techniques are semantic lifting, which harmonizes metadata representation across data collection points, and pre-joins at data ingestion time, which avoid computing semantic joins on Big Data storage.
Intuition suggests that semantic techniques applied to Big Data representation may have a double impact on security risks:
(1) increase leakage risk by increasing the value for the attacker per unit of information leaked;
(2) increase intrusion risk by making injection attacks (i.e., attacks that poison data in order to subvert the outcome of analytics) more effective per unit of poisoned information injected.
However, no clear methodology is currently available for quantifying the impact of these boosters. This talk will discuss a (semi-)quantitative technique for computing Big Data leakage risk estimates, so that they can be meaningfully compared with the quantifiable benefits of semantic enrichment. It will also present a model and a toolkit for protecting semantically enriched data streams based on the idea of dynamic filters, built incrementally from the applicable Access Control policy and from the analytics to be performed.
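To make the notion of a dynamic filter more concrete, the following is a minimal illustrative sketch in Python. The abstract does not specify how the filters are actually built, so everything here is an assumption: the class name DynamicFilter, the methods add_policy_rule and narrow_to_analytics, and the sample records are all hypothetical and do not describe the toolkit presented in the talk. The sketch only shows one plausible reading of "incrementally built": policy rules first widen what may be seen, and the declared analytics then narrows the view to what is actually needed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List, Set

Record = Dict[str, object]


@dataclass
class DynamicFilter:
    """Hypothetical filter assembled step by step (not the talk's toolkit)."""

    visible_fields: Set[str] = field(default_factory=set)
    row_predicates: List[Callable[[Record], bool]] = field(default_factory=list)

    def add_policy_rule(self, fields, predicate=None):
        # Each access control rule may expose more fields and may add
        # a row-level condition that every released record must satisfy.
        self.visible_fields |= set(fields)
        if predicate is not None:
            self.row_predicates.append(predicate)
        return self

    def narrow_to_analytics(self, needed_fields):
        # Keep only the fields the analytics actually needs, even if the
        # policy would permit more (a need-to-know assumption).
        self.visible_fields &= set(needed_fields)
        return self

    def apply(self, stream: Iterable[Record]) -> Iterator[Record]:
        # Lazily filter the stream: drop disallowed rows, then project fields.
        for record in stream:
            if all(pred(record) for pred in self.row_predicates):
                yield {k: v for k, v in record.items() if k in self.visible_fields}


# Made-up sample stream, for illustration only.
stream = [
    {"user": "u1", "region": "north", "usage_gb": 12.5, "consent": True},
    {"user": "u2", "region": "south", "usage_gb": 3.0, "consent": False},
    {"user": "u3", "region": "north", "usage_gb": 7.5, "consent": True},
]

flt = DynamicFilter()
# Assumed policy: three fields are readable, but only for consenting users.
flt.add_policy_rule({"user", "region", "usage_gb"}, lambda r: r.get("consent") is True)
# Assumed analytics: per-region usage totals, which do not need the user id.
flt.narrow_to_analytics({"region", "usage_gb"})

filtered = list(flt.apply(stream))
```

In this sketch the analyst receives only the region and usage values of consenting users: the user identifier is permitted by the assumed policy but withheld because the declared analytics does not require it.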