Privacy and Uniqueness of Neighborhoods in Social Networks.

Published on Sep 21, 2020 in arXiv: Social and Information Networks
Daniele Romanini (Estimated H-index: 1),
Sune Lehmann (Estimated H-index: 36; DTU: Technical University of Denmark),
Mikko Kivelä (Estimated H-index: 19; TKK: Helsinki University of Technology)
Abstract
The ability to share social network data at the level of individual connections is beneficial to science: not only for reproducing results, but also for researchers who may wish to use it for purposes not foreseen by the data releaser. Sharing such data, however, can lead to serious privacy issues, because individuals could be re-identified, not only from node attributes, but also from the structure of the network around them. The risk associated with re-identification can be measured, and it is more serious in some networks than in others. Various optimization algorithms have been proposed to anonymize the network while keeping the number of changes minimal. However, existing algorithms do not provide guarantees on where the changes will be made, making it difficult to quantify their effect on various measures. Using network models and real data, we show that the average degree of networks is a crucial parameter for the severity of re-identification risk from nodes' neighborhoods. Dense networks are more at risk, and, apart from a small band of average degree values, either almost all nodes are re-identifiable or they are all safe. Our results allow researchers to assess the privacy risk based on a small number of network statistics which are available even before the data is collected. As a rule of thumb, the privacy risks are high if the average degree is above 10. Guided by these results, we propose a simple method based on edge sampling to mitigate the re-identification risk of nodes. Our method can be applied as early as the data collection phase. Its effect on various network measures can be estimated and corrected using sampling theory. These properties contrast with previous methods, which bias the data in arbitrary ways. In this sense, our work could help in sharing network data in a statistically tractable way.
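The edge-sampling idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the sampling scheme or the uniqueness test, so the sketch assumes uniform independent edge sampling with a hypothetical `keep_prob` parameter, an Erdős-Rényi random graph as test data, and a simple proxy for neighborhood uniqueness (the sorted degree sequence inside each node's ego network). Equal proxy signatures do not imply isomorphic neighborhoods, so the reported fraction is only a lower bound on true uniqueness. The sampling-theory correction shown is the simplest one: under uniform sampling the expected average degree scales by `keep_prob`, so dividing by `keep_prob` gives an estimate of the original value.

```python
import random
from collections import Counter


def random_graph(n, avg_degree, rng):
    """Erdős-Rényi G(n, p) edge list with expected average degree avg_degree."""
    p = avg_degree / (n - 1)
    return [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p]


def sample_edges(edges, keep_prob, rng):
    """Keep each edge independently with probability keep_prob (assumed scheme)."""
    return [e for e in edges if rng.random() < keep_prob]


def adjacency(n, edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {u: set() for u in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def neighborhood_signature(adj, u):
    """Proxy for u's neighborhood structure: for every neighbor v of u, count
    the neighbors v shares with u, then sort. This is an isomorphism invariant
    of the ego network, not a full isomorphism test."""
    return tuple(sorted(len(adj[v] & adj[u]) for v in adj[u]))


def unique_fraction(n, edges):
    """Fraction of nodes whose proxy signature is shared with no other node."""
    adj = adjacency(n, edges)
    sigs = [neighborhood_signature(adj, u) for u in range(n)]
    counts = Counter(sigs)
    return sum(1 for s in sigs if counts[s] == 1) / n


def average_degree(n, edges):
    """Average degree of an undirected graph with n nodes."""
    return 2 * len(edges) / n


if __name__ == "__main__":
    rng = random.Random(0)
    n, keep_prob = 300, 0.25
    edges = random_graph(n, avg_degree=20, rng=rng)
    sampled = sample_edges(edges, keep_prob, rng)

    print("average degree, original:", average_degree(n, edges))
    print("average degree, sampled: ", average_degree(n, sampled))
    # Corrected estimate of the original average degree from the sample alone.
    print("average degree, corrected:", average_degree(n, sampled) / keep_prob)
    print("unique fraction, original:", unique_fraction(n, edges))
    print("unique fraction, sampled: ", unique_fraction(n, sampled))
```

In this sketch the sampled graph has a lower average degree, so far fewer nodes have a distinctive neighborhood signature, while the original average degree can still be estimated from the sample because the sampling probability is known.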