Evaluating the importance of different communication types in romantic tie prediction on social media

Published on Apr 1, 2018 in Annals of Operations Research (2.583)
DOI: 10.1007/S10479-016-2295-0
Matthias Bogaert (UGent: Ghent University), Estimated H-index: 4
Michel Ballings (UT: University of Tennessee), Estimated H-index: 11
Dirk Van den Poel (UGent: Ghent University), Estimated H-index: 52
The purpose of this paper is to evaluate which communication types on social media are most informative for romantic tie prediction. In contrast to analyzing communication as a composite measure, we take a disaggregated approach by modeling separate measures for commenting, liking and tagging focused on an alter’s status updates, photos, videos, check-ins, locations and links. To ensure that we have the best possible model, we benchmark 8 classifiers using different data sampling techniques. The results indicate that we can predict romantic ties with very high accuracy. The top performing classification algorithm is AdaBoost, with an accuracy of up to 97.89%, an AUC of up to 97.56%, a G-mean of up to 81.81%, and an F-measure of up to 81.45%. The top drivers of romantic ties were related to socio-demographic similarity and the frequency and recency of commenting, liking and tagging on photos, albums, videos and statuses. Previous research has largely focused on aggregate measures, whereas this study focuses on disaggregate measures. Therefore, to the best of our knowledge, this study is the first to provide such an extensive analysis of romantic tie prediction on social media.
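The abstract reports accuracy alongside G-mean and F-measure, and the gap between them (about 98% versus about 81%) is typical of a setting where one class is rare. The sketch below shows how these measures can be computed from a confusion matrix. It assumes the standard definitions (G-mean as the geometric mean of sensitivity and specificity, F-measure as the harmonic mean of precision and recall); the abstract does not state the exact formulas the authors used, and the function name and label coding are hypothetical.

```python
import math


def imbalance_metrics(y_true, y_pred):
    """Accuracy, G-mean and F-measure for binary labels.

    Assumed coding (not from the paper): 1 = romantic tie, 0 = any other tie.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators so degenerate classifiers score 0.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on the rare class
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0

    g_mean = math.sqrt(sensitivity * specificity)
    denom = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return {"accuracy": accuracy, "g_mean": g_mean, "f_measure": f_measure}


# Illustrative data only: 5 romantic ties among 100 candidate pairs.
y_true = [1] * 5 + [0] * 95
always_negative = [0] * 100
print(imbalance_metrics(y_true, always_negative))
```

In this made-up example, a classifier that never predicts a romantic tie still reaches an accuracy of 0.95, while its G-mean and F-measure are both 0. That is one reason to report all three measures when the positive class is rare.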
References
#1 Gareth M. James (SC: University of Southern California), H-Index: 27
#2 Daniela Witten (UW: University of Washington), H-Index: 43
Last. Robert Tibshirani (Stanford University), H-Index: 154
(4 authors in total)
An Introduction to Statistical Learning provides an accessible overview of the field of statistical learning, an essential toolset for making sense of the vast and complex data sets that have emerged in fields ranging from biology to finance to marketing to astrophysics in the past twenty years. This book presents some of the most important modeling and prediction techniques, along with relevant applications. Topics include linear regression, classification, resampling methods, shrinkage approac...
1,873 Citations
#1 Michel Ballings (UT: University of Tennessee), H-Index: 11
#2 Dirk Van den Poel (UGent: Ghent University), H-Index: 52
Last. Matthias Bogaert (UGent: Ghent University), H-Index: 4
This paper aims to create an expert system that yields an optimal strategy for increasing network size on Facebook. Data were obtained from 5488 Facebook users by means of a custom-built Facebook application. We computed a total of 426 variables. Using these data we estimated a predictive model of network size which is subsequently used in a prescriptive model. The former is estimated with Random Forest and the latter with a Genetic Algorithm. The results indicate that the proposed expert system...
13 Citations
#1 Matthias Bogaert (UGent: Ghent University), H-Index: 4
#2 Michel Ballings (UT: University of Tennessee), H-Index: 11
Last. Dirk Van den Poel (UGent: Ghent University), H-Index: 52
This paper seeks to assess the added value of a Facebook user's friends data in event attendance prediction over and above user data. For this purpose we gathered data of users that have liked an anonymous European soccer team on Facebook. In addition we obtained data from all their friends. In order to assess the added value of friends data we have built two models for five different algorithms (Logistic Regression, Random Forest, Adaboost, Neural Networks and Naive Bayes). The baseline model c...
19 Citations
#1 Ke Xu (SCUN/SCUEC: South Central University for Nationalities), H-Index: 1
#2 Keju Zou (SYSU: Sun Yat-sen University), H-Index: 1
Last. Xinfang Zhang (HUST: Huazhong University of Science and Technology), H-Index: 1
(5 authors in total)
Along with the rapid growth of mobile terminals and wireless technologies, mobile social networking services have become very popular. Recently, many mobile social platforms built on location-based services have been developed to allow users to share their check-ins and events with friends. Check-in data in location-based mobile social networks, as well as call detail records (CDR) in mobile communication networks, may provide insight into community structure, relationships and members in the netwo...
36 Citations
#1 Robin I. M. Dunbar (University of Oxford), H-Index: 125
#2 Valerio Arnaboldi (University of Oxford), H-Index: 16
Last. Andrea Passarella, H-Index: 38
(4 authors in total)
We use data on frequencies of bi-directional posts to define edges (or relationships) in two Facebook datasets and a Twitter dataset and use these to create ego-centric social networks. We explore the internal structure of these networks to determine whether they have the same kind of layered structure as has been found in offline face-to-face networks (which have a distinctively scaled structure with successively inclusive layers at 5, 15, 50 and 150 alters). The two Facebook datasets ...
218 Citations
#2 Juan J. Rodríguez, H-Index: 20
Last. Ludmila I. Kuncheva (Bangor University), H-Index: 54
(4 authors in total)
Proportions of the classes for each ensemble member are chosen randomly. Member training data: sub-sample and over-sample through SMOTE. RB-Boost combines Random Balance with AdaBoost.M2. Experiments with 86 data sets demonstrate the advantage of Random Balance. In Machine Learning, a data set is imbalanced when the class proportions are highly skewed. Imbalanced data sets arise routinely in many application domains and pose a challenge to traditional classifiers. We propose a new approach to build...
126 Citations
#1 Christoph Trattner (NTNU: Norwegian University of Science and Technology), H-Index: 21
#2 Michael Steurer (Graz University of Technology), H-Index: 28
Existing approaches to identify the tie strength between users involve typically only one type of network. To date, no studies exist that investigate the intensity of social relations and in particular partnership between users across social networks. To fill this gap in the literature, we studied over 50 social proximity features to detect the tie strength of users defined as partnership in two different types of networks: location-based and online social networks. We compared user pairs in ter...
4 Citations
#1 Michel Ballings (UGent: Ghent University), H-Index: 11
#2 Dirk Van den Poel (UGent: Ghent University), H-Index: 52
The purpose of this study is to (1) assess the feasibility of predicting increases in Facebook usage frequency, (2) evaluate which algorithms perform best, (3) and determine which predictors are most important. We benchmark the performance of Logistic Regression, Random Forest, Stochastic Adaptive Boosting, Kernel Factory, Neural Networks and Support Vector Machines using five times twofold cross-validation. The results indicate that it is feasible to create models with high predictive performan...
40 Citations
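The "five times twofold cross-validation" named in the entry above can be sketched as follows. This is a minimal illustration that reads the phrase as five independent repetitions of a random 50/50 split, with each half used once for training and once for testing. The abstract does not say whether the splits were stratified; this sketch is not, and the function name is hypothetical.

```python
import random


def five_times_twofold(n_samples, repeats=5, seed=0):
    """Yield (train_indices, test_indices) for repeated twofold cross-validation.

    Produces 2 * repeats splits; the seed is an arbitrary choice for
    reproducibility, not a value taken from the cited study.
    """
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = list(range(n_samples))
        rng.shuffle(shuffled)
        half = n_samples // 2
        fold_a, fold_b = shuffled[:half], shuffled[half:]
        yield fold_a, fold_b  # train on A, test on B
        yield fold_b, fold_a  # then swap the roles


splits = list(five_times_twofold(10))
print(len(splits))
```

Each classifier being benchmarked would be fitted and scored on all ten splits, so that algorithms can be compared on the distribution of scores rather than on a single split.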
Feb 28, 2015 in CSCW (Conference on Computer Supported Cooperative Work)
#1 Jason Wiese (CMU: Carnegie Mellon University), H-Index: 13
#2 Jun-Ki Min (CMU: Carnegie Mellon University), H-Index: 10
Last. John Zimmerman (CMU: Carnegie Mellon University), H-Index: 47
(4 authors in total)
How effective are call and SMS logs in modeling tie strength? Frequency and duration of communication have long been cited as a major aspect of tie strength. Intuitively, this makes sense: people communicate with those that they feel close to. Highly cited research papers have pushed this idea further, using communication as a direct proxy for tie strength. However, this operationalization has not been validated. Our work evaluates this assumption. We collected call and SMS logs and ground truth ...
49 Citations
#1 Jang-ho Choi (Electronics and Telecommunications Research Institute), H-Index: 2
#1 Jang-Ho Choi (Electronics and Telecommunications Research Institute), H-Index: 4
Last. Changseok Bae (Electronics and Telecommunications Research Institute), H-Index: 14
(4 authors in total)
Online communications not only provide an instantaneous and inexpensive means for socialization, but also present an opportunity to understand human relationships. In this paper, we investigated the correlation between human social relationships and online communications and modelled personal and business affinity as a linear combination of online interactions. We discovered that personal and business affinities have different dominant predictive variables. The discovered variables not only had a strong l...
2 Citations
Cited By (9)
#1 Lisa Schetgen (UGent: Ghent University)
#2 Matthias Bogaert (UGent: Ghent University), H-Index: 4
Last. Dirk Van den Poel (UGent: Ghent University), H-Index: 52
The purpose of this study is to demonstrate the value of Facebook data in predicting first-time donation behavior. More specifically, we provide evidence that Facebook data can be used as a valuable data source for nonprofit organizations in acquiring new donors. To do so, we evaluate three different dimensionality reduction techniques (i.e., singular value decomposition, non-negative matrix factorization, and latent Dirichlet allocation) over seven classification techniques (i.e., logi...
#1 Lai-Wan Wong (Ha Tai: Xiamen University), H-Index: 4
#2 Garry Wei-Han Tan (University of Kuala Lumpur), H-Index: 25
Last. Lai-Ying Leong (UTAR: Universiti Tunku Abdul Rahman), H-Index: 20
(5 authors in total)
This paper explores the characteristics of mobile social media marketing adoption in the context of digital natives via an extended Mobile Technology Acceptance Model. Specifically, mobile usefulne...
4 Citations
#1 Arman Hassanniakalager (University of Bath), H-Index: 3
#2 Georgios Sermpinis (Glas.: University of Glasgow), H-Index: 13
Last. Thanos Verousis (University of Essex), H-Index: 7
(4 authors in total)
This study introduces a Conditional Fuzzy inference (CF) approach in forecasting. The proposed approach is able to deduct Fuzzy Rules (FRs) conditional on a set of restrictions. This conditional rule selection discards weak rules and the generated forecasts are based only on the most powerful ones. Through this process, it is capable of achieving higher forecasting performance and improving the interpretability of the underlying system. The CF concept is applied in a series of forecasti...
4 Citations
This study investigates the effects of using social media for customer service on firms' reputation building. In addition, this study explores the role of absorptive capacity, ISO (International Organization for Standardization) 9000 implementation and periodic training for management and employees in the relationship between social media–based customer service and firm reputation. This study sampled 115 US-listed firms and collected secondary data from five databases as follows: Factiva, Fortu...
3 Citations
#1 Matthias Bogaert (UGent: Ghent University), H-Index: 4
#2 Justine Lootens (UGent: Ghent University), H-Index: 1
Last. Michel Ballings (UT: University of Tennessee), H-Index: 11
(4 authors in total)
The objective of this paper is to evaluate multi-label classification techniques and recommender systems for cross-sell purposes in the financial services sector. We carried out three analyses using data obtained from an international financial services provider. First, we tested four multi-label classification techniques, of which the two problem transformation methods were combined with several base classifiers. Second, we benchmarked the performance of five state-of-the-art recommend...
7 Citations
#1 Zhi-yu Luo (SHU: Shanghai University), H-Index: 3
#2 Cui Ji (SHU: Shanghai University), H-Index: 4
Last. Jiatuo Xu (SHU: Shanghai University), H-Index: 7
(16 authors in total)
Objective. In this study, machine learning was utilized to classify and predict pulse wave of hypertensive group and healthy group and assess the risk of hypertension by observing the dynamic change of the pulse wave and provide an objective reference for clinical application of pulse diagnosis in traditional Chinese medicine (TCM). Method. The basic information from 450 hypertensive cases and 479 healthy cases was collected by self-developed H20 questionnaires and pulse wave information was acq...
15 Citations
#1 Jia Chen (Beihang University), H-Index: 2
Last. Zhang Xiong (Beihang University)
(3 authors in total)
Social relationship recommenders aim at predicting potentially useful relationships with high accuracy and efficiency, which is critically important in social network services for addressing information overload. Existing relationship recommenders mostly emphasize friend recommendation in online social networks, which cannot satisfy the requirements of industrial information systems. This work proposes an efficient latent-factor (LF)-based approach to predict multicategory relationships rather...
#1 Xi Xiong (Chengdu University of Information Technology), H-Index: 12
#2 Yuanyuan Li (Sichuan University), H-Index: 6
Last. Binyong Li (Chengdu University of Information Technology)
(7 authors in total)
The emotion varies and propagates with the spatial and temporal information of individuals through social media, which uncovers several interaction mechanisms and features the community structure in order to facilitate individuals’ communication and emotional contagion in social networks. Aiming to show the detailed process and characteristics of emotional contagion within social media, we propose an emotional independent cascade model in which individual emotion can affect the subsequent emotio...
24 Citations
#1 Matthias Bogaert, H-Index: 4
#2 Michel Ballings, H-Index: 11
Last. Dirk Van den Poel, H-Index: 52
(4 authors in total)
This study assesses the feasibility of identifying self-reported sports practitioners (soccer players) on Facebook. The main goal is to develop a system to support marketers with the decision as to which prospects to target for advertising purposes. To do so, we benchmark several algorithms (i.e., random forest, logistic regression, adaboost, rotation forest, neural networks, and kernel factory) using five times twofold cross-validation. To evaluate performance and variable importances, we build...
4 Citations