Identifying Soccer Players on Facebook Through Predictive Analytics

Published on Nov 16, 2017 in Decision Analysis
· DOI: 10.1287/DECA.2017.0354
Matthias Bogaert (Estimated H-index: 4), Michel Ballings (Estimated H-index: 11), + 1 Authors, Dirk Van den Poel (Estimated H-index: 52)
Abstract
This study assesses the feasibility of identifying self-reported sports practitioners (soccer players) on Facebook. The main goal is to develop a system to support marketers with the decision as to which prospects to target for advertising purposes. To do so, we benchmark several algorithms (i.e., random forest, logistic regression, adaboost, rotation forest, neural networks, and kernel factory) using five times twofold cross-validation. To evaluate performance and variable importances, we build a fusion model, which combines the results of the other algorithms using the weighted average. This technique is also referred to as information-fusion sensitivity analysis. The results reveal that Facebook data provide a viable basis to come up with sports predictions as the predictive performance ranges from 72.01% to 80.43% for area under the receiver operating characteristic curve (AUC), from 81.96% to 83.95% for accuracy, and from 2.41 to 3.06 for top-decile lift. Our benchmark study indicates that stochastic...
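The evaluation setup named in the abstract (five times twofold cross-validation, AUC, top-decile lift, and a weighted-average fusion of model outputs) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy data, and the choice to weight each model by its own AUC are assumptions made for illustration, since the abstract does not state how the fusion weights are derived.

```python
import random
from typing import Dict, List, Sequence, Tuple


def five_times_twofold(n: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return 10 (train, test) index splits: 5 random halvings, each used both ways."""
    rng = random.Random(seed)
    splits = []
    for _ in range(5):
        idx = list(range(n))
        rng.shuffle(idx)
        first, second = idx[: n // 2], idx[n // 2:]
        splits.append((first, second))
        splits.append((second, first))
    return splits


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def top_decile_lift(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Response rate among the top 10% highest-scored cases divided by the overall rate."""
    n_top = max(1, len(scores) // 10)
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(y for _, y in ranked[:n_top]) / n_top
    base_rate = sum(y_true) / len(y_true)
    return top_rate / base_rate


def fuse(predictions: Dict[str, Sequence[float]], weights: Dict[str, float]) -> List[float]:
    """Weighted average of per-model scores (weights are normalised to sum to 1)."""
    total = sum(weights.values())
    n = len(next(iter(predictions.values())))
    return [
        sum(weights[name] * predictions[name][i] for name in predictions) / total
        for i in range(n)
    ]


if __name__ == "__main__":
    # Toy labels and scores; "model_a" and "model_b" are hypothetical placeholders.
    y = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
    preds = {
        "model_a": [0.9, 0.2, 0.8, 0.3, 0.1, 0.4, 0.5, 0.2, 0.1, 0.3],
        "model_b": [0.7, 0.4, 0.6, 0.2, 0.3, 0.8, 0.1, 0.3, 0.2, 0.1],
    }
    # Assumption: each model is weighted by its own AUC.
    weights = {name: auc(y, p) for name, p in preds.items()}
    fused = fuse(preds, weights)
    print("fused AUC:", auc(y, fused))
    print("fused top-decile lift:", top_decile_lift(y, fused))
    print("number of 5x2 splits:", len(five_times_twofold(len(y))))
```

In a real benchmark the weights would be estimated on data held out from the evaluation fold, and the metrics would be averaged over the ten splits rather than computed once.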
References (66)
#1 Matthias Bogaert (UGent: Ghent University), H-Index: 4
#2 Michel Ballings (UT: University of Tennessee), H-Index: 11
Last. Dirk Van den Poel (UGent: Ghent University), H-Index: 52
The purpose of this paper is to evaluate which communication types on social media are most indicative for romantic tie prediction. In contrast to analyzing communication as a composite measure, we take a disaggregated approach by modeling separate measures for commenting, liking and tagging focused on an alter’s status updates, photos, videos, check-ins, locations and links. To ensure that we have the best possible model we benchmark 8 classifiers using different data sampling techniques. The r...
9 Citations
#1 Asil Oztekin (University of Massachusetts Lowell), H-Index: 24
#2 Recep Kizilaslan (Fatih University), H-Index: 2
Last. Ali İşeri (Gazi University), H-Index: 5
view all 4 authors...
Forecasting stock market returns is a challenging task due to the complex nature of the data. This study develops a generic methodology to predict daily stock price movements by deploying and integrating three data analytical prediction models: adaptive neuro-fuzzy inference systems, artificial neural networks, and support vector machines. The proposed approach is tested on the Borsa Istanbul BIST 100 Index over an 8 year period from 2007 to 2014, using accuracy, sensitivity, and specificity as ...
49 Citations
Jul 18, 2016 in HPCS (International Conference on High Performance Computing and Simulation)
#1 Eugenio Cesario (ICAR-CNR: Institute for High Performance Computing and Networking, Italian National Research Council), H-Index: 14
Last. Paolo Trunfio (University of Calabria), H-Index: 26
view all 8 authors...
Social media posts are often tagged with geographical coordinates or other information that allows identifying user positions, this way enabling mobility pattern analysis using trajectory mining techniques. This paper presents a methodology and discusses results of a study aimed at discovering behavior and mobility patterns of Instagram users who visited EXPO 2015, the Universal Exposition hosted in Milan, Italy, from May to October 2015. We collected and analyzed geotagged posts published by ab...
17 Citations
#1 Jeroen D'Haen, H-Index: 4
#2 D. Van den Poel, H-Index: 10
Last. Dries F. Benoit, H-Index: 16
view all 4 authors...
Qualifying prospects as leads to contact is a complex exercise. Sales representatives often do not have the time or resources to rationally select the best leads to call. As a result, they rely on gut feeling and arbitrary rules to qualify leads. Model-based decision support systems make this process less subjective. Standard input for such an automated lead qualification system is commercial data. Commercial data, however, tends to be expensive and of ambiguous quality due to missing informatio...
28 Citations
#1 Matthias Bogaert (UGent: Ghent University), H-Index: 4
#2 Michel Ballings (UT: University of Tennessee), H-Index: 11
Last. Dirk Van den Poel (UGent: Ghent University), H-Index: 52
This paper seeks to assess the added value of a Facebook user's friends data in event attendance prediction over and above user data. For this purpose we gathered data of users that have liked an anonymous European soccer team on Facebook. In addition we obtained data from all their friends. In order to assess the added value of friends data we have built two models for five different algorithms (Logistic Regression, Random Forest, Adaboost, Neural Networks and Naive Bayes). The baseline model c...
19 Citations
#1 Michel Ballings (UT: University of Tennessee), H-Index: 11
#2 Dirk Van den Poel (UGent: Ghent University), H-Index: 52
Last. Ruben Gryp (UGent: Ghent University), H-Index: 1
view all 4 authors...
We predict long-term stock price direction. We benchmark three ensemble methods against four single classifiers. We use five times twofold cross-validation and AUC as a performance measure. Random Forest is the top algorithm. This study is the first to make such an extensive benchmark in this domain. Stock price direction prediction is an important issue in the financial world. Even small improvements in predictive performance can be very profitable. The purpose of this paper is to benchmark ensembl...
175 Citations
#1 Pengfei Wei (NPU: Northwestern Polytechnical University), H-Index: 14
#2 Zhenzhou Lu (NPU: Northwestern Polytechnical University), H-Index: 23
Last. Jingwen Song (NPU: Northwestern Polytechnical University), H-Index: 11
Measuring variable importance for computational models or measured data is an important task in many applications. It has drawn our attention that the variable importance analysis (VIA) techniques were developed independently in many disciplines. We are strongly aware of the necessity to aggregate all the good practices in each discipline, and compare the relative merits of each method, so as to instruct the practitioners to choose the optimal methods to meet different analysis purposes, and to ...
180 Citations
#1 Eugenio Cesario (ICAR-CNR: Institute for High Performance Computing and Networking, Italian National Research Council), H-Index: 14
#2 Chiara Congedo, H-Index: 1
Last. Carlo Turri, H-Index: 1
view all 8 authors...
The world-wide size of social networks, such as Facebook and Twitter, is making it possible to analyse the real-time behaviour of large groups of people, such as those attending popular events. This paper presents work and results on the analysis of geotagged tweets carried out to understand the behaviour of people attending the 2014 FIFA World Cup. We monitored the Twitter users attending the World Cup matches to discover the most frequent movements of fans during the competition. The data source is r...
17 Citations
#1 Michel Ballings (UGent: Ghent University), H-Index: 11
#2 Dirk Van den Poel (UGent: Ghent University), H-Index: 52
The purpose of this study is to (1) assess the feasibility of predicting increases in Facebook usage frequency, (2) evaluate which algorithms perform best, and (3) determine which predictors are most important. We benchmark the performance of Logistic Regression, Random Forest, Stochastic Adaptive Boosting, Kernel Factory, Neural Networks and Support Vector Machines using five times twofold cross-validation. The results indicate that it is feasible to create models with high predictive performan...
40 Citations
#1 Kevin Filo (Griffith University), H-Index: 20
#2 Daniel Lock (Griffith University), H-Index: 16
Last. Adam Karg (Griffith University), H-Index: 13
The emergence of social media has profoundly impacted the delivery and consumption of sport. In the current review we analysed the existing body of knowledge of social media in the field of sport management from a service-dominant logic perspective, with an emphasis on relationship marketing. We reviewed 70 journal articles published in English-language sport management journals, which investigated new media technologies facilitating interactivity and co-creation that allow for the development a...
201 Citations
Cited By (4)
#1 Matthias Bogaert (UGent: Ghent University), H-Index: 4
#2 Michel Ballings (UT: University of Tennessee), H-Index: 11
Last. Dirk Van den Poel (UGent: Ghent University), H-Index: 52
view all 4 authors...
The main purpose of this paper is to evaluate the feasibility of predicting whether or not a Facebook user has self-reported having watched a given movie genre. Therefore, we apply a data analytical framework that (1) builds and evaluates several predictive models explaining self-declared movie watching behavior, and (2) provides insight into the importance of the predictors and their relationship with self-reported movie watching behavior. For the first outcome, we benchmark several algorit...
2 Citations
#1 Lisa Schetgen (UGent: Ghent University)
#2 Matthias Bogaert (UGent: Ghent University), H-Index: 4
Last. Dirk Van den Poel (UGent: Ghent University), H-Index: 52
The purpose of this study is to demonstrate the value of Facebook data in predicting first-time donation behavior. More specifically, we provide evidence that Facebook data can be used as a valuable data source for nonprofit organizations in acquiring new donors. To do so, we evaluate three different dimensionality reduction techniques (i.e., singular value decomposition, non-negative matrix factorization, and latent Dirichlet allocation) over seven classification techniques (i.e., logi...
#1 Sangjae Lee, H-Index: 22
#2 Kun Chang Lee, H-Index: 1
Last. Joon Yeon Choeh, H-Index: 1
The enormous volume and widely varying quality of available reviews pose a great obstacle to finding the most helpful reviews. While the Naive Bayesian Network (NBN) is one of the mature artificial intelligence approaches for business decision support, the usage of NBN to predict the helpfulness of online reviews is lacking. This study intends to suggest HPNBN (a helpfulness prediction model using NBN), which adopts NBN for helpfulness prediction. This study crawled sample data from Amazon web...
1 Citations
#1 Matthias Bogaert (UGent: Ghent University), H-Index: 4
#2 Justine Lootens (UGent: Ghent University), H-Index: 1
Last. Michel Ballings (UT: University of Tennessee), H-Index: 11
view all 4 authors...
The objective of this paper is to evaluate multi-label classification techniques and recommender systems for cross-sell purposes in the financial services sector. We carried out three analyses using data obtained from an international financial services provider. First, we tested four multi-label classification techniques, of which the two problem transformation methods were combined with several base classifiers. Second, we benchmarked the performance of five state-of-the-art recommend...
7 Citations