Data-intensive applications, challenges, techniques and technologies: A survey on Big Data

Published on Aug 10, 2014 in Information Sciences 6.795
· DOI: 10.1016/J.INS.2014.01.015
C. L. Philip Chen
Estimated H-index: 81
(UM: University of Macau),
Chun-Yang Zhang
Estimated H-index: 7
(UM: University of Macau)
Big Data has drawn huge attention from researchers in information sciences and from policy and decision makers in governments and enterprises. As the speed of information growth exceeded Moore's Law at the beginning of this new century, excessive data is causing great trouble for human beings. However, there is a great deal of potential and highly useful value hidden in the huge volume of data. A new scientific paradigm has been born as data-intensive scientific discovery (DISD), also known as Big Data problems. A large number of fields and sectors, ranging from economic and business activities to public administration, and from national security to scientific research in many areas, are involved with Big Data problems. On the one hand, Big Data is extremely valuable for raising productivity in businesses and producing evolutionary breakthroughs in scientific disciplines, which gives us many opportunities to make great progress in many fields. There is no doubt that future competition in business productivity and technologies will converge on Big Data exploration. On the other hand, Big Data also brings many challenges, such as difficulties in data capture, data storage, data analysis and data visualization. This paper aims to give a close-up view of Big Data, including Big Data applications, Big Data opportunities and challenges, as well as the state-of-the-art techniques and technologies currently adopted to deal with Big Data problems. We also discuss several underlying methodologies to handle the data deluge, for example, granular computing, cloud computing, bio-inspired computing, and quantum computing.
Papers frequently viewed together
2013 HICSS: Hawaii International Conference on System Sciences
4 Authors (J.A. Espinosa)
#1 Ross Mistry, H-Index: 1
#2 Stacia Misner, H-Index: 1
NOTE: This title is also available as a free eBook on the Microsoft Download Center. It is offered for sale in print format as a convenience. Get a head start evaluating SQL Server 2014 - guided by two experts who have worked with the technology from the earliest beta. Based on Community Technology Preview 2 (CTP2) software, this guide introduces new features and capabilities, with practical insights on how SQL Server 2014 can meet the needs of your business. Get the early, high-level overview y...
#2 Mauro Coli, H-Index: 2
view all 3 authors...
The theme of the meeting was Statistical Methods for the Analysis of Large Data-Sets. In recent years there has been increasing interest in this subject; in fact a huge quantity of information is often available but standard statistical techniques are usually not well suited to managing this kind of data. The conference serves as an important meeting point for European researchers working on this topic and a number of European statistical societies participated in the organization of the event. ...
#1 Jiawei Yuan (UALR: University of Arkansas at Little Rock), H-Index: 14
#2 Shucheng Yu (UALR: University of Arkansas at Little Rock), H-Index: 34
To improve the accuracy of learning result, in practice multiple parties may collaborate through conducting joint Back-Propagation neural network learning on the union of their respective data sets. During this process no party wants to disclose her/his private data to others. Existing schemes supporting this kind of collaborative learning are either limited in the way of data partition or just consider two parties. There lacks a solution that allows two or more parties, each with an arbitrarily...
#1 Xixian Han (HIT: Harbin Institute of Technology), H-Index: 7
#2 Jianzhong Li (HIT: Harbin Institute of Technology), H-Index: 48
Last. Jinbao Wang (HIT: Harbin Institute of Technology), H-Index: 8
view all 4 authors...
Skyline is an important operation in many applications to return a set of interesting points from a potentially huge data space. Given a table, the operation finds all tuples that are not dominated by any other tuples. It is found that the existing algorithms cannot process skyline on big data efficiently. This paper presents a novel skyline algorithm SSPL on big data. SSPL utilizes sorted positional index lists which require low space overhead to reduce I/O cost significantly. The sorted positi...
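The dominance definition in the abstract above can be illustrated with a minimal sketch. This is a naive quadratic-time skyline, not the SSPL algorithm the paper presents. It assumes that smaller values are better in every dimension, and the names `dominates`, `naive_skyline` and the sample `hotels` data are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a dominates b.

    Assumption: smaller is better in every dimension. a dominates b when
    a is no worse than b everywhere and strictly better somewhere.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def naive_skyline(points: List[Point]) -> List[Point]:
    """Keep every tuple that no other tuple dominates (O(n^2) comparisons)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical table of (price, distance) tuples.
hotels = [(50, 8), (60, 3), (70, 2), (80, 5), (55, 9)]
print(naive_skyline(hotels))
```

Here `(80, 5)` is dropped because `(60, 3)` is cheaper and closer, and `(55, 9)` is dropped because `(50, 8)` is. The pairwise scan compares every tuple with every other, which is the kind of cost that becomes impractical on large tables.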
The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. ...
#1 Trevor Hastie (UNSW: University of New South Wales), H-Index: 127
#2 Robert Tibshirani, H-Index: 157
Last. Jerome H. Friedman, H-Index: 72
view all 3 authors...
During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This...
#1 Cheng Chen, H-Index: 2
#2 Zhong Liu (National University of Defense Technology), H-Index: 1
Last. Kai Wang (National University of Defense Technology), H-Index: 7
view all 5 authors...
With the availability of increasingly more new data sources collected for transportation in recent years, the computational effort for traffic flow forecasting in standalone modes has become increasingly demanding for large-scale networks. Distributed modeling strategies can be utilized to reduce the computational effort. In this paper, we present a MapReduce-based approach to processing distributed data to design a MapReduce framework of a traffic forecasting system, including its system archit...
#1 Abzetdin Adamov (Qafqaz University), H-Index: 4
The extremely fast growth of Internet services, web and mobile applications, and the advance of the related pervasive, ubiquitous and cloud computing concepts have stimulated production of tremendous amounts of data available online. Even with the power of today's modern computers, it is still a big challenge for business and government organizations to manage, search, analyze, and visualize this vast amount of data as information. Data-intensive computing, which is intended to address these problems, become qui...
Dec 1, 2012 in SMC (Systems, Man and Cybernetics)
#1 Qi Zhou (University of Portsmouth), H-Index: 14
#2 Peng Shi (University of South Wales), H-Index: 151
Last. Shengyuan Xu (Nanjing University of Science and Technology), H-Index: 88
view all 4 authors...
This paper focuses on the problem of neural-network-based decentralized adaptive output-feedback control for a class of nonlinear strict-feedback large-scale stochastic systems. The dynamic surface control technique is used to avoid the explosion of computational complexity in the backstepping design process. A novel direct adaptive neural network approximation method is proposed to approximate the unknown and desired control input signals instead of the unknown nonlinear functions. It is shown ...
#1 Vincenzo Gulisano, H-Index: 15
Last. Patrick Valduriez (IRIA: French Institute for Research in Computer Science and Automation), H-Index: 57
view all 5 authors...
Many applications in several domains, such as telecommunications, network security, and large-scale sensor networks, require online processing of continuous data flows. They produce very high loads that require aggregating the processing capacity of many nodes. Current Stream Processing Engines do not scale with the input load due to single-node bottlenecks. Additionally, they are based on static configurations that lead to either under- or overprovisioning. In this paper, we present StreamCloud, a s...
Cited By 1717
Research shows that big data analytics capability (BDAC) is a major determinant of firm performance. However, scant research has theoretically articulated and empirically tested the mechanisms and conditions under which BDAC influences performance. This study advances existing knowledge on the BDAC–performance relationship by drawing on the knowledge-based view and contingency theory to argue that how and when BDAC influences market performance is dependent on the intervening ...
#2 Heman Pathak (Gurukul Kangri Vishwavidyalaya), H-Index: 1
#1 Jose Peinado (Universidad Autónoma de Ciudad Juárez)
#2 Alberto Ochoa (Universidad Autónoma de Ciudad Juárez), H-Index: 9
Last. Sara Paiva (Polytechnic Institute of Viana do Castelo), H-Index: 7
view all 3 authors...
Intangible assets are currently present in administrations and organizations. As an institution, the administration of a business consortium makes intensive use of people and knowledge, which are the basis of the Knowledge Management System (QMS). On the other hand, services, essentially intangible, are the main product that the institution generates. Successful knowledge management will contribute to: cost reduction, reusing knowledge, and disseminating best practices; promotion of staff traini...
#1 Shaukat Ali, H-Index: 12
#2 Shah Khusro, H-Index: 10
view all 4 authors...
#1 Pedro Martins, H-Index: 12
#2 Filipe Sá, H-Index: 6
Last. Maryam Abbasi, H-Index: 6
view all 4 authors...
#1 Mahmoud A. Mahdi (Zagazig University), H-Index: 4
#2 Khalid M. Hosny (Zagazig University), H-Index: 20
Last. Ibrahim El-Henawy (Zagazig University), H-Index: 15
view all 3 authors...
In some situations, finding the rare association rule is of higher importance than the frequent itemset. Unique rules represent rare cases, activities, or events in real-world applications. It is essential to extract exceptional critical activity from vast routine data. This paper proposes a new algorithm called FR-Tree to mine the association rules and produce essential rules. This work aims to demonstrate that this algorithm is suitable for extracting rare association rules ...
#1 Francesca Iandolo, H-Index: 8
#2 Francesca Loia, H-Index: 3
Last. Francesco Caputo (University of Naples Federico II), H-Index: 21
view all 5 authors...
The increasing fluidity of social and business configurations made possible by the opportunities provided by the World Wide Web and the new technologies is questioning the validity of consolidated business models and managerial approaches. New rules are emerging and multiple changes are required to both individuals and organizations engaged in dynamic and unpredictable paths. In such a scenario, the paper aims at describing the potential role of big data and artificial intelligence in the path t...
#1 Bojun Yin (China University of Geosciences (Wuhan)), H-Index: 1
#2 Renguang Zuo (China University of Geosciences (Wuhan)), H-Index: 31
Last. Weigang Yang, H-Index: 1
view all 5 authors...
We have entered the fourth research paradigm with the overwhelming availability of vast amounts of data. Processing and mining these data for a better understanding of earth systems and for predicting mineral resources is challenging. This study discusses a data-driven knowledge discovery of geochemical patterns and presents a case study of geochemical data processing from a data-driven perspective. We employed local indicators of spatial association (LISA), principal componen...
#1 Lingzi Hong, H-Index: 7
#2 William E. Moen, H-Index: 14
view all 4 authors...