A survey on data storage and placement methodologies for Cloud-Big Data ecosystem

Published on Feb 1, 2019 in Journal of Big Data
DOI: 10.1186/S40537-019-0178-3
Somnath Mazumdar (Simula Research Laboratory), estimated H-index: 8
Daniel Seybold (University of Ulm), estimated H-index: 8
+ 1 author
Yiannis Verginadis, estimated H-index: 11
Currently, the data to be explored and exploited by computing systems increases at an exponential rate. The massive amount of data, or so-called “Big Data”, puts pressure on existing technologies to provide scalable, fast and efficient support. Recent applications and the current user support from multi-domain computing have assisted in the migration from data-centric to knowledge-centric computing. However, it remains a challenge to optimally store and place or migrate such huge data sets across data centers (DCs). In particular, due to frequent changes in application and DC behaviour (i.e., resources or latencies), data access or usage patterns need to be analyzed as well. The main objective is to find a better data storage location that improves the overall data placement cost as well as the application performance (such as throughput). In this survey paper, we provide a state-of-the-art overview of Cloud-centric Big Data placement together with data storage methodologies. It is an attempt to highlight the actual correlation between the two in terms of better supporting Big Data management. Our focus is on management aspects, which are viewed through the prism of non-functional properties. In the end, readers can appreciate the deep analysis of the respective technologies related to the management of Big Data and be guided towards their selection in the context of satisfying their non-functional application requirements. Furthermore, challenges are presented that highlight the current gaps in Big Data management and mark out the way it needs to evolve in the near future.
#1 Achilleas Achilleos (UCY: University of Cyprus), H-Index: 11
#2 Kyriakos Kritikos, H-Index: 16
Last. George A. Papadopoulos (UCY: University of Cyprus), H-Index: 29
Cloud computing offers a flexible pay-as-you-go model for provisioning application resources, which enables applications to scale on-demand based on the current workload. In many cases, though, users face the single vendor lock-in effect, missing opportunities for optimal and adaptive application deployment across multiple clouds. Several cloud modelling languages have been developed to support multi-cloud resource management, but still they lack holistic cloud management of all aspects and phas...
18 Citations
#1 Daniel Seybold (University of Ulm), H-Index: 8
#2 Moritz Keppler (Daimler AG), H-Index: 1
Last. Jörg Domaschka (University of Ulm), H-Index: 12
Big Data and IoT applications require a highly scalable database management system (DBMS), preferably operated in the cloud to ensure scalability at the resource level as well. As the number of existing distributed DBMSs is extensive, the selection and operation of a distributed DBMS in the cloud is a challenging task. While DBMS benchmarking is a supportive approach, existing frameworks do not cope with the runtime constraints of distributed DBMS and the volatility of cloud environments. Hence, DBMS ...
8 Citations
#1 Yiannis Verginadis (NTUA: National Technical University of Athens), H-Index: 11
#2 Ioannis Patiniotakis (NTUA: National Technical University of Athens), H-Index: 10
Last. Gregoris Mentzas (NTUA: National Technical University of Athens), H-Index: 31
Cloud computing has been recognized as the most prominent way of hosting and delivering services over the Internet. A plethora of cloud service offerings are currently available and are being rapidly adopted by small and medium enterprises, but also by larger organisations, owing to their many advantages over traditional computing models. However, at the same time the computing requirements of modern cloud applications have increased exponentially due to the available big data for process...
6 Citations
#1 Ali Davoudian (Carleton University), H-Index: 3
#2 Liu Chen (Carleton University), H-Index: 1
Last. Mengchi Liu (Carleton University), H-Index: 15
Recent demands for storing and querying big data have revealed various shortcomings of traditional relational database systems. This, in turn, has led to the emergence of a new kind of complementary nonrelational data store, named as NoSQL. This survey mainly aims at elucidating the design decisions of NoSQL stores with regard to the four nonorthogonal design principles of distributed database systems: data model, consistency model, data partitioning, and the CAP theorem. For each principle, its...
80 Citations
#1 Søren Kejser Jensen (AAU: Aalborg University), H-Index: 2
#2 Torben Bach Pedersen (AAU: Aalborg University), H-Index: 42
Last. Christian Thomsen (AAU: Aalborg University), H-Index: 16
The collection of time series data increases as more monitoring and automation are being deployed. These deployments range in scale from an Internet of things (IoT) device located in a household to enormous distributed Cyber-Physical Systems (CPSs) producing large volumes of data at high velocity. To store and analyze these vast amounts of data, specialized Time Series Management Systems (TSMSs) have been developed to overcome the limitations of general purpose Database Management Systems (DBMSs...
67 Citations
#1 Daniel Seybold (University of Ulm), H-Index: 8
#2 Christopher B. Hauser (University of Ulm), H-Index: 6
Last. Jörg Domaschka (University of Ulm), H-Index: 12
Driven by new application domains, the database management system (DBMS) landscape has significantly evolved from single-node DBMSs to distributed database management systems (DDBMSs). In parallel, cloud computing became the preferred solution to run distributed applications. Hence, modern DDBMSs are designed to run in the cloud. Yet, in distributed systems the probability of failures grows with the number of entities involved, and by using cloud resources the probability of failures increases...
6 Citations
#1 Kun Lan, H-Index: 4
#2 Simon Fong, H-Index: 34
Last. Richard Millham, H-Index: 9
Over the years, advanced IT technologies have facilitated the emergence of new ways of generating and gathering data rapidly, continuously, and largely and are associated with a new research and application branch, namely, data stream mining (DSM). Among those multiple scenarios of DSM, the Internet of Things (IoT) plays a significant role, with a typical meaning of a tough and challenging computational case of big data. In this paper, we describe a self-adaptive approach to the pre-processing s...
7 Citations
Sep 24, 2017 in ADBIS (Advances in Databases and Information Systems)
#1 Daniel Seybold (University of Ulm), H-Index: 8
#2 Jörg Domaschka (University of Ulm), H-Index: 12
The database landscape has significantly evolved over the last decade as cloud computing enables running distributed databases on virtually unlimited cloud resources. Hence, the already non-trivial task of selecting and deploying a distributed database system becomes more challenging. Database evaluation frameworks aim at easing this task by guiding the database selection and deployment decision. The evaluation of databases has evolved as well by moving the evaluation focus from performance to di...
10 Citations
#1 Felix Gessert (UHH: University of Hamburg), H-Index: 8
#2 Wolfram Wingerath (UHH: University of Hamburg), H-Index: 5
Last. Norbert Ritter (UHH: University of Hamburg), H-Index: 15
Today, data is generated and consumed at unprecedented scale. This has led to novel approaches for scalable data management, subsumed under the term "NoSQL" database systems, to handle the ever-increasing data volume and request loads. However, the heterogeneity and diversity of the numerous existing systems impede the well-informed selection of a data store appropriate for a given application context. Therefore, this article gives a top-down overview of the field: instead of contrasting the impl...
53 Citations
Jun 13, 2017 in ADBIS (Advances in Databases and Information Systems)
#1 Daniel Seybold (University of Ulm), H-Index: 8
#2 Jörg Domaschka (University of Ulm), H-Index: 12
3 Citations
Cited By 23
#1 Syed Iftikhar Hussain Shah (International Hellenic University), H-Index: 1
#2 Vassilios Peristeras (International Hellenic University)
Last. Ioannis Magnisalis (International Hellenic University), H-Index: 2
Performance differentiation and optimization are major dimensions and critical activities in cloud computing systems with shared execution infrastructures. Supporting these features from the perspective of cloud architecture, related concerns and requirements are important challenges, which need more in-depth research. In this regard, this work investigates the dark dimensions of the problem toward realizing an integrated architecture scheme. Therefore, the main goals of the research are to inve...
#1 Syed Iftikhar Hussain Shah (International Hellenic University), H-Index: 1
Last. Ioannis Magnisalis (International Hellenic University)
The public sector, private firms, business community, and civil society are generating data that are high in volume, veracity, and velocity and come from a diversity of sources. This type of data i...
1 Citation
#1 Ali Shakarami (IAU: Islamic Azad University), H-Index: 4
#2 Mostafa Ghobaei-Arani (IAU: Islamic Azad University), H-Index: 18
Last. Hamid Shakarami (IAU: Islamic Azad University), H-Index: 1
In recent years, cloud storage systems have emerged as a promising technology for storing data blocks on various cloud servers. One of the main mechanisms in cloud storage systems is data replication, for which various solutions are proposed. Data replication's main target is achieving higher performance for data-intensive applications by addressing some critical challenges of this criterion, such as availability, reliability, security, bandwidth, and response time of data access. However, to th...
2 Citations
#1 Hamdi Kchaou (University of Sfax), H-Index: 3
#2 Zied Kechaou (University of Sfax), H-Index: 8
Last. Adel M. Alimi (University of Sfax), H-Index: 4
Scientific workflows stand as practical solutions for representing and executing data-intensive applications, which entail not only powerful computing resources but also massive storage. With the emergence of cloud environments, which enhanced the execution of such applications, the study of workflow placement strategies, targeted at effectively reducing data movements across data centers, has grown into a highly challenging objective. Given the fact that ...
1 Citation
#1 Neelamadhab Padhy (Gandhi Institute of Engineering and Technology), H-Index: 11
The rclone tool is used to transfer data in the Cloud. The key concept of our approach is to use a Perl script with ‘rclone’. It allowed us to build an extensible, modular, and declaratively defined architecture for runtime uploading of bulk data at an optimized rate of transfer. Our key contributions include the solution itself, which enables the use of the API prototype and the ‘rclone’ technique to improve the data transmission rate in an extensible and modular way. Our researc...
#1 Ikrame Abroun (ENSA: Entertainments National Service Association)
#2 Abdelilah Azyat (ENSA: Entertainments National Service Association), H-Index: 3
Last. Asaad Chahboun (ENSA: Entertainments National Service Association), H-Index: 2
Human development is more than a question of the accumulation of wealth, income, or economic growth. It must be human-centred. This is why concerns as necessary as respect for human rights, the reduction of social inequalities and poverty, the promotion of equal opportunities between men and women are indeed relevant. This considers human resources not only as a means of growth but, more fundamentally, as an end of growth. The demographic variable was always a serious problem to decision-makers ...
#1 Abudul Wahid Khan (University of Science and Technology)
#2 Maseeh Ullah Khan (University of Science and Technology)
Last. Muhammad Fazal Ijaz (Sejong University), H-Index: 9
#1 Jörg Domaschka (University of Ulm), H-Index: 12
#2 Simon Volpert (University of Ulm), H-Index: 3
Last. Daniel Seybold (University of Ulm), H-Index: 8
The evolution of distributed Database Management Systems (DBMSs) has led to heterogeneity in DBMS technologies. Particularly DBMSs applying a shared-nothing approach enable distributed operation and support fine-grained configurations of distribution characteristics such as replication degree and consistency. Overall, the operation of such DBMSs on IaaS clouds leads to a large configuration space involving different cloud providers, cloud resources and pricing models. The selection of a specific ...