The main real-world datasets used in the studies analyzed for this paper were sensor data, image metadata, website publications, and electronic documents. Most of the studies analyzed did not document the specific languages they used to model their data or the tool they used. But due to the need to analyze large volumes of data with various structures, which arrive in high frequency, database research became more focused on NoSQL than relational databases.
3. Results
3.1. Bibliometric Analysis
The objective of this analysis is to answer RQ1. To answer the first part of the rationale of this question, we analyze the results of the inclusion and selection stages. In Figure 4, we summarize the results of the inclusion stage and highlight some findings:
Figure 4. Number of primary studies by year and source.
- The growth in the number of published articles per year is approximated by the linear trend of Equation (1)
\(y = 30.309x - 29.2 \qquad (1)\)
- Prior to 2010, no relevant studies about Big Data modeling were published
- Since 2015, the number of studies has increased significantly and, in 2018, there were 318 published articles. In 2019, there were already 106 publications before August
- Scopus ranked the highest of all considered sources, with 760 collected works, followed by WoS with 321 works, IEEE Xplore with 200 and ScienceDirect with 95
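As a minimal sketch, the following Python snippet evaluates the trend line of Equation (1) and sums the per-source counts listed above. The text does not define x; treating it as a consecutive index over the publication years is an assumption made here for illustration only, and the names `trend`, `collected` and `shares` are hypothetical.

```python
# Coefficients of Equation (1): y = 30.309x - 29.2
SLOPE = 30.309
INTERCEPT = -29.2


def trend(x: float) -> float:
    """Value of the trend line at x.

    Assumption: x is a consecutive index over publication years;
    the source does not state which year corresponds to x = 1.
    """
    return SLOPE * x + INTERCEPT


# Works collected per source, as listed in the findings above.
collected = {"Scopus": 760, "WoS": 321, "IEEE Xplore": 200, "ScienceDirect": 95}

# Sum of the per-source counts and each source's share of that sum, in percent.
total = sum(collected.values())
shares = {source: round(100 * n / total, 1) for source, n in collected.items()}

for x in range(1, 11):
    print(x, round(trend(x), 1))
print(total, shares)
```

Under this reading, the slope of about 30.3 is the estimated number of additional articles per year, whatever year is chosen as the origin.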
The results of the selection stage are presented in Figure 5, organized by source and year. We can highlight the following findings:
Figure 5. Results of selection stage.
- Prior to 2013, no relevant studies were found;
- The year with the largest number of studies about Big Data modeling is 2018. However, it is important to highlight that 2019 is ongoing and could ultimately have more studies than 2018;
- With 27 papers, Scopus is the source holding the highest number of relevant studies, followed by WoS and IEEE Xplore with two papers each. ScienceDirect does not report any relevant paper about the topic.
Table 3 summarizes the main data of each selected article: the reference, the first author's name and affiliation, the country where the research was done, the identifier of the journal or conference, the digital library, the publication year, the number of citations in Scopus, the knowledge application and the existence of funding.
Table 3. Bibliometric Analysis.
First Author's Name | First Author's Affiliation | Country | Journal/Conference ID | Digital Library | Publication Year | Citations in Scopus | Knowledge Application | Funding |
---|---|---|---|---|---|---|---|---|
Jie Song | Software College, Northeastern | China | J1 | Scopus | 2019 | 0 | Academy | Yes |
Laurent Thiry | University of Haute Alsace | France | J2 | Scopus | 2018 | 0 | Academy | NA |
Victor Martins de Sousa | UNIFACCAMP | Brazil | C1 | Scopus | 2018 | 1 | Academy | Yes |
Igor Zečević | University of Novi Sad | Serbia | J3 | Scopus | 2018 | 2 | Academy | Yes |
Antonio M. Rinaldi | University of Naples Federico II | Italy | C2 | Scopus | 2018 | 1 | Academy | NA |
Shady Hamouda | Emirates College of Technology | United Arab Emirates | C3 | Scopus | 2018 | 1 | Academy | NA |
Dippy Aggarwal | University of Cincinnati | United States of America | J4 | Scopus | 2018 | 0 | Academy | NA |
Alfonso de la Vega | University of Cantabria | Spain | C4 | Scopus | 2018 | 0 | Academy | Yes |
Xu Chen | North Minzu University | China | J5 | Scopus | 2018 | 0 | Academy | NA |
Maribel Yasmina Santos | University of Minho | Portugal | J6 | Scopus | 2017 | 10 | Academy | Yes |
Kwangchu Shin | Kook Min University | South Korea | J7 | Scopus | 2017 | 7 | Academy | Yes |
Fatma Abdelhedi | Toulouse Capitole University | France | C5 | Scopus | 2017 | 1 | Academy | NA |
Aravind Mohan | Wayne State University | United States of America | C6 | Scopus | 2016 | 7 | Academy | Yes |
Massimo Villari | University of Messina | Italy | C7 | Scopus | 2016 | 2 | Academy | NA |
Maribel Yasmina Santos | University of Minho | Portugal | C8 | Scopus | 2016 | 10 | Academy | Yes |
Maribel Yasmina Santos | University of Minho | Portugal | J8 | Scopus | 2016 | 8 | Academy | Yes |
Ganesh B. Solanke | PCCoE, Nigdi | India | C9 | Scopus | 2018 | 0 | Academy | NA |
Vincent Reniers | KU Leuven | Belgium | C10 | Scopus | 2018 | 0 | Academy | Yes |
Fatma Abdelhedi | Toulouse Capitole University | France | C11 | Scopus | 2016 | 4 | Academy | NA |
Max Chevalier | University of Toulouse | France | C12 | Scopus | 2016 | 6 | Academy | ANRT |
Shreya Banerjee | National Institute of Technology | India | C13 | Scopus | 2015 | 6 | Academy | NA |
Artem Chebotko | DataStax Inc. | United States of America | C14 | Scopus | 2015 | 43 | Industry | Yes |
Wenduo Feng | Guangxi University | China | C15 | Scopus | 2015 | 2 | Academy | Yes |
Ling Chen | Zhejiang University | China | C16 | Scopus | 2015 | 1 | Academy | Yes |
Max Chevalier | University of Toulouse | France | C17 | Scopus | 2015 | 14 | Academy | NA |
Dewi W. Wardani | Sebelas Maret University | Indonesia | C18 | Scopus | 2014 | 7 | Academy | NA |
Ming Zhe | Hubei University of Technology | China | C19 | Scopus | 2013 | 0 | Academy | NA |
Mohamed Nadjib Mami | University of Bonn | Germany | C20 | Scopus | 2016 | 5 | Academy | Yes |
Dan Han | University of Alberta | Canada | C21 | WoS | 2013 | 0 | Academy | Yes |
Zhiyun Zheng | Zhengzhou University | China | C22 | IEEE | 2014 | 1 | Academy | Yes |
Dongqi Wei | University of Geosciences | China | C23 | IEEE | 2014 | 3 | Academy | NA |
Karamjit Kaur | Thapar University | India | C24 | IEEE | 2013 | 59 | Academy | NA |
Michael J. Mior | University of Waterloo | Canada | J1 | IEEE | 2017 | 12 | Academy | Yes |
Max Chevalier | University of Toulouse | France | C25 | Scopus | 2016 | 4 | Academy | Yes |
Harley Vera | University of Brasília | Brazil | C26 | Scopus | 2015 | 8 | Academy | NA |
Robert T. Mason | Regis University | United States of America | C27 | NA | 2015 | 0 | Academy | NA |
3.1.1. Authors
Figure 6 shows the first authors who have contributed most to the subject. Maribel Yasmina Santos from Portugal and Max Chevalier from France share first place with three articles each, and Fatma Abdelhedi from France is in second place with two articles. Two of Santos' studies were published in 2016 and one in 2017. This research was performed in collaboration with the University of Minho and is the only contribution from Portugal in the final corpus of studies.
Figure 6. Contribution by author.
Regarding France, the authors Chevalier, Abdelhedi and Laurent Thiry investigated the topic and added six contributions in total: one published in 2015, three in 2016, and one each in 2017 and 2018. Their research is linked to the University of Haute Alsace and the University of Toulouse.
Countries such as the USA, China and India have contributions from several different authors. For the USA there are four articles: in 2015, Robert Mason of Regis University and Artem Chebotko from DataStax Inc. presented one article each; in 2016, Aravind Mohan from Wayne State University and, in 2018, Dippy Aggarwal of the University of Cincinnati also presented their approaches. In China, the six authors and institutions that made contributions were Ming Zhe of Hubei University of Technology in 2013, Dongqi Wei from the University of Geosciences in 2014, Ling Chen from Zhejiang University and Wenduo Feng of Guangxi University in 2015, Xu Chen of North Minzu University in 2018 and Jie Song from the Software College, Northeastern in 2019. There are three studies from India, published by Karamjit Kaur from Thapar University in 2013, Shreya Banerjee of the National Institute of Technology in 2015 and Ganesh B. Solanke from PCCoE, Nigdi in 2018.
In total, 15 different institutions, one from industry and 14 from academia, have presented relevant works, and even in 2018 and 2019 the subject is still being actively investigated.
3.1.2. Countries and Years
In this part, after discarding 16 duplicated results, 101 of the 117 articles collected after the inclusion stage were taken as the sample. These articles contain research pertaining to our area of interest and allow us to analyze a greater number of articles.
Figure 7 shows that the leading countries in the topic of interest are the USA and China, with 17 and 16 articles, respectively. The country where Chevalier performed his research, France, takes third place with eight articles and, with seven publications each, Italy and Spain take the fourth spot. Finally, Germany takes fifth place with six studies.
Figure 7. Contribution by year and country.
For the USA, four articles were published in 2014, one in 2015, six in 2016, two in 2017, three in 2018 and one in 2019. Research in that country therefore started in 2014, peaked in 2016 and continues through 2019. China published its first article in 2013, followed by three articles in 2014, two in 2015, three in each of 2016, 2017 and 2018, and one in 2019. China thus began its research in 2013 and is still investigating the topic; its steady output between 2014 and 2018 is also worth noting. France started in 2015 with one article, followed by four articles in 2016, one in 2017 and two in 2018, so 2016 was its year with the most contributions. Italy and Spain also started their research in 2015; Italy presented the most articles in 2016 and Spain in 2018. Of these two, only Spain has published an article in 2019. Germany started its research in 2013 with one article, and its last published article was found in 2018.
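The per-year counts above can be cross-checked against the country totals reported with Figure 7 (17, 16 and 8 articles). The following Python sketch does this for the three leading countries; the figures are transcribed from the text, and the variable names are illustrative only.

```python
# Articles per year for the three leading countries, transcribed from the text.
per_year = {
    "USA": {2014: 4, 2015: 1, 2016: 6, 2017: 2, 2018: 3, 2019: 1},
    "China": {2013: 1, 2014: 3, 2015: 2, 2016: 3, 2017: 3, 2018: 3, 2019: 1},
    "France": {2015: 1, 2016: 4, 2017: 1, 2018: 2},
}

# Country totals as reported alongside Figure 7.
reported_totals = {"USA": 17, "China": 16, "France": 8}

# Sum the yearly counts per country.
totals = {country: sum(counts.values()) for country, counts in per_year.items()}

# Year(s) with the highest count per country; ties are all kept.
peak_years = {
    country: [year for year, n in counts.items() if n == max(counts.values())]
    for country, counts in per_year.items()
}

for country in per_year:
    status = "matches" if totals[country] == reported_totals[country] else "differs from"
    print(f"{country}: {totals[country]} articles ({status} reported total), "
          f"peak year(s): {peak_years[country]}")
```

Keeping ties in `peak_years` reflects the observation that China's output was constant over several years rather than concentrated in one.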
In conclusion, from 2015 onwards more countries started contributing to the scientific production on this topic, doubling the number of published articles in 2016. In 2018 and 2019, the trend remains. However, 2019 is still ongoing; therefore, it is likely that many studies will be published before the end of the year.
3.1.3. Citations
Table 3 presents the number of citations of the studies in Scopus. Figure 8 shows that the article with the greatest impact, with 59 citations, was published by Karamjit Kaur from India, followed by one by Artem Chebotko from the USA, with 43 citations. It is important to highlight that both authors belong to countries with several contributions.
Figure 8. Number of citations in Scopus.
The most cited article is titled "Modeling and querying data in NoSQL databases" and was published in 2013. The second most cited article is titled "A Big Data Modeling Methodology for Apache Cassandra" and was published in 2015. Further details about these publications are presented in the SLR section.
According to Table 3, 97.22% of the articles come from academia and 52.78% of the articles were funded. By our criteria, the topic is highly relevant because research projects on it receive funding.
Table 4 and Table 5 provide information about the journals and conferences where the studies were published: for the journals, their impact factor in the JCR and their SJR indicator and, for the conferences, their ranking. It is important to highlight that 75% of the studies were presented at conferences; thus we can anticipate that, for the current year, there are studies still in progress that have not reached their final stage.
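The percentages quoted in this subsection (97.22%, 52.78% and 75%) can be reproduced from row counts in Table 3. The sketch below uses counts tallied by hand from that table; counting the single "ANRT" funding entry as funded is an assumption made here because it is the only reading that yields the reported 52.78%.

```python
# Row counts tallied from Table 3.
total_studies = 36   # all rows
academy = 35         # Knowledge Application = Academy (one row is Industry)
funded = 19          # Funding = Yes (18 rows) plus the "ANRT" entry (assumed funded)
conference = 27      # Journal/Conference ID starting with "C"


def pct(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


print("Academy share:   ", pct(academy, total_studies))
print("Funded share:    ", pct(funded, total_studies))
print("Conference share:", pct(conference, total_studies))
```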
Table 4. Information of journals where the relevant studies were presented.
Journal ID | Journal Name | Country | JCR IF | SJR | Study ID |
---|---|---|---|---|---|
J1 | IEEE Transactions on Knowledge and Data Engineering | United States of America | 3.86 | 1.1 | 1, 33 |
J2 | Journal of Big Data | United Kingdom | NA | 1.1 | 2 |
J3 | Enterprise Information Systems | United Kingdom | 2.12 | 0.7 | 4 |
J4 | Advances in Intelligent Systems and Computing | Germany | NA | 0.2 | 7 |
J5 | Filomat | Serbia | 0.79 | 0.4 | 9 |
J6 | Journal of Management Analytics | United Kingdom | NA | NA | 10 |
J7 | International Journal of Applied Engineering Research | India | NA | 0.1 | 11 |
J8 | Lecture Notes in Computer Science | Germany | 0.4 | 0.3 | 16 |
Table 5. Information of conferences where the relevant studies were presented.
Conference ID | Conference Name | CORE 2018 Ranking | Study ID |
---|---|---|---|
C1 | 20th International Conference on Information Integration and Web-Based Applications and Services | C | 3 |
C2 | 10th International Conference on Management of Digital EcoSystems | Not ranked | 5 |
C3 | 2017 International Conference on Big Data Innovations and Applications | Not ranked | 6 |
C4 | 8th International Conference on Model and Data Engineering | Not ranked | 8 |
C5 | 19th International Conference on Enterprise Information Systems | C | 12 |
C6 | 5th IEEE International Congress on Big Data | Not ranked | 13 |
C7 | 2016 IEEE Symposium on Computers and Communication | B | 14 |
C8 | 9th International C* Conference on Computer Science and Software Engineering | Not ranked | 15 |
C9 | 2017 International Conference on Computing, Communication, Control and Automation | Not ranked | 17 |
C10 | 2017 IEEE International Conference on Big Data | Not ranked | 18 |
C11 | 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management | C | 19 |
C12 | 18th International Conference on Enterprise Information Systems | C | 20 |
C13 | 2015 IEEE International Conference on Industrial Informatics | Not ranked | 21 |
C14 | 4th IEEE International Congress on Big Data | Not ranked | 22 |
C15 | 2015 IEEE International Conference on Smart City/SocialCom/SustainCom together with DataCom | Not ranked | 23 |
C16 | 2015 IEEE International Conference on Multimedia Big Data | Not ranked | 24 |
C17 | 17th International Conference on Big Data Analytics and Knowledge Discovery | Not ranked | 25 |
C18 | 2014 International Conference on Computer, Control, Informatics and Its Applications | Not ranked | 26 |
C19 | 2013 International Conference on Computer Sciences and Applications | Not ranked | 27 |
C20 | 18th International Conference on Big Data Analytics and Knowledge Discovery | Not ranked | 28 |
C21 | 2013 IEEE Sixth International Conference on Cloud Computing | B | 29 |
C22 | 3rd IEEE International Congress on Big Data | Not ranked | 30 |
C23 | 2014 Fifth International Conference on Computing for Geospatial Research and Application | Not ranked | 31 |
C24 | 2013 IEEE International Conference on Big Data | Not ranked | 32 |
C25 | IEEE Tenth International Conference on Research Challenges in Information Science | B | 34 |
C26 | 2nd Annual International Symposium on Information Management and Big Data | B | 35 |
C27 | Informing Science & IT Education Conference | C | 36 |
3.1.4. Journals
We present in Table 4 the list of journals where the selected relevant studies were published. The table contains the assigned journal identifier, the journal name, the journal's country, the impact factor (IF) in the JCR, the SJR indicator and the related study ID. We considered it important to display the JCR IF and the SJR because both indicators relate to research quality, being based on the number of citations of the published studies and their importance in scientific research.
3.1.5. Conferences
We present in Table 5 the details of the conferences where some relevant studies were presented. The assigned conference identifier, the conference name, the CORE ranking and the respective study identifiers are listed. We used the 2018 conference ranking of the Computing Research and Education Association of Australasia (CORE). This ranking was created by an association of computer science departments from universities in Australia and New Zealand. The association provides conference rankings in the computing disciplines based on a mix of indicators, including citation rates and paper submission and acceptance rates. The ranks are represented by the letters A*, A, B and C, with A* being the best and C the worst.
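To illustrate how the conferences in Table 5 are distributed over the CORE ranks, the following Python sketch tallies the ranking column. The mapping is transcribed from Table 5 (only ranked conferences are listed explicitly; all others are "Not ranked"), and the names `ranked`, `tally` and `best_rank` are illustrative.

```python
from collections import Counter

# CORE 2018 rank for the ranked conferences in Table 5; every other
# conference ID from C1 to C27 is listed there as "Not ranked".
ranked = {
    "C1": "C", "C5": "C", "C7": "B", "C11": "C", "C12": "C",
    "C21": "B", "C25": "B", "C26": "B", "C27": "C",
}

# One rank per conference, C1 through C27.
ranks = [ranked.get(f"C{i}", "Not ranked") for i in range(1, 28)]
tally = Counter(ranks)

# CORE ranks ordered from best to worst, as described in the text.
CORE_ORDER = ["A*", "A", "B", "C"]

# Best rank that actually occurs among the conferences in Table 5.
best_rank = min((r for r in tally if r in CORE_ORDER), key=CORE_ORDER.index)

print(dict(tally))
print("Best rank present:", best_rank)
```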
Through the performed analysis, research question RQ1 is answered in significant detail. In order to answer the next two research questions, each of the selected articles deemed relevant was analyzed after a full reading.