3. Results

3.1. Bibliometric Analysis

The objective of this analysis is to answer RQ1. To address the first part of the rationale behind this question, we analyze the results of the inclusion and selection stages. Figure 4 summarizes the results of the inclusion stage, from which we highlight the following findings:

Figure 4. Number of primary studies by year and source

  • The average annual growth of published articles follows the linear trend of Equation (1)

\(y = 30.309x - 29.2\) (1)

  • Prior to 2010, no relevant studies about Big Data modeling were published
  • Since 2015, the number of studies has increased significantly, reaching 318 published articles in 2018. In 2019, there were already 106 publications before August
  • Scopus ranked the highest of all considered sources, with 760 collected works, followed by WoS with 321 works, IEEE Xplore with 200 and ScienceDirect with 95
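As a minimal sketch, the linear trend of Equation (1) can be evaluated as follows. The source does not define x, so treating it as a positional index of the publication year on the horizontal axis of Figure 4 is an assumption, and the function name `trend` is hypothetical.

```python
def trend(x: float) -> float:
    """Evaluate the linear trend of Equation (1): y = 30.309x - 29.2.

    Assumption: x is a positional year index; the paper does not define it.
    """
    return 30.309 * x - 29.2


# The slope is the increase in y for each one-step increase in x,
# i.e. the average growth in published articles per step.
slope = trend(2) - trend(1)

# The trend crosses zero slightly below x = 1.
x_intercept = 29.2 / 30.309

print(f"slope = {slope:.3f}, x-intercept = {x_intercept:.3f}")
```

Under this reading, the slope of 30.309 is the quantity the text describes as the average annual growth.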

The results of the selection stage are presented in Figure 5, organized by source and year. We can highlight the following findings:

Figure 5. Results of selection stage.

  1. Prior to 2013, no relevant studies were found;
  2. The year with the largest number of studies about Big Data modeling is 2018. However, 2019 is still ongoing and could ultimately have more studies than 2018;
  3. With 27 papers, Scopus is the source holding the highest number of relevant studies, followed by WoS and IEEE Xplore with two papers each. ScienceDirect does not report any relevant paper about the topic.

Table 3 summarizes the main data of each selected article: the reference, the first author's name and affiliation, the country where the research was done, the journal or conference identifier, the digital library, the publication year, the number of citations in Scopus, the knowledge application and the existence of funding.

Table 3. Bibliometric Analysis.

| First Author's Name | First Author's Affiliation | Country | Journal/Conference ID | Digital Library | Publication Year | Citations in Scopus | Knowledge Application | Funding |
|---|---|---|---|---|---|---|---|---|
| Jie Song | Software College, Northeastern | China | J1 | Scopus | 2019 | 0 | Academy | Yes |
| Laurent Thiry | University of Haute Alsace | France | J2 | Scopus | 2018 | 0 | Academy | NA |
| Victor Martins de Sousa | UNIFACCAMP | Brazil | C1 | Scopus | 2018 | 1 | Academy | Yes |
| Igor Zečević | University of Novi Sad | Serbia | J3 | Scopus | 2018 | 2 | Academy | Yes |
| Antonio M. Rinaldi | University of Naples Federico II | Italy | C2 | Scopus | 2018 | 1 | Academy | NA |
| Shady Hamouda | Emirates College of Technology | United Arab Emirates | C3 | Scopus | 2018 | 1 | Academy | NA |
| Dippy Aggarwal | University of Cincinnati | United States of America | J4 | Scopus | 2018 | 0 | Academy | NA |
| Alfonso de la Vega | University of Cantabria | Spain | C4 | Scopus | 2018 | 0 | Academy | Yes |
| Xu Chen | North Minzu University | China | J5 | Scopus | 2018 | 0 | Academy | NA |
| Maribel Yasmina Santos | University of Minho | Portugal | J6 | Scopus | 2017 | 10 | Academy | Yes |
| Kwangchu Shin | Kook Min University | South Korea | J7 | Scopus | 2017 | 7 | Academy | Yes |
| Fatma Abdelhedi | Toulouse Capitole University | France | C5 | Scopus | 2017 | 1 | Academy | NA |
| Aravind Mohan | Wayne State University | United States of America | C6 | Scopus | 2016 | 7 | Academy | Yes |
| Massimo Villari | University of Messina | Italy | C7 | Scopus | 2016 | 2 | Academy | NA |
| Maribel Yasmina Santos | University of Minho | Portugal | C8 | Scopus | 2016 | 10 | Academy | Yes |
| Maribel Yasmina Santos | University of Minho | Portugal | J8 | Scopus | 2016 | 8 | Academy | Yes |
| Ganesh B. Solanke | PCCoE, Nigdi | India | C9 | Scopus | 2018 | 0 | Academy | NA |
| Vincent Reniers | KU Leuven | Belgium | C10 | Scopus | 2018 | 0 | Academy | Yes |
| Fatma Abdelhedi | Toulouse Capitole University | France | C11 | Scopus | 2016 | 4 | Academy | NA |
| Max Chevalier | University of Toulouse | France | C12 | Scopus | 2016 | 6 | Academy | ANRT |
| Shreya Banerjee | National Institute of Technology | India | C13 | Scopus | 2015 | 6 | Academy | NA |
| Artem Chebotko | DataStax Inc. | United States of America | C14 | Scopus | 2015 | 43 | Industry | Yes |
| Wenduo Feng | Guangxi University | China | C15 | Scopus | 2015 | 2 | Academy | Yes |
| Ling Chen | Zhejiang University | China | C16 | Scopus | 2015 | 1 | Academy | Yes |
| Max Chevalier | University of Toulouse | France | C17 | Scopus | 2015 | 14 | Academy | NA |
| Dewi W. Wardani | Sebelas Maret University | Indonesia | C18 | Scopus | 2014 | 7 | Academy | NA |
| Ming Zhe | Hubei University of Technology | China | C19 | Scopus | 2013 | 0 | Academy | NA |
| Mohamed Nadjib Mami | University of Bonn | Germany | C20 | Scopus | 2016 | 5 | Academy | Yes |
| Dan Han | University of Alberta | Canada | C21 | WoS | 2013 | 0 | Academy | Yes |
| Zhiyun Zheng | Zhengzhou University | China | C22 | IEEE | 2014 | 1 | Academy | Yes |
| Dongqi Wei | University of Geosciences | China | C23 | IEEE | 2014 | 3 | Academy | NA |
| Karamjit Kaur | Thapar University | India | C24 | IEEE | 2013 | 59 | Academy | NA |
| Michael J. Mior | University of Waterloo | Canada | J1 | IEEE | 2017 | 12 | Academy | Yes |
| Max Chevalier | University of Toulouse | France | C25 | Scopus | 2016 | 4 | Academy | Yes |
| Harley Vera | University of Brasília | Brazil | C26 | Scopus | 2015 | 8 | Academy | NA |
| Robert T. Mason | Regis University | United States of America | C27 | NA | 2015 | 0 | Academy | NA |

3.1.1. Authors

Figure 6 shows the first authors who have contributed most to the subject. Maribel Yasmina Santos from Portugal and Max Chevalier from France share first place with three articles each, and Fatma Abdelhedi from France is in second place with two articles. According to the observed data, two of Santos' studies were published in 2016 and another in 2017. This research was performed in collaboration with the University of Minho, and it is the only contribution from Portugal in the final corpus of studies.

Figure 6. Contribution by author.
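The author ranking can be reproduced by tallying the first-author column of Table 3. The sketch below is illustrative only: the list is transcribed by hand from Table 3, and the variable names are hypothetical.

```python
from collections import Counter

# First authors of the 36 selected studies, in the row order of Table 3.
first_authors = [
    "Jie Song", "Laurent Thiry", "Victor Martins de Sousa", "Igor Zečević",
    "Antonio M. Rinaldi", "Shady Hamouda", "Dippy Aggarwal",
    "Alfonso de la Vega", "Xu Chen", "Maribel Yasmina Santos",
    "Kwangchu Shin", "Fatma Abdelhedi", "Aravind Mohan", "Massimo Villari",
    "Maribel Yasmina Santos", "Maribel Yasmina Santos", "Ganesh B. Solanke",
    "Vincent Reniers", "Fatma Abdelhedi", "Max Chevalier", "Shreya Banerjee",
    "Artem Chebotko", "Wenduo Feng", "Ling Chen", "Max Chevalier",
    "Dewi W. Wardani", "Ming Zhe", "Mohamed Nadjib Mami", "Dan Han",
    "Zhiyun Zheng", "Dongqi Wei", "Karamjit Kaur", "Michael J. Mior",
    "Max Chevalier", "Harley Vera", "Robert T. Mason",
]

# Count the studies per first author and keep those with more than one.
counts = Counter(first_authors)
repeat_authors = {name: n for name, n in counts.items() if n > 1}

for name, n in sorted(repeat_authors.items(), key=lambda item: -item[1]):
    print(f"{name}: {n}")
```

Only three first authors appear more than once, which matches the ranking described for Figure 6.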

Regarding France, the authors Chevalier, Abdelhedi and Laurent Thiry investigated the topic and added six contributions in total: one published in 2015, three in 2016 and one each in 2017 and 2018. Their research is linked to the University of Haute Alsace and to the University of Toulouse.

Countries such as the USA, China and India have contributions from several different authors. For the USA, there are four articles: in 2015, Robert Mason of Regis University and Artem Chebotko from DataStax Inc. presented one article each; in 2016, Aravind Mohan from Wayne State University presented his approach; and in 2018, Dippy Aggarwal of the University of Cincinnati did the same. In China, the six authors and institutions that have made contributions were: in 2013, Ming Zhe of the Hubei University of Technology; in 2014, Dongqi Wei from the University of Geosciences; in 2015, Ling Chen from Zhejiang University and Wenduo Feng of Guangxi University; in 2018, Xu Chen from North Minzu University; and in 2019, Jie Song from the Software College, Northeastern. There are three studies from India, published by Karamjit Kaur from Thapar University in 2013, Shreya Banerjee of the National Institute of Technology in 2015 and Ganesh B. Solanke from PCCoE, Nigdi in 2018.

In total, 15 different institutions, one from the industry and 14 from the academy, have presented relevant works, and it can be observed that the subject was still being actively investigated in 2018 and 2019.


3.1.2. Countries and Years

For this part of the analysis, we took as a sample the 101 articles that remained after discarding 16 duplicates from the 117 articles collected after the inclusion stage. These articles contain research pertaining to our area of interest and allow us to analyze a larger number of articles.

Figure 7 shows that the leading countries in the topic of interest are the USA and China, with 17 and 16 articles, respectively. The country where Chevalier performed his research, France, takes third place with eight articles and, with seven publications each, Italy and Spain take the fourth spot. Finally, Germany takes fifth place with six studies.

Figure 7. Contribution by year and country.

For the USA, four articles were published in 2014, one in 2015, six in 2016, two in 2017, three in 2018 and one in 2019. Therefore, research in that country started in 2014, had the most contributions in 2016 and continues through 2019. China's first article was published in 2013, followed by three articles in 2014, two in 2015, three in each of 2016, 2017 and 2018, and one in 2019. China thus initiated its research in 2013 and continues to investigate the topic, with a steady output of articles between 2014 and 2018. France started in 2015 with one article, followed by four articles in 2016, one in 2017 and two in 2018, so 2016 was its year with the most contributions. Italy and Spain also started their research in 2015; Italy presented the most articles in 2016 and Spain in 2018. Of these two countries, only Spain has published an article in 2019. Germany started its research in 2013 with one article, and its last published article was found in 2018.
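The yearly counts reported for the three leading countries can be checked against the totals of Figure 7 with a short tally. This is only a sketch: the counts are transcribed from the paragraph above, and the names `articles_by_year`, `totals` and `peak_year` are hypothetical.

```python
# Articles per year for the three leading countries, as reported in the text.
articles_by_year = {
    "USA": {2014: 4, 2015: 1, 2016: 6, 2017: 2, 2018: 3, 2019: 1},
    "China": {2013: 1, 2014: 3, 2015: 2, 2016: 3, 2017: 3, 2018: 3, 2019: 1},
    "France": {2015: 1, 2016: 4, 2017: 1, 2018: 2},
}

# Total number of articles per country.
totals = {country: sum(years.values())
          for country, years in articles_by_year.items()}

# Year with the most articles per country. When several years tie
# (as for China), max() returns the first of them in insertion order.
peak_year = {country: max(years, key=years.get)
             for country, years in articles_by_year.items()}

for country in articles_by_year:
    print(country, totals[country], peak_year[country])
```

The totals of 17, 16 and 8 articles agree with the figures given for the USA, China and France.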

In conclusion, from 2015 onwards, more countries started contributing to the scientific production on this topic, doubling the number of published articles in 2016. In 2018 and 2019, the trend remains. However, the year 2019 is still ongoing; therefore, it is likely that more studies will be published before the end of the year.


3.1.3. Citations

Table 3 presents the number of citations of the studies in Scopus. Figure 8 shows that the article with the greatest impact, with 59 citations, was published by Karamjit Kaur from India, followed by the article by Artem Chebotko from the USA, with 43 citations. It is important to highlight that both authors also belong to the countries with more contributions.

Figure 8. Number of citations in Scopus.

The most cited article is titled "Modeling and querying data in NoSQL databases" and was published in 2013. The second most cited article is titled "A Big Data Modeling Methodology for Apache Cassandra" and was published in 2015. Further details about these publications are presented in the SLR section.

It can also be noted, according to Table 3, that 97.22% of the articles belong to the academy and that 52.78% of the articles were funded. According to our criteria, this topic is considered highly relevant because funds are being allocated to research projects on it.

Table 4 and Table 5 provide information about the journals and conferences where the studies were published, together with the journals' impact factor in the JCR and SJR and the conferences' ranking. It is important to highlight that 75% of the studies were presented at conferences, which suggests that some studies are still in progress and have not yet reached their final stage.
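The three percentages quoted above follow from simple counts over the 36 studies of Table 3. The sketch below assumes that the funding entry "ANRT" is counted as funded, which is the reading under which the reported 52.78% is obtained; the helper `share` and the constant names are hypothetical.

```python
TOTAL_STUDIES = 36       # rows in Table 3

ACADEMY_STUDIES = 35     # every row except the one marked "Industry"
FUNDED_STUDIES = 19      # 18 rows marked "Yes" plus 1 marked "ANRT" (assumption)
CONFERENCE_STUDIES = 27  # rows with a conference ID (C1 to C27)


def share(part: int, total: int = TOTAL_STUDIES) -> float:
    """Return part/total as a percentage rounded to two decimals."""
    return round(100 * part / total, 2)


print(share(ACADEMY_STUDIES))     # share of studies from the academy
print(share(FUNDED_STUDIES))      # share of funded studies
print(share(CONFERENCE_STUDIES))  # share of studies presented at conferences
```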

Table 4. Information of journals where the relevant studies were presented.

| Journal ID | Journal Name | Country | JCR IF | SJR | Study ID |
|---|---|---|---|---|---|
| J1 | IEEE Transactions on Knowledge and Data Engineering | United States of America | 3.86 | 1.1 | 1, 33 |
| J2 | Journal of Big Data | United Kingdom | NA | 1.1 | 2 |
| J3 | Enterprise Information Systems | United Kingdom | 2.12 | 0.7 | 4 |
| J4 | Advances in Intelligent Systems and Computing | Germany | NA | 0.2 | 7 |
| J5 | Filomat | Serbia | 0.79 | 0.4 | 9 |
| J6 | Journal of Management Analytics | United Kingdom | NA | NA | 10 |
| J7 | International Journal of Applied Engineering Research | India | NA | 0.1 | 11 |
| J8 | Lecture Notes in Computer Science | Germany | 0.4 | 0.3 | 16 |

Table 5. Information of conferences where the relevant studies were presented.

| Conference ID | Conference Name | CORE 2018 Ranking | Study ID |
|---|---|---|---|
| C1 | 20th International Conference on Information Integration and Web-Based Applications and Services | C | 3 |
| C2 | 10th International Conference on Management of Digital EcoSystems | Not ranked | 5 |
| C3 | 2017 International Conference on Big Data Innovations and Applications | Not ranked | 6 |
| C4 | 8th International Conference on Model and Data Engineering | Not ranked | 8 |
| C5 | 19th International Conference on Enterprise Information Systems | C | 12 |
| C6 | 5th IEEE International Congress on Big Data | Not ranked | 13 |
| C7 | 2016 IEEE Symposium on Computers and Communication | B | 14 |
| C8 | 9th International C* Conference on Computer Science and Software Engineering | Not ranked | 15 |
| C9 | 2017 International Conference on Computing, Communication, Control and Automation | Not ranked | 17 |
| C10 | 2017 IEEE International Conference on Big Data | Not ranked | 18 |
| C11 | 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management | C | 19 |
| C12 | 18th International Conference on Enterprise Information Systems | C | 20 |
| C13 | 2015 IEEE International Conference on Industrial Informatics | Not ranked | 21 |
| C14 | 4th IEEE International Congress on Big Data | Not ranked | 22 |
| C15 | 2015 IEEE International Conference on Smart City/SocialCom/SustainCom together with DataCom | Not ranked | 23 |
| C16 | 2015 IEEE International Conference on Multimedia Big Data | Not ranked | 24 |
| C17 | 17th International Conference on Big Data Analytics and Knowledge Discovery | Not ranked | 25 |
| C18 | 2014 International Conference on Computer, Control, Informatics and Its Applications | Not ranked | 26 |
| C19 | 2013 International Conference on Computer Sciences and Applications | Not ranked | 27 |
| C20 | 18th International Conference on Big Data Analytics and Knowledge Discovery | Not ranked | 28 |
| C21 | 2013 IEEE Sixth International Conference on Cloud Computing | B | 29 |
| C22 | 3rd IEEE International Congress on Big Data | Not ranked | 30 |
| C23 | 2014 Fifth International Conference on Computing for Geospatial Research and Application | Not ranked | 31 |
| C24 | 2013 IEEE International Conference on Big Data | Not ranked | 32 |
| C25 | IEEE Tenth International Conference on Research Challenges in Information Science | B | 34 |
| C26 | 2nd Annual International Symposium on Information Management and Big Data | B | 35 |
| C27 | Informing Science & IT Education Conference | C | 36 |

3.1.4. Journals

Table 4 lists the journals where the selected relevant studies were published. The table contains the assigned journal identifier, the journal name, the journal's country, the impact factor (IF) in the JCR, the SJR and the related study ID. We considered it important to display the JCR IF and the SJR, since both are indicators of research quality based on the number of citations received by the published studies and on their importance in scientific research.


3.1.5. Conferences

Table 5 presents the details of the conferences where some of the relevant studies were presented. The assigned conference identifier, the conference name, the CORE ranking and the respective study identifiers are listed. We used the 2018 conference ranking of the Computing Research and Education Association of Australasia (CORE). This ranking was created by an association of computer science departments from universities in Australia and New Zealand. The association provides conference rankings in the computing disciplines based on a mix of indicators, including citation rates and paper submission and acceptance rates. The ranks are denoted by the letters A*, A, B and C, with A* being the best and C the worst.

Through the performed analysis, research question RQ1 is answered in significant detail. In order to answer the next two research questions, each of the selected relevant articles was read in full and analyzed.