An exploratory teaching program in big data analysis for undergraduate students



ORIGINAL RESEARCH

Süleyman Eken

Received: 25 January 2020 / Accepted: 30 July 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract

Many of the world’s biggest discoveries and decisions in science, technology, business, medicine, politics, and society as a whole are now made on the basis of analyzing massive datasets. This paper proposes an exploratory teaching program that provides a broad and practical introduction to big data analysis. The program was designed and taught in the Department of Computer Engineering at Kocaeli University in the spring semester of 2018–2019. To assess the program’s impact on the learning process and to evaluate students’ acceptance and satisfaction, the students answered a questionnaire after finishing the program. According to their feedback, the program is useful for learning how to analyze large datasets and identify patterns that can improve a company’s or organization’s decision-making process.

Keywords  Big data analysis · Data science · Data visualization · Exploratory data analytics · Jupyter Notebook · Reproducible education program

1 Introduction


not keep up with the rate at which data is generated, leading to crashes and lower-quality analysis. That, in turn, creates a need for more scalable systems to store and process data. (iii) The sheer size of the data makes it prone to mistakes and data loss; consider, for example, an average establishment trying to keep records of its thousands of students across several categories. (iv) Data safety and integrity become a paramount concern for both authorities and institutions. Existing security protocols were not developed with big data in mind, which makes them a poor fit for it, and the nature of the data and its continuous updating make it neither easy nor cheap to manage.

With these difficulties in mind, we developed an exploratory big data analysis teaching program for undergraduates that aims to familiarize them with the key technologies used to store, manipulate, and analyze big data. We cover the basic tools for statistical analysis, the Python programming language, and various machine learning algorithms. The course focuses on introducing students to, and making them proficient with, the most important real-time big data processing frameworks, namely Apache Hadoop and Apache Spark (Zaharia et al. 2016; Meng et al. 2016; Chintapalli et al. 2016). NoSQL (Not Only


In recent years, the rise of social media and the digitalization of social and economic activity have produced unprecedented amounts of data, mostly in unstructured form: weblogs, videos, speech recordings, photographs, e-mails, and tweets (Aggarwal 2019; Oussous et al. 2018). This data can be analyzed to extract relevant and insightful information on business or society, thanks to big data analysis tools