Learnings from developing an applied data science curricula for undergraduate and graduate students


MRS Advances © 2020 Materials Research Society DOI: 10.1557/adv.2020.135

Roger H. French 1,2,3,4 and Laura S. Bruckman 1,2

1 SDLE Research Center, Case Western Reserve University, Cleveland OH, 44106

2 Dept. of Materials Science & Engineering, Case Western Reserve University, Cleveland OH, 44106

3 Dept. of Macromolecular Science & Engineering, Case Western Reserve University, Cleveland OH, 44106

4 Dept. of Computer & Data Sciences, Case Western Reserve University, Cleveland OH, 44106

ABSTRACT

Data science has advanced significantly in recent years and allows scientists to harness large-scale data analysis techniques using open source coding frameworks. Data science is a tool that should be taught to science and engineering students in addition to their chosen domain knowledge. An applied data science minor allows students to understand data and data handling as well as statistics and model development. This move will improve reproducibility and openness of research as well as allow for greater interdisciplinarity and more analyses focusing on critical scientific challenges.

TRANSFORMATIVE CHANGES DRIVING DATA SCIENCE, BIG DATA ANALYTICS AND DEEP LEARNING

Digital Transformation and Big Data Analytics

Data science has arisen from combined advances in computing, communication, and data that are driving the digital transformation [1-3]. Digital transformation has benefited from the computer science concepts developed by organizations such as Google, which have enabled big data analytics and led to the development of open source projects such as Hadoop, HBase, and most recently Spark. These projects allow for data-driven modeling of massive petabyte scale datasets [4-7]. These “distributed computing” approaches and the ease of acquiring petabyte scale datasets have given rise to Facebook and other data-centric technologies and enable the digital transformation across industry, science and technology, and society itself. Distributed computing complements the petaflop computing characteristic of high performance computing, allowing for the rise of a new computing paradigm of distributed and high performance computing (D&HPC).

Openness

Another major driver of change has been the move towards “openness” [8], motivated by the view that restrictive copyrights and licenses limit innovation and creativity [9]. Several key events have shown that the open source approach drives collaboration and community development, and accelerates innovation [10,11]. These events include:

  • the establishment of the Free Software Foundation in the 1980s;
  • the initial release of the Linux kernel by Linus Torvalds in 1991 as an open source version of Unix;
  • the release of R, an open source version of the S lan