Data wrangling practices and collaborative interactions with aggregated data



Shiyan Jiang¹ & Jennifer Kahn²

Received: 18 February 2020 / Accepted: 19 August 2020 / Published online: 26 August 2020 © International Society of the Learning Sciences, Inc. 2020

Abstract

Data visualization technologies are powerful tools for telling evidence-based narratives about oneself and the world. This paper contributes to the literature on data science education by examining the sociotechnical practices of data wrangling: strategies for selecting and managing large, aggregated datasets to produce a model and story. We examined the learning opportunities related to data wrangling practices by investigating youth's talk-in-interaction as they assembled models and stories about family migration using interactive data visualization tools and large socioeconomic datasets. We first identified ten sociotechnical practices that characterize youth's interaction with tools and collaboration in data wrangling. We then suggest four categories of activities to describe patterns of learning related to the practices: addressing missing data, understanding data aggregation, exploring social or historical events that constitute the formation of data patterns, and varying data visual encoding for storytelling. Understanding these practices and activities is important for supporting future data science education opportunities that facilitate learning and discussion about scientific and socioeconomic issues. This study also sheds light on how the family migration modeling context positions youth as having agency and authority over the data, and it contributes to the design of CSCL environments that tackle the challenges of data wrangling.

Keywords: Data wrangling · Modeling · Storytelling · Family migration · Data visualization · Sociotechnical practices

* Corresponding author: Shiyan Jiang

¹ Department of Teacher Education and Learning Sciences, North Carolina State University, Poe Hall, 208, 2310 Stinson Dr, Raleigh, NC 27695, USA

² University of Miami, Coral Gables, FL, USA


Jiang S., Kahn J.

Introduction

As the interdisciplinary field of data science has grown, large-scale datasets and interactive data visualization tools have become increasingly open and accessible, creating opportunities for learning. Likewise, data science education is a new and growing area of CSCL, in which youth interact with others, digital technologies, and complex datasets to support inquiry in multiple disciplines, such as science, mathematics, and social studies (Lee and Wilkerson 2018). Data science tools provide novel ways of exploring traditional disciplinary content and developing statistical and data literacy across formal and informal settings. While education research has focused on facilitating the learning of statistical reasoning (e.g., Aridor and Ben-Zvi 2018; Hancock et al. 1992; Konold et al. 2015; Moore 1990), inference (e.g., Makar et al. 2011; Makar and Rubin 2018), and modeling (Lehrer and English 2018), more research is needed to unde