Forbidden knowledge in machine learning: reflections on the limits of research and publication



OPEN FORUM

Thilo Hagendorff1

Received: 25 February 2020 / Accepted: 3 August 2020
© The Author(s) 2020

Abstract

Certain research strands can yield "forbidden knowledge". This term refers to knowledge that is considered too sensitive, dangerous or taboo to be produced or shared. Discourses about such publication restrictions are already entrenched in scientific fields like IT security, synthetic biology or nuclear physics research. This paper makes the case for transferring this discourse to machine learning research. Some machine learning applications can very easily be misused, with harmful consequences, for instance with regard to generative video or text synthesis, personality analysis, behavior manipulation, software vulnerability detection and the like. To date, the machine learning research community has embraced the idea of open access. However, this stands in opposition to precautionary efforts to prevent the malicious use of machine learning applications. Information about or from such applications may, if improperly disclosed, cause harm to people, organizations or whole societies. Hence, the goal of this work is to outline deliberations on how to deal with questions concerning the dissemination of such information. It proposes a tentative ethical framework for the machine learning community on how to deal with forbidden knowledge and dual-use applications.

Keywords  Forbidden knowledge · Machine learning · Artificial intelligence · Governance · Dual-use · Publication norms

Cluster of Excellence "Machine Learning: New Perspectives for Science"
* Thilo Hagendorff
thilo.hagendorff@uni-tuebingen.de
1 University of Tuebingen, Tübingen, Germany

1 Introduction

Currently, machine learning research, much like other scientific fields, embraces the idea of open access. Research findings are publicly available, can be shared widely among researchers, and foster the flow of ideas and collaboration. Alongside descriptions of their research findings, scientists frequently share details about their machine learning models, or even the complete source code, via platforms like GitHub, GitLab, SourceForge and others. However, the tenet of providing open access contradicts precautionary efforts to prevent the malicious use of machine learning applications. Some applications can very easily be used for harmful purposes, for instance with regard to generative video or text synthesis, personality analysis or software vulnerability detection. Concrete examples of high-stakes machine learning research are OpenAI's GPT-2 text generator (Radford et al. 2019b), Michal Kosinski's and Yilun Wang's software for detecting people's sexual orientation based on facial images (Kosinski and Wang 2018), developments in the field of synthetic media (Chesney and Citron 2018), and many more. Therefore, researchers have to answer the question of whether "AI development [should] be fully open-sourced, or are there ethical reasons to limit
