Preference-based interactive multi-document summarisation



Yang Gao1,2 · Christian M. Meyer1 · Iryna Gurevych1

Received: 28 February 2019 / Accepted: 28 October 2019
© The Author(s) 2019

Abstract

Interactive NLP is a promising paradigm to close the gap between automatic NLP systems and the human upper bound. Preference-based interactive learning has been successfully applied, but the existing methods require several thousand interaction rounds even in simulations with perfect user feedback. In this paper, we study preference-based interactive summarisation. To reduce the number of interaction rounds, we propose the Active Preference-based ReInforcement Learning (APRIL) framework. APRIL uses active learning to query the user, preference learning to learn a summary ranking function from the preferences, and neural reinforcement learning to efficiently search for the (near-)optimal summary. Our results show that users can easily provide reliable preferences over summaries and that APRIL outperforms the state-of-the-art preference-based interactive method in both simulation and real-user experiments.

Keywords: Interactive Natural Language Processing · Document summarisation · Reinforcement learning · Active learning · Preference learning

1 Introduction

Interactive Natural Language Processing (NLP) approaches that put the human in the loop have gained increasing research interest recently (Amershi et al. 2014; Gurevych et al. 2018; Kreutzer et al. 2018a). The user-system interaction enables personalised and user-adapted results by incrementally refining the underlying model based on a user's behaviour and by optimising the learning through actively querying for feedback and judgements. Interactive methods can start with no or only few input data and adjust the output to the needs of human users. Previous research has explored eliciting different forms of feedback from users in interactive NLP, for example mouse clicks for information retrieval (Borisov et al. 2018), post-edits and ratings for machine translation (Denkowski et al. 2014; Kreutzer et al. 2018a),

1 Ubiquitous Knowledge Processing Lab (UKP-TUDA), Department of Computer Science, Technische Universität Darmstadt, Darmstadt, Germany
2 Royal Holloway University of London, Egham, UK






Information Retrieval Journal

error markings for semantic parsing (Lawrence and Riezler 2018), bigrams for summarisation (Avinesh and Meyer 2017), and preferences for translation (Kreutzer et al. 2018b). Controlled experiments suggest that asking for preferences places a lower cognitive burden on the human subjects than asking for absolute ratings or categorised labels (Thurstone 1927; Kendall 1948; Kingsley and Brown 2010). But it remains unclear whether people can easily provide reliable preferences over summaries. In addition, preference-based interactive NLP faces the high sample complexity problem: a preference is a binary decision and hence only contains a single bit of information,
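To make the single-bit nature of preference feedback concrete, the following minimal sketch fits a latent utility score per candidate from pairwise preferences using a Bradley-Terry-style model trained by gradient ascent. This is an illustrative toy, not the paper's actual preference-learning component: the function name, the simulated preferences, and all hyperparameters are our own assumptions.

```python
import math
import random

def learn_utilities(n_items, preferences, lr=0.5, epochs=200, seed=0):
    """Fit one latent utility per item from pairwise preferences.

    Each pair (i, j) means item i was preferred over item j.
    Bradley-Terry model: P(i preferred over j) = sigmoid(u_i - u_j),
    fitted by gradient ascent on the log-likelihood.
    """
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 0.01) for _ in range(n_items)]
    for _ in range(epochs):
        for i, j in preferences:
            p = 1.0 / (1.0 + math.exp(-(u[i] - u[j])))  # model's P(i > j)
            grad = 1.0 - p  # gradient of log P(i > j) w.r.t. (u_i - u_j)
            u[i] += lr * grad
            u[j] -= lr * grad
    return u

# Simulated noise-free user whose true ranking is item 0 > item 1 > item 2.
prefs = [(0, 1), (1, 2), (0, 2)] * 10
u = learn_utilities(3, prefs)
ranking = sorted(range(3), key=lambda k: -u[k])
print(ranking)  # recovers [0, 1, 2]
```

Even in this noise-free toy, many binary comparisons are needed to pin down a ranking, which illustrates why each preference contributing only one bit of information drives up the number of interaction rounds.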