Robust dialog state tracker with contextual-feature augmentation
Xuejun Zhang¹ · Xuemin Zhao² · Tian Tan³
Accepted: 28 September 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract
Dialog state tracking (DST), which estimates dialog states given a dialog context, is a core component of task-oriented dialog systems. Existing data-driven methods usually extract features automatically through deep learning. However, most of these models have two limitations. First, unlike hand-crafted delexicalization features, the features learned by deep models are not universal, even though universal features are important for tracking unseen slot values. Second, such models do not work well when noisy labels are ubiquitous in the dataset. To address these challenges, we propose a robust dialog state tracker with contextual-feature augmentation. Contextual-feature augmentation extracts generalized features and is therefore able to address the unseen slot value tracking problem. We also apply a simple but effective deep learning paradigm to train our DST model with noisy labels. Experimental results show that our model achieves state-of-the-art joint accuracy on the MultiWOZ 2.0 dataset. In addition, we demonstrate its performance in tracking unseen slot values by simulating unseen-domain dialog state tracking.

Keywords Human-machine interaction · Task-oriented dialog systems · Dialog state tracking · Contextual self-attention · Learning with noisy labels
Xuejun Zhang [email protected]
Tian Tan [email protected]
¹ Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, Beijing, 100190 China
² Alibaba Group, Beijing, 100102 China
³ Stanford University, 450 Serra Mall, Stanford, CA 94305, USA

1 Introduction
Task-oriented dialog systems, as typical human-machine interaction systems, have attracted intensive research interest in recent years. They allow for natural, personalized interactions with users to help them achieve simple goals such as finding restaurants or booking flights.

Dialog state tracking (DST) is a core component of the task-oriented dialog system [8, 42]. It estimates the state of a conversation based on the current utterance and the conversational history. In DST, a state is described by a triplet of the form (domain, slot, value). This triplet represents the values of requested slots given an active domain. A turn state is the specific value of the current utterance. A joint goal is the set of all turn states accumulated across conversational turns. DST aims to track the joint goal of a dialog. Figure 1 shows an example of a dialog with an annotated dialog state, in which the user first books train tickets and then considers attractions.

For the multi-domain dialog state tracking problem, we assume that there are $M$ domains, $D = \{d_1, d_2, \ldots, d_M\}$. Taking the MultiWOZ 2.0 dataset as an example, there are seven domains, namely, hotel, taxi, restaurant, attraction, train, police