Evaluation

There are many paradigms available for designing a dialogue manager, each claiming various advantages. Four major approaches (finite state systems, Bayesian networks, MDPs and POMDPs) were described in the introduction. These approaches must be evaluated to establish which performs best in real-world interactions. The purpose of this chapter is to discuss such an evaluation. The evaluation will focus on one example domain, TownInfo. This domain has been used for examples throughout the thesis, in Sects. 3.3, 5.3, 5.5 and 5.8. The task in TownInfo is to provide users with tourist information in a fictitious town, called Jasonville. The same domain has been used previously for evaluating the Hidden Information State model and the DIPPER system (Young et al. 2009; Lemon et al. 2006).

The chapter starts with a description of the four example systems built for the evaluation. A comparison of the four systems using a user simulator is given in Sect. 6.2, followed by a comparison obtained from human users in a user trial. A discussion of the effects of noise on user trial performance concludes the chapter.

B. Thomson, Statistical Methods for Spoken Dialogue Management, Springer Theses, DOI: 10.1007/978-1-4471-4923-1_6, © Springer-Verlag London 2013

6.1 TOWNINFO Systems

Four systems will be built for TownInfo, giving implementations of each of the four major paradigms. The four systems to be evaluated are:

• MDP-HDC: A finite state system with hand-crafted state transitions and hand-crafted policy.
• BUDS-HDC: A partially observable system with state transitions defined via probability rules and hand-crafted policy decisions. The system uses the techniques for Bayesian Updates of Dialogue State (BUDS) developed in Chaps. 3 and 4.
• MDP-TRA: A finite state system with hand-crafted state transitions and a learned policy, using the Markov decision process model.
• BUDS-TRA: A partially observable Markov decision process system. The system uses the techniques for Bayesian Updates of Dialogue State (BUDS) developed in Chaps. 3 and 4, as well as the policy learning techniques of Chap. 5.
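The four systems can be read as a two-by-two grid: the dialogue state model is either hand-crafted or BUDS, and the policy is either hand-crafted or learned. The Python sketch below only records that grid and is not taken from the thesis; the names `StateModel`, `Policy`, `SystemConfig` and `systems_with` are hypothetical and introduced purely for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class StateModel(Enum):
    """How the dialogue state is tracked (labels are illustrative)."""
    HAND_CRAFTED = "hand-crafted finite state transitions"
    BUDS = "Bayesian Updates of Dialogue State"


class Policy(Enum):
    """How the system chooses its next action (labels are illustrative)."""
    HAND_CRAFTED = "hand-crafted"
    LEARNED = "learned"


@dataclass(frozen=True)
class SystemConfig:
    name: str
    state_model: StateModel
    policy: Policy


# The four evaluated systems, as described in the list above.
SYSTEMS = [
    SystemConfig("MDP-HDC", StateModel.HAND_CRAFTED, Policy.HAND_CRAFTED),
    SystemConfig("BUDS-HDC", StateModel.BUDS, Policy.HAND_CRAFTED),
    SystemConfig("MDP-TRA", StateModel.HAND_CRAFTED, Policy.LEARNED),
    SystemConfig("BUDS-TRA", StateModel.BUDS, Policy.LEARNED),
]


def systems_with(state_model: Optional[StateModel] = None,
                 policy: Optional[Policy] = None) -> List[SystemConfig]:
    """Return the systems matching the given axes; None means 'any'."""
    return [
        s for s in SYSTEMS
        if (state_model is None or s.state_model is state_model)
        and (policy is None or s.policy is policy)
    ]
```

Filtering on one axis, for example `systems_with(policy=Policy.LEARNED)`, picks out the pairs of systems that differ only in the other axis, which is the kind of contrast the evaluation relies on.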

6.1.1 Hand-Crafted State Transitions

The MDP-TRA and MDP-HDC systems use a hand-crafted state transition model with a concept-based architecture. A list of the available concept-values is given in Table 6.1. The most recent value mentioned for each concept is stored along with a grounding state, which records whether the concept is unknown, known or grounded. In a given turn, the most likely semantics from the user is separated according to the concepts and is used to update each concept’s value with whatever the user has given as their constraint. When the system asks for confirmation of the value of a concept and the user affirms it, the concept transitions to the grounded state. Requests for information are handled by storing a list of the requested concepts. Concepts are added to this list whenever the user requests the concept’s value and removed when the system tells the user the value for a p