Self-Play
This chapter is devoted to AlphaGo-style self-play. Self-play is an intuitively appealing AI method that has long been used by AI researchers in various forms, as we saw at the end of the previous chapter.
Learning to Play
Reinforcement Learning and Games
Aske Plaat
Leiden Institute of Advanced Computer Science, Leiden University, Leiden, The Netherlands
ISBN 978-3-030-59237-0
ISBN 978-3-030-59238-7 (eBook)
https://doi.org/10.1007/978-3-030-59238-7
© Springer Nature Switzerland AG 2020
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
To my students
Contents
1 Introduction
1.1 Tuesday March 15, 2016
1.2 Intended Audience
1.3 Outline
2 Intelligence and Games
2.1 Intelligence
2.2 Games of Strategy
2.3 Game Playing Programs
3 Reinforcement Learning
3.1 Markov Decision Processes
3.2 Policy Function, State-Value Function, Action-Value Function
3.3 Solution Methods
3.