
Course summary

  • Type: MOOC course
  • Period: Always open
  • Learning time: Study freely
  • Course approval method: Automatic approval
  • Certificate: Issued online
http://kaist.edwith.org/reinforcement-learning2

Instructor Introduction

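  • Professor Hayong Shin (신하용), Department of Industrial and Systems Engineering, KAIST

    Instructor: Hayong Shin
    2001-present: Professor, Department of Industrial and Systems Engineering, KAIST
    1991-2001: Researcher at LG Electronics, Cubictek Co., Ltd., and Chrysler (USA)
    Vice President (Journal), Korean Institute of Industrial Engineers; Jeongheon Academic Grand Prize (정헌학술대상) recipient (2021)
    Senior Vice President, Korean CDE Society (한국CDE학회); Gaheon Academic Award (가헌학술상) recipient (2002, 2005, 2009)
    Editorial board member, Computer-Aided Design journal (2005-present)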

Lecture plan

Lectures
  1. 8. Deep Q Network
    1. Neural net
    1. NN for RL
    1. DQN
    1. DQN improvements
    1. Quiz 8
  2. 9. Policy based RL : Stochastic Policy Gradient
    1. Policy based RL
    1. Policy gradient theorem
    1. Policy gradient algorithms
    1. Quiz 9
  3. 10. Policy based RL : TRPO, PPO
    1. Revisiting policy gradient
    1. Trust region policy optimization (TRPO) algorithm
    1. Proximal Policy Optimization (PPO) algorithm
    1. Quiz 10
  4. 11. Policy based RL : DPG, DDPG, CEM
    1. Theoretical foundation of DPG
    1. DPG & DDPG algorithms
    1. Derivative free method and CEM
    1. Quiz 11
  5. 12. Exploration vs Exploitation
    1. Multi-Armed Bandit problem
    1. Basic MAB algorithm
    1. Advanced MAB algorithms
    1. Quiz 12
  6. 13. Average reward MDP and finite horizon MDP
    1. Average reward RL
    1. Finite horizon MDP
    1. Finite horizon MDP examples
    1. Quiz 13
  7. 14. AlphaGo & Reward shaping
    1. Components of AlphaGo
    1. Training AlphaGo and MCTS
    1. AlphaGo Zero and next
    1. Reward shaping
    1. Quiz 14