Model types, Algorithms and approaches, Function approximation, Deep reinforcement-learning, Deep Multi-agent Reinforcement-learning
Free tutorial
Rating: 1.0 out of 5 (4 ratings)
2,176 students
42 min of on-demand video
Created by Nitsan Soffair
English
Captions: English [Auto]
What you’ll learn
- Be able to start Deep reinforcement-learning research
- Be able to start a Deep reinforcement-learning engineering role
- Understand modern, state-of-the-art Deep reinforcement-learning methods
- Understand the fundamentals of Deep reinforcement-learning
Requirements
- Interest in Deep reinforcement-learning
Description
Hello, I am Nitsan Soffair, a Deep RL researcher at BGU.
In my Deep reinforcement-learning course, you will learn the newest state-of-the-art Deep reinforcement-learning knowledge.
You will do the following:
- Get state-of-the-art knowledge regarding:
  - Model types
  - Algorithms and approaches
  - Function approximation
  - Deep reinforcement-learning
  - Deep Multi-agent Reinforcement-learning
- Validate your knowledge by answering short and very short quizzes after each lecture.
- Be able to complete the course in about 2 hours.
Syllabus
- Model types
  - Markov decision process (MDP): A discrete-time stochastic control process.
  - Partially observable Markov decision process (POMDP): A generalization of the MDP in which the agent cannot directly observe the underlying state.
  - Decentralized partially observable Markov decision process (Dec-POMDP): A generalization of the POMDP to multiple decentralized agents.
- Algorithms and approaches
  - Bellman equations: A necessary condition for optimality associated with dynamic programming.
  - Model-free: A model-free algorithm is one that does not use or estimate the transition probability distribution (and the reward function) of the MDP.
  - Off-policy: An off-policy algorithm learns about one policy (the target policy) while using a different policy (the behavior policy) to act in the environment.
  - Exploration-exploitation: The trade-off in reinforcement learning between exploring new actions and exploiting the best policy known so far.
  - Value-iteration: An iterative algorithm that applies the Bellman optimality backup.
  - SARSA: An on-policy algorithm for learning a Markov decision process policy.
  - Q-learning: A model-free reinforcement-learning algorithm that learns the value of an action in a particular state.
- Function approximation
  - Function approximators: Selecting, from a well-defined class of functions, one that closely matches ("approximates") a target function in a task-specific way.
  - Policy gradient: Value-based, policy-based and actor-critic methods, the policy gradient, and the softmax policy.
  - REINFORCE: A Monte Carlo policy-gradient algorithm.
- Deep reinforcement-learning
  - Deep Q-Network (DQN): A deep reinforcement-learning algorithm that uses experience replay and fixed Q-targets.
  - Deep Recurrent Q-Learning (DRQN): A deep reinforcement-learning algorithm for POMDPs that extends DQN with an LSTM.
  - Optimistic Exploration with Pessimistic Initialization (OPIQ): A deep reinforcement-learning algorithm for MDPs based on DQN.
- Deep Multi-agent Reinforcement-learning
  - Value Decomposition Networks (VDN): A multi-agent deep reinforcement-learning algorithm for Dec-POMDPs.
  - QMIX: A multi-agent deep reinforcement-learning algorithm for Dec-POMDPs.
  - QTRAN: A multi-agent deep reinforcement-learning algorithm for Dec-POMDPs.
  - Weighted QMIX: A multi-agent deep reinforcement-learning algorithm for Dec-POMDPs.
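The MDP, Bellman-equation and value-iteration entries above can be made concrete with a small example. This is a minimal sketch, not course material: it runs value iteration on a hypothetical two-state MDP whose states, actions, transition probabilities, rewards and discount factor (`gamma=0.9`) are made-up assumptions. Each sweep applies the Bellman optimality backup in place until the largest change falls below a tolerance.

```python
# Value iteration: repeatedly apply the Bellman optimality backup
#   V(s) <- max_a sum_{s'} P(s'|s,a) * [R(s,a,s') + gamma * V(s')]
# on a tiny hypothetical MDP (all numbers are illustrative assumptions).

# P[s][a] is a list of (probability, next_state, reward) tuples.
P = {
    "A": {"stay": [(1.0, "A", 0.0)],
          "go":   [(0.8, "B", 1.0), (0.2, "A", 0.0)]},
    "B": {"stay": [(1.0, "B", 2.0)],
          "go":   [(1.0, "A", 0.0)]},
}


def value_iteration(P, gamma=0.9, tol=1e-8):
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Bellman optimality backup for state s (updated in place)
            best = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in P[s].values()
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values
    policy = {
        s: max(P[s], key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]))
        for s in P
    }
    return V, policy


V, policy = value_iteration(P)
print(V, policy)
```

For these numbers, staying in `B` is worth 2 / (1 - 0.9) = 20, and the greedy policy is `go` in `A` and `stay` in `B`.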
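The model-free, off-policy, exploration-exploitation, SARSA and Q-learning entries differ mainly in how the update target is built. The sketch below is an illustration under stated assumptions: a hypothetical five-state corridor environment, ε-greedy exploration (`eps=0.1`), step size `alpha=0.5` and discount `gamma=0.9`, none of which come from the course. Neither method uses a model of the transitions, only sampled `(s, a, r, s')` steps. SARSA bootstraps from the action the ε-greedy behavior policy actually takes next (on-policy); Q-learning bootstraps from the greedy action regardless of what is taken (off-policy).

```python
import random

# Hypothetical corridor: states 0..4, start at 0; reaching state 4 ends the
# episode with reward +1, every other step gives reward 0.
N_STATES, ACTIONS = 5, (-1, +1)  # move left / move right


def step(s, a):
    s2 = min(max(s + a, 0), N_STATES - 1)
    done = s2 == N_STATES - 1
    return s2, (1.0 if done else 0.0), done


def epsilon_greedy(Q, s, eps, rng):
    # Exploration-exploitation: random action with probability eps,
    # otherwise a greedy action (ties broken at random).
    if rng.random() < eps:
        return rng.choice(ACTIONS)
    best = max(Q[(s, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if Q[(s, a)] == best])


def train(method, episodes=200, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, done = 0, False
        a = epsilon_greedy(Q, s, eps, rng)
        while not done:
            s2, r, done = step(s, a)
            a2 = epsilon_greedy(Q, s2, eps, rng)  # next action of the behavior policy
            if done:
                target = r  # no bootstrapping from a terminal state
            elif method == "sarsa":
                target = r + gamma * Q[(s2, a2)]  # on-policy: action actually taken next
            else:  # "q-learning"
                target = r + gamma * max(Q[(s2, b)] for b in ACTIONS)  # off-policy: greedy action
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s, a = s2, a2
    return Q
```

After `train("q-learning")`, the greedy action in every non-terminal state should be `+1` (move right).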
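For the policy-gradient and REINFORCE entries, the following sketch shows the REINFORCE update θ ← θ + lr · G · ∇θ log π(a) with a softmax policy. To keep it short, it assumes a hypothetical three-armed bandit with deterministic rewards, so every episode has a single step and the return `G` is just that step's reward. The reward values, learning rate and episode count are illustrative assumptions, and no baseline is used.

```python
import math
import random

# Hypothetical 3-armed bandit with deterministic rewards (illustrative numbers).
REWARDS = [0.2, 0.5, 1.0]


def softmax(prefs):
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def reinforce(episodes=3000, lr=0.1, seed=0):
    rng = random.Random(seed)
    theta = [0.0] * len(REWARDS)  # softmax policy parameters (action preferences)
    for _ in range(episodes):
        pi = softmax(theta)
        a = rng.choices(range(len(pi)), weights=pi)[0]  # sample an action from the policy
        G = REWARDS[a]  # return of a one-step episode
        # For a softmax policy, d/d theta_k log pi(a) = 1[k == a] - pi[k]
        for k in range(len(theta)):
            grad_log = (1.0 if k == a else 0.0) - pi[k]
            theta[k] += lr * G * grad_log  # REINFORCE: theta += lr * G * grad log pi(a)
    return softmax(theta)


probs = reinforce()
print([round(p, 3) for p in probs])
```

Most of the probability mass should end up on arm 2, the arm with the highest reward.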
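The DQN entry names two ingredients, experience replay and fixed Q-targets. The sketch below isolates both under loud assumptions: instead of a deep network it uses a hypothetical linear Q-function `Q(s, a) = w_a · phi(s)` over a user-supplied feature function `features`, and every name here (`ReplayBuffer`, `q_value`, `td_targets`, `sgd_step`, `sync_target`) is illustrative, not from any library. Targets are computed from a frozen copy of the weights, refreshed only by `sync_target`, rather than from the weights being trained.

```python
import random
from collections import deque


class ReplayBuffer:
    """Experience replay: store transitions and sample minibatches uniformly,
    which breaks the correlation between consecutive transitions."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first

    def push(self, s, a, r, s2, done):
        self.buffer.append((s, a, r, s2, done))

    def sample(self, batch_size, rng=random):
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def q_value(weights, features, s, a):
    # Hypothetical linear stand-in for the deep network: Q(s, a) = w_a . phi(s)
    return sum(w * f for w, f in zip(weights[a], features(s)))


def td_targets(batch, target_weights, features, gamma=0.99):
    # Fixed Q-targets: bootstrap from the frozen target weights, not the online
    # ones; terminal transitions use the reward alone.
    n_actions = len(target_weights)
    return [
        r if done else r + gamma * max(q_value(target_weights, features, s2, b) for b in range(n_actions))
        for (_s, _a, r, s2, done) in batch
    ]


def sgd_step(weights, batch, targets, features, lr):
    # Sequential per-sample semi-gradient steps on the squared TD error (y - Q(s, a))^2
    for (s, a, _r, _s2, _done), y in zip(batch, targets):
        phi = features(s)
        error = y - q_value(weights, features, s, a)
        for i, f in enumerate(phi):
            weights[a][i] += lr * error * f


def sync_target(weights):
    # Copy the online weights into a new frozen target (done every C steps)
    return [row[:] for row in weights]
```

In a training loop, one would push every transition and, once the buffer holds enough samples, call `sample`, `td_targets` and `sgd_step` each step, setting `target_weights = sync_target(weights)` every C steps.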
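The VDN and QMIX entries both factorize a joint action value into per-agent values. The sketch below contrasts them under assumptions: the hypernetworks are single linear layers with random, untrained weights (the original QMIX uses small trained networks, including a two-layer hypernetwork for the final bias), and `QMixer`, `embed` and the state vector are hypothetical. VDN simply sums the agents' Q-values; QMIX mixes them with state-conditioned weights forced to be non-negative by `abs()`, so Q_tot never decreases when any single agent's Q-value increases. That monotonicity lets each agent pick its own greedy action while staying consistent with the greedy joint action.

```python
import math
import random


def vdn_mix(agent_qs):
    # VDN: the joint action value is the sum of the per-agent values
    return sum(agent_qs)


def elu(x):
    return x if x > 0 else math.exp(x) - 1.0


def linear(weights, bias, x):
    # y_j = sum_i x_i * weights[i][j] + bias[j]
    return [sum(xi * row[j] for xi, row in zip(x, weights)) + bias[j] for j in range(len(bias))]


def random_layer(n_in, n_out, rng):
    weights = [[rng.gauss(0.0, 1.0) for _ in range(n_out)] for _ in range(n_in)]
    bias = [rng.gauss(0.0, 1.0) for _ in range(n_out)]
    return weights, bias


class QMixer:
    """QMIX-style mixer whose weights come from (untrained) linear hypernetworks."""

    def __init__(self, n_agents, state_dim, embed=4, seed=0):
        rng = random.Random(seed)
        self.n_agents, self.embed = n_agents, embed
        self.hyper_w1 = random_layer(state_dim, n_agents * embed, rng)
        self.hyper_b1 = random_layer(state_dim, embed, rng)
        self.hyper_w2 = random_layer(state_dim, embed, rng)
        self.hyper_b2 = random_layer(state_dim, 1, rng)

    def __call__(self, agent_qs, state):
        # abs() keeps the mixing weights non-negative, so dQ_tot/dQ_a >= 0
        w1 = [abs(v) for v in linear(*self.hyper_w1, state)]
        b1 = linear(*self.hyper_b1, state)
        w2 = [abs(v) for v in linear(*self.hyper_w2, state)]
        b2 = linear(*self.hyper_b2, state)[0]
        hidden = [
            elu(sum(agent_qs[i] * w1[i * self.embed + j] for i in range(self.n_agents)) + b1[j])
            for j in range(self.embed)
        ]
        return sum(h * w for h, w in zip(hidden, w2)) + b2
```

With all mixing weights fixed to 1, zero biases and an identity activation, the mixer reduces to VDN's sum, which is one way to see VDN as a special case of the monotonic mixers QMIX can represent.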
Resources
- Wikipedia
- David Silver’s Reinforcement-learning course
Who this course is for:
- Anyone interested in Deep reinforcement-learning
Course content
6 sections • 23 lectures • 41m total length
Model types: 3 lectures • 3min
- Markov decision process (MDP): 00:53
- Markov decision process (MDP) quiz: 3 questions
- Partially observable Markov decision process (POMDP): 01:18
- Partially observable Markov decision process (POMDP) quiz: 2 questions
- Decentralized partially observable Markov decision process (Dec-POMDP): 00:57
- Decentralized partially observable Markov decision process (Dec-POMDP) quiz: 1 question
Algorithms and approaches: 7 lectures • 5min
- Bellman equations: 00:47
- Bellman equations quiz: 3 questions
- Model free: 00:19
- Model free quiz: 2 questions
- Off-policy: 00:19
- Off-policy quiz: 2 questions
- Exploration-exploitation: 00:47
- Exploration-exploitation quiz: 3 questions
- Value-iteration: 00:54
- Value-iteration quiz: 3 questions
- SARSA: 01:13
- SARSA quiz: 3 questions
- Q-learning: 00:54
- Q-learning quiz: 3 questions
Function approximation: 3 lectures • 3min
- Function approximators: 00:26
- Function approximators quiz: 3 questions
- Policy gradient: 01:34
- Policy gradient quiz: 3 questions
- REINFORCE: 01:08
- REINFORCE quiz: 3 questions
Deep reinforcement-learning: 3 lectures • 5min
- Deep Q-Network (DQN): 01:19
- Deep Q-Network (DQN) quiz: 3 questions
- Deep Recurrent Q-Learning (DRQN): 01:02
- Deep Recurrent Q-Learning (DRQN) quiz: 3 questions
- Optimistic Exploration with Pessimistic Initialization (OPIQ): 02:29
- Optimistic Exploration with Pessimistic Initialization (OPIQ) quiz: 3 questions
Deep Multi-agent Reinforcement-learning: 4 lectures • 5min
- Value Decomposition Networks (VDN): 01:17
- Value Decomposition Networks (VDN) quiz: 3 questions
- QMIX: 01:04
- QMIX quiz: 3 questions
- QTRAN: 01:46
- QTRAN quiz: 3 questions
- Weighted QMIX: 01:17
- Weighted QMIX quiz: 3 questions
Extra content: 3 lectures • 20min
- GPT-3: 07:03
- DALL-E: 05:09
- CLIP: 07:37