Sutton and Barto Book Implementation


$ python setup.py install

This repository contains code that implements algorithms and models from Sutton and Barto's book on reinforcement learning. The book, titled "Reinforcement Learning: An Introduction," is a classic text on the subject and provides a comprehensive introduction to the field.

The code in this repository is organized into several modules, each of which covers a different topic.

Multi Armed Bandits
- Epsilon Greedy
- Optimistic Initial Values
- Gradient
- α (non stationary)
Model Based
- Policy Evaluation
- Policy Iteration
- Value Iteration
Monte Carlo estimation and control
- First-visit α-MC
- Every-visit α-MC
- MC with Exploring Starts
- Off-policy MC, ordinary and weighted importance sampling
Temporal Difference
- TD(n) estimation
- n-step SARSA
- n-step Q-learning
- n-step Expected SARSA
- Double Q-learning
- n-step Tree Backup
Planning
- Dyna-Q/Dyna-Q+
- Prioritized Sweeping
- Trajectory Sampling
- MCTS
On-policy Prediction
- Gradient MC
- n-step semi-gradient TD
- ANN
- Least-Squares TD
- Kernel-based
On-policy Control
- Episodic semi-gradient
- Semi-gradient n-step Sarsa
- Differential Semi-gradient n-step Sarsa
Eligibility Traces
- TD($\lambda$)
- True Online
- Sarsa($\lambda$)
- True Online Sarsa($\lambda$)
Policy Gradient
- REINFORCE: Monte Carlo Policy Gradient w/wo Baseline
- Actor-Critic (episodic) w/wo eligibility traces
- Actor-Critic (continuing) with eligibility traces

All model-free solvers work once you define the states, the actions, and a transition function. The transition function takes a state and an action and returns a tuple of the next state and the reward, together with a boolean indicating whether the episode has terminated.

    states: Sequence[Any]
    actions: Sequence[Any]
    transition: Callable[[Any, Any], Tuple[Tuple[Any, float], bool]]
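
As an illustration of this interface, here is a minimal sketch of a transition function for a made-up five-cell corridor. The names `chain_transition` and `rollout` are hypothetical and not part of the package; only the return shape `((next_state, reward), done)` follows the description above.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical toy environment: cells 0..4 in a row, cell 4 is terminal.
states: Sequence[int] = [0, 1, 2, 3, 4]
actions: Sequence[str] = ['left', 'right']


def chain_transition(state: int, action: str) -> Tuple[Tuple[int, float], bool]:
    """Move one cell left or right; reaching cell 4 pays 1 and ends the episode."""
    step = 1 if action == 'right' else -1
    next_state = min(max(state + step, 0), 4)  # walls at both ends
    if next_state == 4:
        return (next_state, 1.0), True
    return (next_state, 0.0), False


def rollout(transition: Callable, state, choose_action: Callable, max_steps: int = 100):
    """Follow choose_action(state) until termination; return final state and total reward."""
    total = 0.0
    for _ in range(max_steps):
        (state, reward), done = transition(state, choose_action(state))
        total += reward
        if done:
            break
    return state, total


final_state, total_reward = rollout(chain_transition, 0, lambda s: 'right')
```

Any function with this return shape should be usable wherever the solvers expect a transition, assuming they rely on nothing beyond the signature shown above.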

Single State Infinite Variance Example 5.5

    import numpy as np
    from mypyrl import off_policy_mc, ModelFreePolicy

    states = [0]
    actions = ['left', 'right']

    def single_state_transition(state, action):
        # 'right' always terminates with zero reward
        if action == 'right':
            return (state, 0), True
        # 'left' terminates with reward 1 with probability 0.1, otherwise loops back
        if action == 'left':
            threshold = np.random.random()
            if threshold > 0.9:
                return (state, 1), True
            else:
                return (state, 0), False

    b = ModelFreePolicy(actions, states)  # by default equiprobable
    pi = ModelFreePolicy(actions, states)
    pi.pi[0] = np.array([1, 0])  # target policy always picks 'left'

    # calculate ordinary and weighted samples state value functions
    vqpi_ord, samples_ord = off_policy_mc(states, actions, single_state_transition,
        policy=pi, b=b, ordinary=True, first_visit=True, gamma=1., n_episodes=1E4)

    vqpi_w, samples_w = off_policy_mc(states, actions, single_state_transition,
        policy=pi, b=b, ordinary=False, first_visit=True, gamma=1., n_episodes=1E4)
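
To show what the two estimators compute, the following is a self-contained NumPy sketch of the same single-state problem that does not use the package. The helper names `sample_episode` and `importance_sampling_estimates` are hypothetical, and this is not how `off_policy_mc` is implemented internally; it only reproduces the ordinary and weighted importance sampling formulas from the book.

```python
import numpy as np


def sample_episode(rng):
    """Return (rho, G): importance ratio and return of one behavior-policy episode."""
    rho = 1.0
    while True:
        if rng.random() < 0.5:   # behavior policy picks 'right'
            return 0.0, 0.0      # target never picks 'right', so the ratio is 0
        rho *= 1.0 / 0.5         # pi(left) / b(left) = 1 / 0.5
        if rng.random() > 0.9:   # 'left' terminates with reward 1 w.p. 0.1
            return rho, 1.0
        # otherwise reward 0 and the episode continues from the same state


def importance_sampling_estimates(n_episodes, seed=0):
    """Estimate v_pi(s) with ordinary and weighted importance sampling."""
    rng = np.random.default_rng(seed)
    rhos, returns = np.array([sample_episode(rng) for _ in range(n_episodes)]).T
    weighted_returns = np.sum(rhos * returns)
    ordinary = weighted_returns / n_episodes          # divide by episode count
    total = np.sum(rhos)
    weighted = weighted_returns / total if total > 0 else 0.0  # divide by sum of ratios
    return ordinary, weighted


ordinary, weighted = importance_sampling_estimates(1000)
```

Every episode with a nonzero ratio has return 1, so the weighted estimate equals 1 as soon as one such episode has been seen, while the ordinary estimate keeps fluctuating because the ratios have infinite variance.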

Monte Carlo Tree Search maze solving plot

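    import numpy as np

    # START_XY, obstacle_maze, action_map and mcts are assumed to be defined
    # or imported beforehand, as in the repository's maze example
    s = START_XY
    budget = 500
    cp = 1/np.sqrt(2)
    end = False
    max_steps = 50

    while not end:
        action, tree = mcts(s, cp, budget, obstacle_maze, action_map, max_steps, eps=1)
        (s, _), end = obstacle_maze(s, action)

    tree.plot()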
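
The value `cp = 1/np.sqrt(2)` matches the exploration constant commonly used in the UCT selection rule for rewards in [0, 1]. The sketch below shows that rule in its standard form; the names `uct_score` and `select_child` are hypothetical, and it is an assumption that the package's `mcts` uses `cp` in exactly this way.

```python
import math


def uct_score(value_sum, visits, parent_visits, cp=1 / math.sqrt(2)):
    """Standard UCT score: mean value plus an exploration bonus scaled by cp."""
    if visits == 0:
        return math.inf  # unvisited children are always tried first
    exploit = value_sum / visits
    explore = 2 * cp * math.sqrt(2 * math.log(parent_visits) / visits)
    return exploit + explore


def select_child(children, parent_visits, cp=1 / math.sqrt(2)):
    """Pick the action with the highest UCT score.

    children maps each action to a (value_sum, visits) pair.
    """
    return max(children, key=lambda a: uct_score(*children[a], parent_visits, cp))


best = select_child({'up': (9.0, 10), 'down': (1.0, 10)}, parent_visits=20)
```

A larger `cp` favors rarely visited children, while `cp = 0` reduces the rule to greedy selection on the mean value.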

While the code in this package provides a basic implementation of the algorithms from the book, it is not necessarily the most efficient or well-written. If you have suggestions for improving the code, please feel free to open an issue.

Overall, this package is a resource for anyone interested in learning reinforcement learning by implementing its algorithms from scratch. It is by no means production-ready.
