The repository contains examples of finite-horizon zero-sum differential games implemented as environments (Markov games) for multi-agent reinforcement learning algorithms. Since the problems are originally described by differential equations, a uniform time discretization with diameter `dt` is used to formalize them as Markov games. In addition, it is important to emphasize that, in games with a finite horizon, the agents' optimal policies depend not only on the phase vector but also on the current time `t`; therefore, the time `t` is included as a component of the state, which makes the state space continuous.
The finite-horizon zero-sum differential games are implemented as environments (Markov games) with an interface close to that of OpenAI Gym, with the following attributes:
- `state_dim` - the state space dimension;
- `u_action_dim` - the action space dimension of the first agent;
- `v_action_dim` - the action space dimension of the second agent;
- `terminal_time` - the terminal time of the game;
- `dt` - the time-discretization diameter;
- `reset()` - to get an initial `state` (deterministic);
- `step(u_action, v_action)` - to get `next_state`, current `reward`, `done` (`True` if `t > terminal_time`, otherwise `False`), `info`;
- `virtual_step(state, u_action, v_action)` - to get the same as from `step(u_action, v_action)`, but the current `state` is also set.
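To illustrate how such an environment is used, here is a minimal sketch of the interface above with a toy stand-in environment; the class `DummyGame`, its dynamics, and the zero-action policies are hypothetical placeholders, not the repository's actual games.

```python
import numpy as np


class DummyGame:
    """Toy stand-in implementing the interface described above:
    state_dim, u_action_dim, v_action_dim, terminal_time, dt,
    reset(), step(), virtual_step(). Dynamics are illustrative only."""

    def __init__(self, terminal_time=1.0, dt=0.1):
        self.state_dim = 3          # time t plus a 2-dimensional phase vector
        self.u_action_dim = 1
        self.v_action_dim = 1
        self.terminal_time = terminal_time
        self.dt = dt

    def reset(self):
        # deterministic initial state: t = 0, phase vector = (1, 1)
        self.state = np.array([0.0, 1.0, 1.0])
        return self.state

    def step(self, u_action, v_action):
        t, x = self.state[0], self.state[1:]
        # Euler step of toy dynamics dx/dt = u - v, advancing time by dt
        x = x + self.dt * (np.asarray(u_action) - np.asarray(v_action))
        t = t + self.dt
        self.state = np.concatenate(([t], x))
        done = bool(t > self.terminal_time)
        # zero-sum payoff paid only at the terminal time
        reward = -float(np.linalg.norm(x)) if done else 0.0
        return self.state, reward, done, {}

    def virtual_step(self, state, u_action, v_action):
        # same as step(u_action, v_action), but the current state is also set
        self.state = np.asarray(state, dtype=float)
        return self.step(u_action, v_action)


env = DummyGame()
state = env.reset()
done, total_reward = False, 0.0
while not done:
    # placeholder policies: both agents play zero actions
    u_action = np.zeros(env.u_action_dim)
    v_action = np.zeros(env.v_action_dim)
    state, reward, done, info = env.step(u_action, v_action)
    total_reward += reward
```

The loop terminates once the accumulated time `t` exceeds `terminal_time`, at which point `done` becomes `True` and the terminal reward is collected.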