docs/homepage/blog/an_introduction_to_reinforcement_learning_jl_design_implementations_thoughts/index.md (1 addition, 1 deletion)
@@ -62,7 +62,7 @@ Although most existing reinforcement learning related packages are written in Py
Many existing packages inspired the development of ReinforcementLearning.jl. The following are some important ones.
-[Dopamine](https://google.github.io/dopamine/)\dcite{dayan2009dopamine} provides a clear implementation of the **Rainbow**\dcite{hessel2018rainbow} algorithm. The [gin](https://github.com/google/gin-config) config file template and the concise workflow are the origin of the `Experiment` in ReinforcementLearning.jl.
--[OpenSpiel](https://github.com/deepmind/open_spiel)\dcite{LanctotEtAl2019OpenSpiel} provides a lot of useful functions to describe many different kinds of games. They are turned into traits in our package.
+-[OpenSpiel](https://github.com/google-deepmind/open_spiel)\dcite{LanctotEtAl2019OpenSpiel} provides a lot of useful functions to describe many different kinds of games. They are turned into traits in our package.
-[Ray/rllib](https://docs.ray.io/en/master/rllib.html)\dcite{liang2017ray} has many nice abstraction layers in the policy part. We also borrowed the definition of environments here. This is explained in detail in section 2.
-[rlpyt](https://github.com/astooke/rlpyt)\dcite{stooke2019rlpyt} has a nice code structure and we borrowed some implementations of policy gradient algorithms from it.
-[Acme](https://github.com/deepmind/acme)\dcite{hoffman2020acme} offers a framework for distributed reinforcement learning.
docs/src/How_to_implement_a_new_algorithm.md (1 addition, 1 deletion)
@@ -45,7 +45,7 @@ end
Implementing a new algorithm mainly consists of creating your own `AbstractPolicy` (or `AbstractLearner`, see [this section](#using-resources-from-rlcore)) subtype, its action sampling method (by overloading `Base.push!(policy::YourPolicyType, env)`) and implementing its behavior at each stage. However, ReinforcementLearning.jl provides plenty of pre-implemented utilities that you should use to 1) have less code to write, 2) lower the chances of bugs, and 3) make your code more understandable and maintainable (if you intend to contribute your algorithm).
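To make that concrete, here is a minimal sketch of such a subtype. The struct name, the random action choice, and the empty update hook are assumptions made up for this illustration, not the package's prescribed template.

```julia
# A minimal sketch (hypothetical names): a policy that picks a random legal
# action and ignores updates.
using ReinforcementLearning

struct MyRandomishPolicy <: AbstractPolicy end

# Action selection: return the action to take in `env`
# (assumes the action space supports `rand`).
RLBase.plan!(::MyRandomishPolicy, env::AbstractEnv) = rand(legal_action_space(env))

# Stage hook: a real algorithm would record the transition or update its
# estimates here; this sketch does nothing.
Base.push!(::MyRandomishPolicy, ::PostActStage, env::AbstractEnv) = nothing
```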
## Using Agents
-The recommended way is to use the policy wrapper `Agent`. An agent is itself an `AbstractPolicy` that wraps a policy and a trajectory (also called Experience Replay Buffer in reinforcement learning literature). Agent comes with default implementations of `push!(agent, stage, env)` and `plan!(agent, env)` that will probably fit what you need at most stages so that you don't have to write them again. Looking at the [source code](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/blob/main/src/ReinforcementLearningCore/src/policies/agent.jl/), we can see that the default Agent calls are
+The recommended way is to use the policy wrapper `Agent`. An agent is itself an `AbstractPolicy` that wraps a policy and a trajectory (also called Experience Replay Buffer in reinforcement learning literature). Agent comes with default implementations of `push!(agent, stage, env)` and `plan!(agent, env)` that will probably fit what you need at most stages so that you don't have to write them again. Looking at the [source code](https://github.com/JuliaReinforcementLearning/ReinforcementLearning.jl/blob/main/src/ReinforcementLearningCore/src/policies/agent/agent_base.jl), we can see that the default Agent calls are
```julia
function Base.push!(agent::Agent, ::PreEpisodeStage, env::AbstractEnv)
docs/src/How_to_write_a_customized_environment.md (6 additions, 10 deletions)
@@ -6,7 +6,7 @@ write many different kinds of environments based on interfaces defined in
[ReinforcementLearningBase.jl](@ref).
The most commonly used interface to describe reinforcement learning tasks is
-[OpenAI/Gym](https://gym.openai.com/). Inspired by it, we expand those
+[OpenAI/Gym](https://gymnasium.farama.org). Inspired by it, we expand those
interfaces a little to utilize multiple-dispatch in Julia and to cover
multi-agent environments.
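Before the full LotteryEnv example below, here is a minimal sketch of what implementing those interfaces can look like for a single-agent environment. The `CoinFlipEnv` name, its fields, and the reward scheme are invented for illustration; the method names follow the RLBase interface used throughout this page.

```julia
# A hypothetical single-agent environment: guess the outcome of a coin flip.
using ReinforcementLearning

mutable struct CoinFlipEnv <: AbstractEnv
    reward::Float64
    done::Bool
end
CoinFlipEnv() = CoinFlipEnv(0.0, false)

RLBase.action_space(::CoinFlipEnv) = (:heads, :tails)
RLBase.state(env::CoinFlipEnv) = env.done ? 2 : 1
RLBase.state_space(::CoinFlipEnv) = Base.OneTo(2)
RLBase.reward(env::CoinFlipEnv) = env.reward
RLBase.is_terminated(env::CoinFlipEnv) = env.done
RLBase.reset!(env::CoinFlipEnv) = (env.reward = 0.0; env.done = false; nothing)

# One step: +1 for a correct guess, -1 otherwise, then the episode ends.
function RLBase.act!(env::CoinFlipEnv, action)
    flip = rand(Bool) ? :heads : :tails
    env.reward = action == flip ? 1.0 : -1.0
    env.done = true
end
```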
@@ -30,7 +30,7 @@ act!(env::YourEnv, action)
## An Example: The LotteryEnv
Here we use an example introduced in [Monte Carlo Tree Search: A
-Tutorial](https://www.informs-sim.org/wsc18papers/includes/files/021.pdf) to
+Tutorial](https://ieeexplore.ieee.org/document/8632344) to
demonstrate how to write a simple environment.
The game is defined like this: assume you have \$10 in your pocket, and you are
@@ -168,7 +168,7 @@ policy we defined above. A [`QBasedPolicy`](@ref)
contains two parts: a `learner` and an `explorer`. The `learner` *learns* the
state-action value function (aka *Q* function) during interactions with the
`env`. The `explorer` is used to select an action based on the Q value returned
-by the `learner`. Inside of the [`MonteCarloLearner`](@ref), a
+by the `learner`. Inside of the [`TDLearner`](@ref), a
[`TabularQApproximator`](@ref) is used to estimate the Q value.
That's the problem! A [`TabularQApproximator`](@ref) only accepts states of type `Int`.
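To show how these pieces fit together, here is a rough sketch of constructing such a policy. The constructor keywords, the `TDLearner` arguments, and the hyper-parameter values are assumptions for illustration only; check the current API before copying.

```julia
# A hedged sketch of wiring a learner and an explorer into a QBasedPolicy.
# Keyword names and arguments below are assumptions, not verified API.
using ReinforcementLearning

policy = QBasedPolicy(
    learner = TDLearner(
        TabularQApproximator(n_state = 10, n_action = 3),  # assumed keywords
        :SARS,   # assumed: one-step TD update on (s, a, r, s') tuples
    ),
    explorer = EpsilonGreedyExplorer(0.1),  # explore with probability 0.1
)
```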
@@ -304,11 +304,7 @@ legal_action_space_mask(ttt)
```
For some simple environments, we can simply use a `Tuple` or a `Vector` to
-describe the action space. A special space type [`Space`](@ref) is also provided
-as a meta space to hold the composition of different kinds of sub-spaces. For
-example, we can use `Space(((1:3),(true,false)))` to describe the environment
-with two kinds of actions, an integer between `1` and `3`, and a boolean.
-Sometimes, the action space is not easy to be described by some built in data
+describe the action space. Sometimes, the action space is not easy to be described by some built in data
structures. In that case, you can define a customized one with the following
interfaces implemented:
@@ -370,7 +366,7 @@ to the perspective from the `current_player(env)`.
In multi-agent environments, sometimes the sum of rewards from all players is
always `0`. We call the [`UtilityStyle`](@ref) of these environments [`ZeroSum`](@ref).
-`ZeroSum` is a special case of [`ConstantSum`](@ref). In cooperational games, the reward
+`ZeroSum` is a special case of [`ConstantSum`](@ref). In cooperative games, the reward
of each player is the same. In this case, they are called [`IdenticalUtility`](@ref).
Other cases fall back to [`GeneralSum`](@ref).
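These styles are declared as traits on your environment type. A minimal sketch, using a hypothetical placeholder type:

```julia
# Trait declaration sketch: tell RLBase the utility style of your environment.
# `MyZeroSumEnv` is a hypothetical placeholder, not a type from the package.
using ReinforcementLearning

struct MyZeroSumEnv <: AbstractEnv end

RLBase.UtilityStyle(::MyZeroSumEnv) = ZeroSum()   # rewards of all players sum to 0
```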
@@ -403,7 +399,7 @@ each action, then we call the [`ChanceStyle`](@ref) of these environments are of
default return value. One special case is that,
in [Extensive Form Games](https://en.wikipedia.org/wiki/Extensive-form_game), a
chance node is involved. And the action probability of this special player is
-determined. We define the `ChanceStyle` of these environments as [`EXPLICIT_STOCHASTIC`](https://juliareinforcementlearning.org/docs/rlbase/#ReinforcementLearningBase.EXPLICIT_STOCHASTIC).
+determined. We define the `ChanceStyle` of these environments as [`EXPLICIT_STOCHASTIC`](@ref).
For these environments, we need to have the following methods defined:
docs/src/rlcore.md (3 additions, 3 deletions)
@@ -8,8 +8,8 @@ In addition to containing the [run loop](./How_to_implement_a_new_algorithm.md),
## QBasedPolicy
-`QBasedPolicy` is an `AbstractPolicy` that wraps a Q-Value _learner_ (tabular or approximated) and an _explorer_. Use this wrapper to implement a policy that directly uses a Q-value function to
-decide its next action. In that case, instead of creating an `AbstractPolicy` subtype for your algorithm, define an `AbstractLearner` subtype and specialize `RLBase.optimise!(::YourLearnerType, ::Stage, ::Trajectory)`. This way you will not have to code the interaction between your policy and the explorer yourself.
+[`QBasedPolicy`](@ref) is an [`AbstractPolicy`](@ref) that wraps a Q-Value _learner_ (tabular or approximated) and an _explorer_. Use this wrapper to implement a policy that directly uses a Q-value function to
+decide its next action. In that case, instead of creating an [`AbstractPolicy`](@ref) subtype for your algorithm, define an [`AbstractLearner`](@ref) subtype and specialize `RLBase.optimise!(::YourLearnerType, ::Stage, ::Trajectory)`. This way you will not have to code the interaction between your policy and the explorer yourself.
RLCore provides the most common explorers (such as epsilon-greedy, UCB, etc.). You can find many examples of QBasedPolicies in the DQNs section of RLZoo.
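As a minimal sketch of that pattern, assuming the `optimise!` signature quoted above (the learner name, its field, and the no-op body are hypothetical placeholders):

```julia
# Sketch of a custom learner that plugs into a QBasedPolicy; names are placeholders.
using ReinforcementLearning

struct MyLearner <: AbstractLearner
    approximator::TabularQApproximator
end

# Called by the run loop at the given stage with the collected trajectory; a real
# implementation would sample transitions and update `learner.approximator`.
function RLBase.optimise!(learner::MyLearner, ::PostActStage, trajectory::Trajectory)
    nothing
end
```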
## Parametric approximators
@@ -29,4 +29,4 @@ The other advantage of `TargetNetwork` is that it uses Julia's multiple dispatch
## Architectures
-Common model architectures are also provided such as the `GaussianNetwork` for continuous policies with diagonal multivariate policies; and `CovGaussianNetwork` for full covariance (very slow on GPUs at the moment).
+Common model architectures are also provided such as the `GaussianNetwork` for continuous policies with diagonal multivariate policies; and `CovGaussianNetwork` for full covariance (very slow on GPUs at the moment).