How to run the ML-Agents sample



This article summarizes how to train and run a sample model from ML-Agents. Think of it as rough notes on the overall flow. It uses Anaconda, pip, and so on; please set these up yourself. For details of the environment, please refer to the previous article.

The environment used here is:

  • macOS Catalina 10.15.5
  • Anaconda 4.8.3
  • Python 3.8.3
  • pip 20.1.1

3D Ball model

This time, I will explain how to train a model using 3D Ball, one of the ML-Agents samples. In 3D Ball, the agent tilts a box to keep the ball on top of it from falling off.

This is a continuation of the previous article.

In the Project window at the bottom left, open Assets / ML-Agents / Examples / 3DBall / Scenes and double-click the scene marked in red. It should look like the image below.
[Screenshot 2020-09-05 21.40.10]
If you press the play button marked in red, it will look like the image below. Because the sample already includes a trained model, you can see the agents moving so as not to drop the balls.
Next, I will explain how to train the model.

Prepare an environment for training

First, create an Anaconda virtual environment for training the model.
ML-Agents requires Python 3.6.1 or later; this time I will use Python 3.8.3.

$ conda create -n mlagents python=3.8.3 anaconda
$ conda activate mlagents
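After activating the environment, you can confirm that the interpreter meets the Python 3.6.1 minimum mentioned above. This is a minimal sketch using only the standard library; the function name `meets_minimum` is hypothetical and not part of ML-Agents.

```python
import sys

# Minimum Python version for ML-Agents, as stated in this article.
MINIMUM = (3, 6, 1)


def meets_minimum(version=None, minimum=MINIMUM):
    """Return True if `version` (default: the running interpreter) is >= `minimum`."""
    if version is None:
        version = sys.version_info[:3]
    # Tuples compare element by element, so (3, 8, 3) >= (3, 6, 1) is True.
    return tuple(version) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum() else "too old"
    print(f"Python {current}: {status}")
```

Run it with `python` inside the activated mlagents environment; with Python 3.8.3 it should report OK.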

Next, install the Python packages ml-agents-envs and ml-agents.
In the previously downloaded ml-agents-release_6 folder, run the following commands. ml-agents-envs is installed first because ml-agents depends on it.

cd ml-agents-envs
pip3 install -e ./
cd ..
cd ml-agents
pip3 install -e ./
cd ..
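To check that both packages are now importable from the virtual environment, a small standard-library sketch like the following can help. It assumes the pip packages ml-agents-envs and ml-agents provide the import names `mlagents_envs` and `mlagents`; the helper name `missing_modules` is hypothetical.

```python
import importlib.util

# Import names assumed for the two packages installed above.
REQUIRED_MODULES = ("mlagents_envs", "mlagents")


def missing_modules(names=REQUIRED_MODULES):
    """Return the names in `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("ml-agents-envs and ml-agents are importable.")
```

find_spec only locates the modules without importing them, so this check is quick even though ml-agents pulls in heavy dependencies.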

Now the environment for training the model is ready.

Train the model

Next, let’s train 3D Ball.
There are three steps:

  • Set the hyperparameters.
  • Run the training program in Anaconda’s virtual environment.
  • Run the scene in Unity.

Set hyperparameters

The learning algorithm and its parameters are set in ml-agents-release_6 / config.
It contains folders named ppo and sac, one for each reinforcement learning algorithm:

  • PPO : Proximal Policy Optimization
  • SAC : Soft Actor-Critic
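To give a rough idea of what makes PPO "proximal": its objective clips the ratio between the new and old policy probabilities so that a single update cannot move the policy too far. The sketch below computes the clipped surrogate for one sample, following the formula from the original PPO paper; it is not ML-Agents' implementation, and the default epsilon of 0.2 is an assumed, commonly used value.

```python
def clip(value, low, high):
    """Limit `value` to the closed interval [low, high]."""
    return max(low, min(value, high))


def ppo_clipped_objective(ratio, advantage, epsilon=0.2):
    """Per-sample PPO clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    ratio: pi_new(a|s) / pi_old(a|s) for the sampled action
    advantage: advantage estimate A for that action
    epsilon: clipping range (0.2 assumed here)
    """
    unclipped = ratio * advantage
    clipped = clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
    # Taking the minimum removes any incentive to push the ratio outside the clip range.
    return min(unclipped, clipped)


print(ppo_clipped_objective(1.5, 1.0))
print(ppo_clipped_objective(0.5, -1.0))
```

With a positive advantage, the objective stops growing once the ratio passes 1 + epsilon; with a negative advantage, the pessimistic (smaller) value is kept.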

I will use ppo for now. This folder contains YAML files that set the PPO parameters for each sample. You can set hyperparameters freely by creating your own YAML file. The parameters for 3DBall are already defined in 3DBall.yaml, so I will use them as they are.
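If you later want your own YAML file, one approach is to start from the existing settings and override only a few values. The sketch below uses a plain Python dict as a hypothetical stand-in for the parsed file; the nesting (behaviors, then the behavior name, then its settings) is my assumption about the layout of this release's config files, and the numbers are placeholders, not the actual contents of 3DBall.yaml. Reading and writing the YAML itself (for example with PyYAML's `yaml.safe_load` and `yaml.safe_dump`) is left out.

```python
import copy

# Hypothetical stand-in for a parsed config; values are placeholders.
BASE_CONFIG = {
    "behaviors": {
        "3DBall": {
            "trainer_type": "ppo",
            "hyperparameters": {"batch_size": 64, "learning_rate": 0.0003},
            "max_steps": 500000,
        }
    }
}


def with_overrides(base, overrides):
    """Return a deep copy of `base` with the nested `overrides` applied."""
    result = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Recurse so that sibling keys not mentioned in `overrides` are kept.
            result[key] = with_overrides(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


custom = with_overrides(BASE_CONFIG, {"behaviors": {"3DBall": {"max_steps": 200000}}})
print(custom["behaviors"]["3DBall"])
```

Merging instead of replacing whole sections means a one-line change does not silently drop the other hyperparameters.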

Run the program in Anaconda’s virtual environment

Next, start the training program. In the virtual environment created earlier, go to ml-agents-release_6 / ml-agents and run the following command.

mlagents-learn ../config/ppo/3DBall.yaml --run-id=3DBall --train

The value of --run-id can be anything. When you run the command, the Unity logo is displayed as shown in the following image.
[Screenshot 2020-09-06 2.24.42]
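If you end up launching several runs with different --run-id values, a small Python wrapper can build the command for you. This is a minimal sketch: the helper names are hypothetical, and it uses only the arguments that appear in the command above. It must be run from ml-agents-release_6 / ml-agents inside the activated environment, just like the original command.

```python
import shlex
import subprocess


def build_learn_command(config_path, run_id, train=True):
    """Return the mlagents-learn command used in this article as an argument list."""
    command = ["mlagents-learn", config_path, f"--run-id={run_id}"]
    if train:
        # Flag copied verbatim from the command shown above.
        command.append("--train")
    return command


def run_training(config_path="../config/ppo/3DBall.yaml", run_id="3DBall"):
    """Launch mlagents-learn and wait for it to finish."""
    command = build_learn_command(config_path, run_id)
    print("Running:", shlex.join(command))
    # check=True raises CalledProcessError if mlagents-learn exits with an error.
    subprocess.run(command, check=True)
```

Passing an argument list instead of a single shell string avoids quoting problems if a config path contains spaces.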

Run the Unity model

Then run the scene in Unity: all you have to do is press the play button. As training progresses, the output looks like the image below.
[Screenshot 2020-09-06 3.15.52]
“Mean Reward” is the average reward. The maximum for 3D Ball is 100, and you can see that the mean reaches this maximum from around 160,000 steps. “Std of Reward” is the standard deviation of the reward; the smaller it is, the better. You can see that training is complete at around 200,000 steps.
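To make the two summary values concrete, here is a small sketch that computes a mean and a standard deviation over a list of episode rewards. The reward lists are hypothetical, made up for illustration, and using the population standard deviation (`statistics.pstdev`) is my assumption; this article does not specify the trainer's exact formula.

```python
import statistics


def summarize_rewards(episode_rewards):
    """Return (mean, standard deviation) of a list of episode rewards."""
    mean = statistics.mean(episode_rewards)
    # Population standard deviation (assumed; the trainer's formula may differ).
    std = statistics.pstdev(episode_rewards)
    return mean, std


# Hypothetical episode rewards, for illustration only.
early = [12.0, 30.0, 8.0, 50.0]
late = [100.0, 100.0, 100.0, 100.0]
print(summarize_rewards(early))
print(summarize_rewards(late))
```

In the late list every episode reaches the maximum of 100, so the mean sits at its ceiling and the standard deviation is 0, which corresponds to the fully trained state described above.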

The behavior after 10,000 steps and after 150,000 steps is shown below. You can see that the agent learns not to drop the ball.

10,000 steps
150,000 steps

Using the trained model

The trained model is saved in a folder under ml-agents / results named after the --run-id specified earlier. The file 3DBall.nn in that folder is the trained model. To use the newly trained model, add 3DBall.nn to Assets / ML-Agents / Examples / 3DBall / TFModels in Unity, and select it in the Model field of the Agent's Behavior Parameters.
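Copying the model into the Unity project can also be scripted. The sketch below follows the folder layout described above (results / run-id / 3DBall.nn, then Assets / ML-Agents / Examples / 3DBall / TFModels). The helper names are hypothetical, and the `new_name` parameter with its refusal to overwrite is my own precaution, since that folder may already contain a model with the same name.

```python
import shutil
from pathlib import Path


def trained_model_path(results_dir, run_id, behavior_name="3DBall"):
    """Path of the trained model: <results_dir>/<run_id>/<behavior_name>.nn."""
    return Path(results_dir) / run_id / f"{behavior_name}.nn"


def copy_model_to_unity(model_path, project_dir, new_name):
    """Copy a trained model into the 3DBall TFModels folder as `new_name` and return its path."""
    folder = Path(project_dir) / "Assets" / "ML-Agents" / "Examples" / "3DBall" / "TFModels"
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / new_name
    if target.exists():
        # Avoid silently replacing a model that is already in the folder.
        raise FileExistsError(target)
    # copy2 copies the file contents and preserves its timestamps.
    shutil.copy2(model_path, target)
    return target
```

For example, `copy_model_to_unity(trained_model_path("results", "3DBall"), "/path/to/UnityProject", "3DBall_myrun.nn")` run from ml-agents-release_6 / ml-agents; the project path here is a placeholder. After copying, select the new file in Behavior Parameters as described above.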


In this article, I described how to run and train a sample model of ML-Agents. Training with ML-Agents basically follows this flow. Please treat this only as a reference, as it may contain mistakes. There is also official documentation.

Next time, I would like to write an article on creating and training a Cartpole model.

Site I referred to

Unity ML-Agents 0.15.0 Tutorial (1)