Why create a custom OpenAI Gym environment?
Today we use classical machine learning and neural networks to solve all kinds of problems. Reinforcement learning is mostly used to solve games and some industrial applications, but I think this will change pretty soon. Instead of tagging images, we will be creating business environments for our agents to learn and perform in real life. I've read several tutorials on how to create an environment, but the framing is the difficult part.
The sourcing problem
I wanted to build an environment that had something to do with business. Every company has a sourcing department: people who buy stuff, either to resell it (as in e-commerce) or to operate the business (you need pens, cars, Dunder Mifflin paper…).
When you buy stuff you have to make decisions. Imagine that you have 5 suppliers, each with a different price and a different reliability. There are very cheap suppliers that are not very reliable, and there is a premium one that always delivers on time. You can find the example in the following table:
In this case, paying a low price carries a high risk. Every supplier sells you 5 articles at a time, and you only have $1000 to buy as many articles as you can. The best-case scenario would be that you are super lucky, always buy from the cheapest supplier, and every batch gets delivered.
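To make the trade-off concrete, here is a small sketch of the supplier table as Python data. The prices and reliabilities below are illustrative assumptions (the original table's values are not reproduced here); the point is that dividing the batch price by the expected number of delivered articles lets you compare suppliers.

```python
# Illustrative numbers only -- these prices and reliabilities are
# assumptions, not the values from the post's table.
suppliers = {
    "A": {"price": 50,  "reliability": 0.50},   # very cheap, often fails
    "B": {"price": 100, "reliability": 0.70},
    "C": {"price": 150, "reliability": 0.80},
    "D": {"price": 200, "reliability": 0.90},
    "E": {"price": 250, "reliability": 1.00},   # premium, always delivers
}

BUDGET = 1000      # total money available
BATCH_SIZE = 5     # every supplier sells 5 articles at a time

# Expected cost per delivered article:
# batch price / (batch size * probability the batch arrives)
for name, s in suppliers.items():
    expected = s["price"] / (BATCH_SIZE * s["reliability"])
    print(f"Supplier {name}: expected cost per article = {expected:.2f}")
```

With these made-up numbers the cheap supplier is still the best bet in expectation, but a failed delivery wastes money you cannot get back, which is exactly the risk the agent has to learn to manage.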
Framing the sourcing problem as a Reinforcement Learning problem
Before starting to write the code, there are some things we need to define. In my case this takes me more time than writing the actual code:
- Action space: which actions can your agent take? Here we will have 5 actions: buy from one of the 5 suppliers.
- Observation space: this is what our agent will see. In our case, we will let it know how many articles it currently has, i.e. how close it is to the goal (50 articles).
- Reward: you need to give your agent a prize when it does well. Here we will use the following function: (current_articles / max_articles)^0.9, with max_articles = 50.
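The reward function above can be written down directly (the function name is mine, the formula is from the list above):

```python
MAX_ARTICLES = 50  # the goal

def reward(current_articles: int) -> float:
    """Progress toward the goal, raised to 0.9 so the curve is slightly concave:
    early articles are rewarded a bit more generously than the last few."""
    return (current_articles / MAX_ARTICLES) ** 0.9

print(reward(0))    # 0.0  -- no progress, no reward
print(reward(25))   # ~0.54 -- halfway gives a bit more than half the reward
print(reward(50))   # 1.0  -- goal reached
```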
For me, the trickiest points here are the observation space and the reward function. For the observation space, I read a lot of code from existing environments in the OpenAI documentation. In this table, you can find environments and their space types so you can look them up faster. Choosing the reward function is an art; I watched a great YouTube video that gave me the idea for this function.
Simple steps to create a custom OpenAI Gym environment
Once you have framed the problem in an RL way, the rest is all downhill.
- You need to create the file structure for your environment. There is great documentation on the OpenAI site. Be careful with the names and make sure you replace everything; this can get cumbersome fast.
- After you've done this you need to implement some specific methods:
- __init__: where you initialize all the variables.
- step: whenever your agent takes an action, it calls this method. The environment returns a new state so the agent can make its next decision.
- reset: once you've reached a terminal state (you are out of money or you bought all the articles) you need to reset everything so that the agent can try again.
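The three methods above can be sketched as a minimal environment class. This is not the repository's actual code: the class name, the supplier numbers, and the choice of observation (articles owned plus money left) are my assumptions, but it shows where __init__, step and reset fit.

```python
import numpy as np
import gym
from gym import spaces

class SourcingEnv(gym.Env):
    """Minimal sketch of the sourcing environment.
    All supplier numbers here are illustrative assumptions."""

    PRICES = [50, 100, 150, 200, 250]
    RELIABILITY = [0.5, 0.7, 0.8, 0.9, 1.0]
    BATCH_SIZE = 5
    MAX_ARTICLES = 50
    START_MONEY = 1000

    def __init__(self):
        # 5 actions: buy one batch from one of the 5 suppliers
        self.action_space = spaces.Discrete(5)
        # Observation: (articles owned, money left)
        self.observation_space = spaces.Box(
            low=np.array([0.0, 0.0]),
            high=np.array([float(self.MAX_ARTICLES), float(self.START_MONEY)]),
            dtype=np.float32,
        )
        self.reset()

    def step(self, action):
        price = self.PRICES[action]
        if price <= self.money:
            self.money -= price
            # The batch only arrives with the supplier's reliability
            if np.random.random() < self.RELIABILITY[action]:
                self.articles = min(self.articles + self.BATCH_SIZE,
                                    self.MAX_ARTICLES)
        # Terminal: goal reached, or not enough money for the cheapest batch
        done = (self.articles >= self.MAX_ARTICLES
                or self.money < min(self.PRICES))
        reward = (self.articles / self.MAX_ARTICLES) ** 0.9
        return self._obs(), reward, done, {}

    def reset(self):
        self.articles = 0
        self.money = self.START_MONEY
        return self._obs()

    def _obs(self):
        return np.array([self.articles, self.money], dtype=np.float32)
```

This uses the classic Gym API where step returns a 4-tuple; newer versions of the library split done into terminated/truncated, so adapt accordingly.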
Creating an agent
To test the environment I used Q-learning. Explaining it is outside the scope of this blog post. I used the code from this great tutorial; I recommend watching his videos in case you want to learn more about reinforcement learning. You can find the code here.
I ran the agent with different numbers of training episodes and plotted how it performed.
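For reference, the core of tabular Q-learning is only a few lines. This is a generic sketch with hyperparameters of my choosing, not the tutorial's code:

```python
import numpy as np
from collections import defaultdict

# Hypothetical hyperparameters -- not the tutorial's values.
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
N_ACTIONS = 5  # one per supplier

# Q-table: maps a (discretized) state to one value per action
Q = defaultdict(lambda: np.zeros(N_ACTIONS))

def choose_action(state):
    # Epsilon-greedy: explore with probability EPSILON, otherwise exploit
    if np.random.random() < EPSILON:
        return int(np.random.randint(N_ACTIONS))
    return int(np.argmax(Q[state]))

def update(state, action, reward, next_state):
    # Standard Q-learning temporal-difference update
    best_next = np.max(Q[next_state])
    Q[state][action] += ALPHA * (reward + GAMMA * best_next - Q[state][action])
```

In a training loop you would call choose_action on the current observation, pass the action to the environment's step method, and feed the result into update, repeating for each episode until reset.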
With 1 training episode (almost random):
The lowest action is the riskiest but also the cheapest. In this example, the agent managed to buy 30 articles.
With 600 training episodes it managed to solve the problem: it bought all 50 articles using a balanced approach.
You can find the code on this repository.
I'll be happy to hear about experiments, ideas or questions on Twitter.