This article is a translation of the Japanese version.
The original version is here*1.
Hi, this is chang. Today I made two artificial-intelligence agents learn bike road racing by competing against each other.
0. Bike road race
Previously, I wrote that a goal sprint using the slipstream is a typical strategy for winning a bike road race*2. Here are some keywords that will help in reading this article.
Break
A break is a tactic in which a rider accelerates from the beginning of a race and tries to keep the gap until the end. It is often called "kamikaze" because of its risk.
Observe
Riders often watch each other and hold their sprint until just before the finish line. The aim is to let rivals sprint first and then use their slipstream, while guarding one's own wheel so rivals cannot draft in return. Bike racing is a mental game.
1. Program
Here I mainly describe the differences from the previous code.
(1) Field
As before, the race is run on a 32 × 8 image. This time I added a remaining-energy bar for the competitor as well.
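As a rough illustration (not the author's actual code), the 32 × 8 state could be held as an integer grid, with hypothetical marker values for the two riders and the two energy bars:

```python
import numpy as np

# A rough sketch of the 32 x 8 field, NOT the actual program.
# Marker values (1 = M*cEwan, 2 = P*tacchi, 3 = energy bar) are
# hypothetical; the real code may encode the state differently.
FIELD_W, FIELD_H = 32, 8   # race direction x lanes

def render_field(p_pos, c_pos, p_energy, c_energy):
    """Return the 8 x 32 grid with both riders and both energy bars."""
    field = np.zeros((FIELD_H, FIELD_W), dtype=np.int8)
    field[p_pos[1], p_pos[0]] = 1          # M*cEwan at (x, y)
    field[c_pos[1], c_pos[0]] = 2          # P*tacchi at (x, y)
    field[0, :p_energy] = 3                # player's energy bar (top row)
    field[FIELD_H - 1, :c_energy] = 3      # competitor's bar (new this time)
    return field
```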
(2) Player (M*cEwan)
The rules are unchanged. Note that today we call the player "M*cEwan."
- Four action patterns (see the sketch after this list)
- One sprint consumes 1 energy
- If the remaining energy is 0, a sprint (0) is the same as going straight (2)
- Initial energy is 5
- Lateral moves are large (2 pixels) so that the tactic of chasing a rival's wheel can emerge
- M*cEwan type: weak in long sprints (= small initial energy) but good at quick moves (= large lateral move)
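To make these rules concrete, here is a minimal sketch of a single step. The action ids 0 (sprint) and 2 (straight) follow the rules above; ids 1/3 for the lateral moves and the advance distances (2 cells for a sprint against 1 otherwise) are my assumptions, since the article does not state them:

```python
# A minimal sketch of one step, not the actual program.
# lateral = 2 for M*cEwan; 1 for P*tacchi is my guess from "small lateral move".
def step(x, y, energy, action, lateral=2):
    if action == 0 and energy > 0:
        return x + 2, y, energy - 1          # sprint: fast but costs 1 energy
    if action == 1:
        return x + 1, y - lateral, energy    # move left while advancing
    if action == 3:
        return x + 1, y + lateral, energy    # move right while advancing
    return x + 1, y, energy                  # go straight (also sprint at 0 energy)
```

With this encoding, a sprint at zero energy falls through to the last line and behaves exactly like going straight, matching the rule above.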
(3) Competitor (P*tacchi)
It is almost the same as before. The one difference is that it now chooses actions using Q values instead of at random. We call it "P*tacchi."
- Four action patterns (now chosen by Q values; see the sketch after this list)
- One sprint consumes 1 energy
- If the remaining energy is 0, a sprint (0) is the same as going straight (2)
- Initial energy is 6
- P*tacchi type: large body (= small lateral move) but good at long sprints (= large initial energy)
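The switch from random selection to Q-value-based selection could look like the following sketch; the `q_network` callable and the epsilon-greedy exploration are placeholders of mine, not details taken from the article:

```python
import numpy as np

# Sketch of Q-value-based action selection with epsilon-greedy exploration.
# `q_network` is a placeholder: any callable mapping a state to 4 Q values.
def select_action(q_network, state, epsilon=0.1, n_actions=4):
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)   # explore: the old random behavior
    q_values = q_network(state)               # one Q value per action
    return int(np.argmax(q_values))           # exploit: pick the best action
```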
Note: I changed the behavior at the left and right edges (= walls) because the players tended to stick to the walls and neglect lateral motion. I connected the two edges like a ring: if a player tries to move past a wall, he comes out on the opposite side. I had also tested a version that stopped a player when he hit a wall, but it did not work well; it was too hard for the players to learn both avoiding crashes and sprinting against the rival at the same time.
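Joining the edges into a ring amounts to taking the lateral coordinate modulo the number of lanes, as in this sketch:

```python
FIELD_H = 8  # number of lanes across the road

# The two edges are joined into a ring: a lateral position past either
# edge wraps to the opposite side, so wall collisions never happen.
def wrap_lateral(y):
    return y % FIELD_H
```

Python's `%` already wraps negative values, so a move past the left edge (`wrap_lateral(-1)` = 7) comes out on the right.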
(4) Slipstream
I imitated the slipstream with simple rules. Previously the slipstream worked only for the player; now it works for both the player and the competitor.
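The article does not spell the rules out, so the following is only my guess at one plausible "simple rule": a rider sitting directly behind the other, in the same lane and within a short gap, advances one extra cell for free. The `max_gap` parameter is hypothetical:

```python
# A guessed minimal slipstream rule (the article only says "simple rules"):
# a rider directly behind the other gets one free forward cell.
# This time it applies to both riders, not only the player.
def slipstream_bonus(my_x, my_y, rival_x, rival_y, max_gap=2):
    same_lane = my_y == rival_y
    just_behind = 0 < rival_x - my_x <= max_gap
    return 1 if (same_lane and just_behind) else 0
```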
(5) Reward
- Win: 1.0
- Lose: -1.0
I used the same all-or-nothing concept as before. A draw gives the same reward as a loss (= -1.0).
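In code this is a one-liner; the `player_won` flag is a hypothetical name:

```python
# All-or-nothing reward at the end of a race; a draw counts as a loss.
def reward(player_won):
    # player_won is True only for an outright win
    return 1.0 if player_won else -1.0
```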
2. Results
There was great variation in the results because the player and the competitor learned interactively, so I analyzed all 15 cases obtained under the same learning conditions.
Case | M*cEwan (player) strategy | P*tacchi (competitor) strategy | M*cEwan wins (/100)
---|---|---|---
0 | break | goal sprint | 41 |
1 | / | goal sprint | 2 |
2 | break | goal sprint | 30 |
3 | observe | observe | 63 |
4 | goal sprint | break | 68 |
5 | observe | observe | 30 |
6 | observe | / | 89 |
7 | observe | own pace | 37 |
8 | observe | own pace | 29 |
9 | / | own pace | 5 |
10 | / | break | 3 |
11 | observe | own pace | 20 |
12 | observe | observe | 40 |
13 | break | goal sprint | 50 |
14 | break | own pace | 0 |
The table above shows the results. The win count in the right column is the number of races M*cEwan won out of 100, using the learned neural networks. In total, M*cEwan won 507 of 1,500 races. This shows that P*tacchi, whose initial energy is larger, had a clear advantage.
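For reference, each win count could be collected with a loop like the sketch below, where `run_race` is a placeholder for the environment loop with the trained, greedy policies (not the author's actual code):

```python
# Sketch of how each "win count" entry could be measured: 100 races
# with the trained networks. `run_race` is a placeholder callable
# that returns True when M*cEwan wins a race.
def count_wins(run_race, n_races=100):
    return sum(1 for _ in range(n_races) if run_race())
```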
Below are GIF animations of typical patterns. I picked good examples here; for the whole results, please look at *3.
2.1 Strategy
I analyzed the strategies of M*cEwan and P*tacchi in turn.
M*cEwan selected a goal sprint after observing in 7 cases. Because his power (= initial energy) was smaller than his rival's, M*cEwan had to let the rival go first and make use of the slipstream. The real-life Robbie MacEwan was also a clever rider who was good at exploiting his rivals' efforts.
On the other side, P*tacchi often selected an "own pace" strategy: go straight without meandering and start sprinting around the middle of the race. In other words, he sprinted from his own chosen distance without watching his rival's moves. In this game, if both riders used up their energy and reached the goal without any slipstream, P*tacchi was certain to win. So all P*tacchi had to do was refrain from dashing early in the race so that M*cEwan could not take his wheel. I had not noticed this when I wrote the code; honestly, it felt like having a bug pointed out by the AI, ha ha...
2.2 Break
It was a little surprising that both M*cEwan and P*tacchi selected a break in multiple cases. In cases 1 and 10, P*tacchi won with a probability close to 100% because M*cEwan simply gave up. I wonder if M*cEwan could not learn the risk of a rival's break...
In cases 0, 2, and 13, in which M*cEwan went on a break, the win counts for the two sides tended to be close. For M*cEwan, with his smaller power, taking a break to secure a roughly 50/50 winning rate could be worthwhile. It is like a puncher-type rider who is not suited to a bunch sprint and therefore tends to escape.
In total, one could say that "a break is worth trying because rivals may react sluggishly, especially if you have sustainable power."
Note that a break made with hesitation is the worst. Look at the GIF animation of case 14: M*cEwan could not get a single win. It shows that "if you attack, do not look back."
3. Afterword
This was an interesting experiment. Next time, I will try a more complex competition with more players.
I updated the source code*4.