オッサンはDesktopが好き

I write about custom-built PCs, machine learning, and bikes in no particular order

AI bike racer learned observation, goal sprint, and break

This article is a translation of the Japanese version.
The original is here*1.

 Hi, this is chang. Today I had two AI agents learn bike road racing by competing against each other.

0. Bike road race

 Previously, I wrote that a goal sprint out of the slipstream is a typical strategy for winning bike road races*2. Here are a few more terms you need for this article.

Break

 A break is a tactic in which a rider accelerates from the beginning of a race and tries to hold the gap until the finish. It is often called "kamikaze" because of its risk.

Observe

 Riders often watch each other and hold their goal sprint until just before the finish line. The aim is to let rivals start their sprint first and then use their slipstream. At the same time, a rider guards his own back wheel so that rivals cannot use his slipstream. Bike racing is a mental game.

1. Program

 I mainly describe the differences from the previous code.

(1) Field

 As before, the race takes place on a 32 × 8 image. I added a remaining-energy bar for the competitor.

Figure: Field

(2) Player(M*cEwan)

 I did not change the rules. Note that today we call the player "M*cEwan."

Figure: Player. We call it M*cEwan

  • 4 action patterns
  • One sprint consumes 1 energy
  • If the remaining energy is 0, sprint(0) is the same as go straight(2)
  • Initial energy is 5
  • Lateral moves are large (2 pixels) so that the tactic of chasing a rival's back wheel can emerge
  • M*cEwan type: weak in long sprints (= small initial energy) but good at quick moves (= large lateral move)
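The energy rule in the list above can be sketched in a few lines of Python. This is only an illustration under assumptions, not the code from the repository: the `Rider` class and the `apply_action` helper are hypothetical names. Only the action indices sprint(0) and go straight(2), the cost of 1 energy per sprint, the initial energy of 5, and the 2-pixel lateral move come from the rules above.

```python
from dataclasses import dataclass

SPRINT = 0     # action index given in the rules above
STRAIGHT = 2   # action index given in the rules above


@dataclass
class Rider:
    """Hypothetical container for a rider's state (not the original class)."""
    energy: int        # remaining sprint energy
    lateral_step: int  # pixels moved by one lateral action


def apply_action(rider: Rider, action: int) -> int:
    """Return the action that is actually executed, spending energy on sprints."""
    if action == SPRINT:
        if rider.energy == 0:
            return STRAIGHT   # no energy left: a sprint becomes "go straight"
        rider.energy -= 1     # one sprint consumes 1 energy
    return action


mcewan = Rider(energy=5, lateral_step=2)
executed = [apply_action(mcewan, SPRINT) for _ in range(6)]
print(executed)        # [0, 0, 0, 0, 0, 2]
print(mcewan.energy)   # 0
```

The sixth sprint request falls back to going straight because the energy is already used up.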

(3) Competitor(P*tacchi)

 It is almost the same as the previous one. The one difference is that it now chooses its actions by Q value rather than at random. We now call it "P*tacchi."
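Choosing an action by Q value usually means taking the action whose estimated value is largest. The sketch below shows that idea with a hypothetical `select_action` helper. The optional `epsilon` parameter (occasional random exploration) is an assumption added for illustration; the article does not say whether the original code uses it.

```python
import random
from typing import Optional, Sequence


def select_action(q_values: Sequence[float], epsilon: float = 0.0,
                  rng: Optional[random.Random] = None) -> int:
    """Pick the action with the largest Q value (greedy choice).

    With probability `epsilon` a random action is taken instead.
    This exploration step is an assumption, not taken from the article.
    """
    rng = rng or random.Random()
    if epsilon > 0.0 and rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


q = [0.1, 0.7, -0.2, 0.3]   # made-up Q values for the 4 actions
print(select_action(q))      # 1
```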

Figure: Competitor. We call it P*tacchi

  • 4 action patterns (now chosen by Q value; previously selected at random)
  • One sprint consumes 1 energy
  • If the remaining energy is 0, sprint(0) is the same as go straight(2)
  • Initial energy is 6
  • P*tacchi type: large body (= small lateral move) and good at long sprints (= large initial energy)

Note: I changed how the left and right edges (= walls) behave, because the riders tended to stick to the walls and neglect lateral moves. I connected the two edges like a ring: a rider who moves into a wall reappears on the opposite side. I also tested a version in which a rider who collided with a wall was stopped, but it did not work well. It was too hard for the riders to learn both to avoid crashing and to sprint against a rival.
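The ring-shaped field can be expressed with a modulo operation. In the minimal sketch below, `move_lateral` is a hypothetical helper, and treating the short side (8 pixels) of the 32 × 8 field as the lateral axis is an assumption.

```python
FIELD_WIDTH = 8  # assumed: the short side of the 32 x 8 field is the lateral axis


def move_lateral(x: int, dx: int, width: int = FIELD_WIDTH) -> int:
    """Return the new column after a lateral move on a ring-shaped field."""
    # Python's % always returns a non-negative result for a positive width,
    # so moving off the left edge wraps to the right edge and vice versa.
    return (x + dx) % width


print(move_lateral(7, 2))    # 1
print(move_lateral(0, -2))   # 6
```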

(4) Slipstream

 I imitated the slipstream with simple rules. Previously it applied only to the player; now it applies to both the player and the competitor.

Figure: Imitated slipstream

(5) Reward

  • Win: 1.0
  • Lose: -1.0

 As before, I used an all-or-nothing reward. A draw gets the same reward as a loss (= -1.0).
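The reward rule fits in one small function. The sketch below is an illustration only: the `reward` function and its arguments (the number of steps each rider needed to reach the goal) are hypothetical and not taken from the original code.

```python
WIN_REWARD = 1.0
LOSE_REWARD = -1.0


def reward(own_steps: int, rival_steps: int) -> float:
    """All-or-nothing reward: only a strict win is rewarded."""
    if own_steps < rival_steps:
        return WIN_REWARD
    return LOSE_REWARD   # a loss and a draw are treated the same


print(reward(26, 27))   # 1.0
print(reward(27, 27))   # -1.0
```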

2. Result

 The results varied greatly because the player and the competitor learned interactively. So I analyzed all 15 cases obtained under the same learning conditions.

Case  M*cEwan (Player)  P*tacchi     Win count
0     break             goal sprint  41
1     /                 goal sprint  2
2     break             goal sprint  30
3     observe           observe      63
4     goal sprint       break        68
5     observe           observe      30
6     observe           /            89
7     observe           own pace     37
8     observe           own pace     29
9     /                 own pace     5
10    /                 break        3
11    observe           own pace     20
12    observe           observe      40
13    break             goal sprint  50
14    break             own pace     0

 The table above shows the results. The win count in the rightmost column is the number of M*cEwan's wins in 100 races run with the trained neural networks. In total, M*cEwan won 507 of 1500 races (about 34%). This shows that P*tacchi, whose initial energy is larger, had a clear advantage.
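The totals can be checked directly from the table. The short script below copies the 15 rows and recomputes the overall win count. As an extra, it also averages M*cEwan's wins by his own strategy. The variable names are only illustrative.

```python
from collections import defaultdict

# (case, M*cEwan strategy, P*tacchi strategy, M*cEwan wins out of 100 races)
RESULTS = [
    (0, "break", "goal sprint", 41), (1, "/", "goal sprint", 2),
    (2, "break", "goal sprint", 30), (3, "observe", "observe", 63),
    (4, "goal sprint", "break", 68), (5, "observe", "observe", 30),
    (6, "observe", "/", 89), (7, "observe", "own pace", 37),
    (8, "observe", "own pace", 29), (9, "/", "own pace", 5),
    (10, "/", "break", 3), (11, "observe", "own pace", 20),
    (12, "observe", "observe", 40), (13, "break", "goal sprint", 50),
    (14, "break", "own pace", 0),
]
RACES_PER_CASE = 100

total_wins = sum(wins for *_, wins in RESULTS)
total_races = RACES_PER_CASE * len(RESULTS)
print(total_wins, total_races)              # 507 1500
print(round(total_wins / total_races, 3))   # 0.338

# Average number of M*cEwan's wins, grouped by his own strategy.
grouped = defaultdict(list)
for _, mcewan_strategy, _, wins in RESULTS:
    grouped[mcewan_strategy].append(wins)
mean_wins = {s: sum(w) / len(w) for s, w in grouped.items()}
for strategy, mean in mean_wins.items():
    print(strategy, mean)
```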

 Below are GIF animations of typical patterns.

GIF animation: Case 3. M*cEwan's (yellow) signature move: using the slipstream and attacking from behind the rival

GIF animation: Case 11. P*tacchi's (blue) signature move: keeping his own pace without weaving sideways

GIF animation: Case 13. P*tacchi (blue) caught M*cEwan (yellow), who had gone on a break

GIF animation: Case 14. P*tacchi (blue) completely defeated M*cEwan (yellow), who hesitated during his break

 These are hand-picked examples. If you want to see all the results, please look at *3.

2.1 Strategy

 I analyzed the strategies of M*cEwan and P*tacchi in turn.

 M*cEwan chose to observe and then sprint for the goal in 7 cases. He had to let the rival go first and make use of the slipstream, because his power (= initial energy) was smaller than the rival's. The real Robbie MacEwan was also a clever rider who was good at exploiting his rivals' efforts.

 P*tacchi, on the other hand, often chose the "own pace" strategy: go straight without weaving and sprint after the middle of the race. In other words, he sprinted from his own distance without watching the rival's moves. In today's game, if both riders use up their energy and reach the goal without any slipstream, P*tacchi is certain to win. All P*tacchi has to do is refrain from sprinting early in the race so that M*cEwan cannot take his wheel. I did not notice this when I wrote the code. To be honest, it felt as if the AI had pointed out a bug to me, ha ha...
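A rough calculation shows why P*tacchi cannot lose such a race. The sketch below assumes that a rider advances 1 pixel per step when going straight and 2 pixels per step when sprinting, and that the course is 32 pixels long. These numbers are assumptions for illustration and are not taken from the code. Only the initial energies (5 and 6) come from the rules above.

```python
COURSE_LENGTH = 32    # assumed: the long side of the 32 x 8 field
STRAIGHT_SPEED = 1    # assumed pixels per step when going straight
SPRINT_SPEED = 2      # assumed pixels per step when sprinting


def steps_to_finish(energy: int, length: int = COURSE_LENGTH) -> int:
    """Steps needed to finish when all energy goes into sprints, with no slipstream."""
    position = 0
    steps = 0
    while position < length:
        if energy > 0:
            position += SPRINT_SPEED   # sprint while energy remains
            energy -= 1
        else:
            position += STRAIGHT_SPEED
        steps += 1
    return steps


print(steps_to_finish(5))   # M*cEwan: 27
print(steps_to_finish(6))   # P*tacchi: 26
```

Under these assumptions the order in which the sprints are used does not matter: the rider with more energy always covers the course in fewer steps, so M*cEwan can only win by taking the slipstream.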

2.2 Break

 It was a little surprising that both M*cEwan and P*tacchi chose a break in multiple cases. In cases 1 and 10, P*tacchi won with a probability close to 100% because M*cEwan gave up. I wonder whether M*cEwan failed to learn the risk posed by a rival's break...

 In cases 0, 2, and 13, in which M*cEwan went on a break, the win counts of the two sides tended to be close. For M*cEwan, going on a break to get a roughly 50/50 win rate can be worthwhile because of his smaller power. I think he resembles a puncher-type rider who is not suited to a bunch sprint and tends to break away.

 Overall, I can say that "a break is valuable because rivals can be slow to react, especially if you have sustainable power."

 Note that a hesitant break is the worst. Please look at the GIF animation of case 14. In this case, M*cEwan could not get a single win. The lesson is: "if you attack, do not look back."

3. Afterword

 It was interesting. Next time, I will try a more complex competition with more riders.

 I updated the source code*4.