How Ratings Get Calculated for Halite III


Bot ratings are computed with the TrueSkill Python library. TrueSkill is a Bayesian rating system, in the same family as the Glicko system, developed by Microsoft for video game matchmaking. It is similar to the Elo rating system for chess, but supports games with more than two players. Groups of 2 or 4 bots are selected to compete against each other in a game of Halite III, and the algorithm updates each bot's rating based on the match outcome.

Bots are matched against opponents of similar skill. A newly submitted bot is often matched against widely varying opponents at first, until its approximate rating is known well enough to pick reasonable opponents.

You can see two stats related to your rating under the Analysis tab on your profile. Your μ is an estimate of your skill; it changes quickly when you submit a new bot version. Many players rely on this figure as an early indication of their bot's performance.

Your rating becomes more accurate as your bot plays more games. The TrueSkill algorithm's uncertainty about your skill is expressed by σ, which you can expect to decrease as your bot plays more games. Your rating is computed as μ - 3 * σ.
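As a minimal sketch of the formula above, the following Python snippet computes a rating from μ and σ. The function name and the sample values are hypothetical and chosen only for illustration; they do not come from the Halite codebase.

```python
def rating(mu: float, sigma: float) -> float:
    """Rating as described above: skill estimate minus three times its uncertainty."""
    return mu - 3 * sigma


# Same skill estimate, different amounts of uncertainty (illustrative values).
uncertain = rating(mu=25.0, sigma=5.0)  # e.g. after only a few games
confident = rating(mu=25.0, sigma=1.0)  # e.g. after many games

print(uncertain, confident)  # prints: 10.0 22.0
```

With μ held fixed, the rating rises as σ shrinks, which is why a bot's rating tends to climb as it plays more games even if its estimated skill does not change.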

Submitting a new bot resets your σ to 8.333, which decreases your rating significantly. With extended play, a bot's σ eventually settles at around 0.25.
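To get a rough sense of how large that drop is, the sketch below compares the rating of a settled bot with its rating immediately after resubmission. It assumes that μ is unchanged at the moment of resubmission, which is a simplification for illustration; the value of μ and all function and constant names are hypothetical.

```python
RESET_SIGMA = 8.333    # σ after submitting a new bot (from the text above)
SETTLED_SIGMA = 0.25   # approximate σ after extended play (from the text above)


def rating(mu: float, sigma: float) -> float:
    """Rating formula used on this page: mu - 3 * sigma."""
    return mu - 3 * sigma


def drop_on_resubmit(mu: float, sigma_before: float = SETTLED_SIGMA) -> float:
    """Rating lost when σ is reset, assuming μ stays the same (an assumption)."""
    return rating(mu, sigma_before) - rating(mu, RESET_SIGMA)


mu = 30.0  # hypothetical skill estimate
print(round(rating(mu, SETTLED_SIGMA), 3))  # rating before resubmission
print(round(rating(mu, RESET_SIGMA), 3))    # rating right after resubmission
print(round(drop_on_resubmit(mu), 3))       # size of the drop
```

Under this assumption the drop equals 3 * (8.333 - 0.25), or about 24.25 rating points, regardless of μ. The rating recovers as σ shrinks again with more games.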

Further resources:
The TrueSkill website has more details about how the TrueSkill algorithm works.
Community member @Janzert wrote a terrific blog post comparing various rating algorithms using Halite I finals data.

What does mu measure?
What does 1 rating point represent?