When trying to create a Footy Tipping model, or Bot, it is important to have a baseline - something to compare against that sets a goal you want to improve on. These are simple algorithms that do better than chance but are not particularly good.
With AFL tipping there are a couple of good baseline models that I like to use:
1. Pick the Home Team
Historically the home team in AFL matches has a slight advantage, although the COVID-19 impacted 2020 season has made defining 'home team' a little more difficult. For the purposes of this I have simply used the first named team as the home team - assuming an advantage from either crowd, ground knowledge, lack of travel, or a combination of those.
Over the last ~2000 games, home teams have won ~57% of the time (counting draws as wins). This means if you tip the home team in every game of a 9 game round you will, on average, get about 5.1 tips correct.
So far in season 2020, picking the first named team would net you 59% correct tipping - a good baseline to start with.
Note that on any given week, the baseline bot may do better than you, but across a season you should be able to do better.
Rounds 6 & 7 in season 2020 show this well. In Round 6 every single home team won: a perfect 9/9. The very next week only one home team won (Richmond), giving this model just 1/9 and a cumulative 10/18 for the two weeks.
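For anyone who wants to try this at home, here's a minimal sketch of the home team baseline in Python. The Match type, the function names and the sample results are all made up for illustration (they are not my actual code or data); the only rules taken from above are "tip the first named team" and "count a draw as correct".

```python
from typing import List, NamedTuple


class Match(NamedTuple):
    home: str        # first named team, treated as the home team
    away: str
    home_score: int
    away_score: int


def tip_home_team(match: Match) -> str:
    """Baseline: always tip the first named team."""
    return match.home


def home_tip_correct(match: Match) -> bool:
    """A home tip counts as correct on a home win or a draw."""
    return match.home_score >= match.away_score


def accuracy(matches: List[Match]) -> float:
    """Fraction of matches where tipping the home team was correct."""
    return sum(home_tip_correct(m) for m in matches) / len(matches)


# Made-up results, for illustration only
sample = [
    Match("Team A", "Team B", 90, 70),   # home win
    Match("Team C", "Team D", 60, 85),   # away win
    Match("Team E", "Team F", 77, 77),   # draw, counted as correct
    Match("Team G", "Team H", 101, 64),  # home win
]

print(accuracy(sample))     # 0.75
print(round(0.57 * 9, 1))   # 5.1 expected correct tips in a 9 game round
```

Run over a real results file, accuracy() should land somewhere near the ~57% figure, depending on which seasons you include.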
2. Elo (not ELO)
I have a long history with the Elo algorithm, since I was involved in the Australian Scrabble tournament scene back in the 80's and 90's thanks to my brilliant Maths teacher who was an active competitor and official. He introduced me to the Elo algorithm as it was used to rank the players, and I've loved it ever since.
Elo is a mathematical model that is used all over the world. It rates the relative strength of two opponents and, from the difference in their ratings, gives the probability of one beating the other. You can read up on the model in lots of places, and the Wikipedia page is a great start.
For the purposes of this baseline bot I'm talking about a "Pure Elo" implementation: one that gives 1.0 for a win, 0.5 for a draw, and 0.0 for a loss, with no home ground advantage.
Over the last ~1500 games (sorry, my data sources are not all the same size!) a pure Elo baseline bot picks the winning team an impressive 66.2% of the time. This is more or less 6/9 every week.
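Here's a rough sketch of what a "Pure Elo" bot looks like in Python, using the standard Elo expected-score formula. The K value of 32, the starting rating of 1500 and all the function names are assumptions picked for illustration, not the values my bot actually uses.

```python
from collections import defaultdict

K = 32           # assumed update factor (hypothetical, tune to taste)
START = 1500.0   # assumed starting rating for every team


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score (win probability) for A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def result_score(score_a: int, score_b: int) -> float:
    """1.0 for a win, 0.5 for a draw, 0.0 for a loss, from A's side."""
    if score_a > score_b:
        return 1.0
    if score_a == score_b:
        return 0.5
    return 0.0


def update(ratings, team_a, team_b, score_a, score_b):
    """Move both ratings after a match. No home ground advantage."""
    exp_a = expected_score(ratings[team_a], ratings[team_b])
    delta = K * (result_score(score_a, score_b) - exp_a)
    ratings[team_a] += delta
    ratings[team_b] -= delta


def tip(ratings, team_a, team_b):
    """Tip the higher rated team (first named team on a tie)."""
    return team_a if ratings[team_a] >= ratings[team_b] else team_b


ratings = defaultdict(lambda: START)
update(ratings, "Team A", "Team B", 90, 70)   # made-up result
print(ratings["Team A"], ratings["Team B"])   # 1516.0 1484.0
print(tip(ratings, "Team A", "Team B"))       # Team A
```

Because the margin is ignored, a 1 point win and a 100 point win move the ratings by exactly the same amount.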
So far in season 2020, the Pure Elo baseline model that I use has picked the winner 64.6% of the time - still a good result.
Here's a graph that shows the relationship between the difference in the Elo scores of the home and away sides, and the average winning margin (-ve means away team win) based on the last 1500 games. There is a clear relationship here, with the team that has the higher Elo score averaging a higher margin in their favor. Note that there are still lots of games on either side of the margin axis, so this doesn't guarantee picking a winner, but it's not bad.
The astute amongst you may have spotted that the graph doesn't cross at the 0 point on the X axis, which implies that even with a small -ve value (away team has the higher Elo), the home team still averages a slightly higher score. This is effectively showing a home ground advantage, or at least that when the home team wins they tend to win by more than the away teams win by.
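One way to put a number on that offset is to fit a straight line through the points and look at where it crosses zero. This is only a sketch: the fit_line helper is hypothetical, the five data points are made up, and an ordinary least squares line is my assumption about how you might summarise the graph, not how it was drawn.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points: (home Elo - away Elo, home margin; -ve means away win)
elo_diff = [-200, -100, 0, 100, 200]
margin = [-15, -5, 6, 15, 29]

slope, intercept = fit_line(elo_diff, margin)

# The intercept is the average home margin when the two teams are
# rated equally, which reads as a home ground advantage in points.
print(f"points per Elo point: {slope:.3f}")
print(f"margin at equal ratings: {intercept:.1f}")

# Elo difference at which the fitted margin is zero. A negative value
# means the away team needs a ratings lead just to break even.
print(f"break-even Elo difference: {-intercept / slope:.1f}")
```

With real data the slope also gives a handy rule of thumb for turning an Elo gap into a predicted margin.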
Season 2020 so far - ZaphBot vs Baseline Bots.
Here's ZaphBot vs the two baseline bots this season (at the end of Round 15). ZaphBot is having a tough season, and would be sitting in 11th place on the Squiggle leaderboard right now. I think we can do better than that.
                  Tips   Bits    MAE  Correct | Round by Round
ZaphBot             85  13.10   24.5    66.9% | 844458457666864
Autobot-PureELO     82   4.25   47.4    64.6% | 762466548754864
Autobot-HomeTeam    76   2.86   25.4    59.8% | 664659134467645
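If you want to check the table, each digit in the Round by Round column is the number of correct tips in that round, so the digits add up to the Tips column. Here's a quick sketch that does the sums. The total of 127 games is inferred from the Tips and Correct columns (85 / 0.669 is about 127) rather than stated anywhere, so treat it as an assumption.

```python
# Round by Round strings copied from the table above
rounds = {
    "ZaphBot": "844458457666864",
    "Autobot-PureELO": "762466548754864",
    "Autobot-HomeTeam": "664659134467645",
}

GAMES_PLAYED = 127  # inferred from Tips / Correct, not stated in the table


def total_tips(per_round: str) -> int:
    """Sum the single digit per-round scores."""
    return sum(int(digit) for digit in per_round)


for bot, per_round in rounds.items():
    tips = total_tips(per_round)
    print(f"{bot:<17}{tips:>3}  {tips / GAMES_PLAYED:.1%}")

# ZaphBot           85  66.9%
# Autobot-PureELO   82  64.6%
# Autobot-HomeTeam  76  59.8%
```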
Other Baseline Models
Glicko
There are some other ratings schemes that can be used as baselines. The one I'm most interested in is the Microsoft TrueSkill ranking system, which can rank players within team matches. I used to make video games back in the 90's and 2000's and have long been intrigued by it and its application to team sports. TrueSkill is Microsoft property, but there is a public domain alternative called Glicko, a separate rating system that extends Elo with a measure of how uncertain each rating is. It's something I've been thinking of implementing but keep delaying, because the way I would want to do it requires getting all the player information and match lists.
There is a Glicko based AFL model that I'll be keeping a close eye on: @AFLGlickoRatings
Punters
Probably the best baseline out there is the gambling community. It's also the one where, if you can consistently beat it, there is the potential to make money gambling (if you are into that sort of thing).
Most seasons, the Punters will be at the top or very close to the top of the tipping ladder. So far in 2020 the Punters are leading the Squiggle Leaderboard. Unfortunately it's not a great model to use because you cannot repeat it or generate tips from data - you simply have to wait for the odds just before the match starts and use those. While I do like to compare against this, I don't actually model it myself.
I hope some of that has been interesting for people (and robots) - I'll be back with more in the next week.