Introducing the CoCo's new rating system

January 20, 2020 Evans Clinchy

It's now been three weeks since we launched the CoCo, and we've been doing a lot of work during that time to flesh out our ideas and prepare for our first tournament in February. A lot of that work has been interpersonal in nature - we've been building relationships with players, tournament directors, and other people who can help behind the scenes as our organization grows. Some of the other work has been ... well, rather technical and boring. Some of it even involves math. Gross, I know.

But in the interest of transparency, we're trying to share every single detail we can about how we're building our organization. Even the mathy ones. So today, let's talk about calculating ratings.

(If you're already bored, I can't blame you. Feel free to close this browser tab. We'll be back with a more interesting post next week.)

For those of you still here: When it comes to ratings, we're going to try something a little different from what Scrabble players in the U.S. and Canada are used to today. For a long time, the standard in North America has been the Elo rating system - a relatively simple formula that pegs players' skill levels based on wins, losses, and strength of schedule. Elo ratings work decently enough, and many people like them just fine. But as we start a new organization, we've got a chance to do something new and innovative. Bear with us.
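For the curious, here's roughly what a textbook Elo update looks like in Python. This is a minimal sketch, not the exact formula any Scrabble organization uses: the function names and the K-factor of 20 are assumptions picked purely for illustration.

```python
def elo_expected(rating_a: float, rating_b: float) -> float:
    """Expected score for player A against player B (a win counts 1, a tie 0.5)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 20.0) -> float:
    """Return A's new rating after one game.

    score_a is 1.0 for a win, 0.5 for a tie, 0.0 for a loss.
    k (the K-factor) is an assumed value, not one taken from any real system.
    """
    return rating_a + k * (score_a - elo_expected(rating_a, rating_b))


if __name__ == "__main__":
    # Two evenly matched players: the winner gains half of K.
    print(elo_update(1500, 1500, 1.0))
    # A 200-point favorite gains less for the same win.
    print(elo_update(1700, 1500, 1.0))
```

Notice that only the result (win, tie, or loss) goes into the formula. The margin of victory never shows up anywhere.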

We're going to try something called a Glicko-2 system. It's named after Mark Glickman, a professor at Boston University who developed it. (If you want to get real nerdy, you can check out his paper on the math involved.) Credit is also due to Taral Seierstad, who worked on implementing a Glicko-2 system for Scrabble in Norway, and whose work influenced ours in a big way. Thanks, Taral!

Basically, the system assesses each player on two factors - the strength of their opponent in each game, and how they perform against their expected spread in that game. So if your rating says you're a significant favorite against a given opponent, you've got to win by a big margin to keep your rating. Conversely, if you're an underdog, you can gain rating simply by keeping the spread closer than expected. The idea here is that by using spread in addition to just wins, we'll have much more precise data on how well each player is performing. This will yield more accurate ratings - and fast.
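To make the favorite-versus-underdog idea concrete, here's a toy Python sketch. It is not the CoCo's actual formula: the linear link between rating difference and expected spread, the constant POINTS_PER_RATING_POINT, and both function names are hypothetical, invented only to show which way a rating would move.

```python
# Hypothetical constant: points of expected spread per rating point of difference.
# This is NOT a CoCo parameter; it exists only to make the example concrete.
POINTS_PER_RATING_POINT = 0.5


def expected_spread(own_rating: float, opp_rating: float) -> float:
    """Assumed linear model: a bigger rating edge means a bigger expected margin."""
    return (own_rating - opp_rating) * POINTS_PER_RATING_POINT


def rating_direction(own_rating: float, opp_rating: float, actual_spread: float) -> str:
    """Say whether the rating would move up or down, given the actual margin."""
    surprise = actual_spread - expected_spread(own_rating, opp_rating)
    if surprise > 0:
        return "up"
    if surprise < 0:
        return "down"
    return "unchanged"


if __name__ == "__main__":
    # An 1800 player beats a 1600 player by only 40: a win, but below expectation.
    print(rating_direction(1800, 1600, 40))
    # The 1600 player lost that same game by 40: a loss, but closer than expected.
    print(rating_direction(1600, 1800, -40))
```

Under these made-up numbers, the favorite wins the game and still slips, while the underdog loses and still climbs.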

To walk you through it, I've got an example tournament. Basically I went into the CoCo's database of players, picked eight people at random and seeded them according to their WESPA ratings. Then I "directed" them in a fake one-day tournament. I scored each game by mashing random numbers on my keyboard. Without further ado, the CoCo Sample Open:

The chart above is the data that Marc Levesque's Director! program spits out post-tournament. It probably looks like gibberish to you, so I'll break it down in plain English. Ben and Puneet started out the tournament strong, each at 2-0, while Becky struggled in her first two games, with one loss and one tie. The tables turned after that, with Ben and Puneet falling to the middle of the pack and Becky rallying to go 5-1 in her last six games. Mark ended up in second, with Ben and Puneet close behind. If this were a WESPA tournament, the final standings and post-tourney ratings would look like this:

This looks fine! Becky won the tournament and gained some rating; Mark did far better than his expected wins, so he gained too. And so on. There's nothing wrong with these numbers. But under our system, they'll look a little different - and we believe they're better. Here are the exact same results, except this time, rated as a CoCo event:

The ratings are quite different! Why, you ask? In short, it's because we're using the players' spreads to get a more precise read on how well each person played. Here's a quick breakdown of each player's rating change.

Becky performs better in the CoCo system because her spread of +364 was much higher than you'd expect from a middling seed.
Mark had a good tournament and gained rating under both systems, but he performs better in WESPA (strong W/L record) than in CoCo (not as strong spread).
Puneet's record may only have been 4-4, but he had the second-best spread in the field, so he gets a nice little rating bump in CoCo.
Ben, as one of the top seeds, was expected to do better in both wins and spread, so he drops a bit under both systems.
Randi was seeded sixth out of eight but ended up with a positive spread! That's a very good result, so she gains a nice chunk of rating.
Jesse was the top seed and should have had a very strong spread, according to his rating; he drops quite a bit.
Jeremy is a tricky one because he came in unrated. It will be tough to get a precise read on his skill level at first, with so little information. The 1467 figure is our system's estimate, based on his spread in this one tourney.
Rasheed came in with one of the strongest ratings in the field and ended up with the lowest spread by far; the CoCo system can be harsh after a performance like that. Sorry, Rasheed! I'll mash different random numbers on my keyboard next time.

In conclusion: What we're doing here is starting with people's WESPA ratings, evaluating their CoCo play moving forward, and moving aggressively to rate everyone as accurately as possible, as fast as possible. Brace yourself! Your rating could be in for a wild ride at first. It may go up and down a little more dramatically than you're used to, at least in the beginning.

Over time, you'll want to watch that "New Dev" column on the far right. The lower that number is, the more confident we are that we have your rating pegged accurately. The more you play, the more data we'll have on your skill level, and your rating will begin to deviate a little less. After a while, things will return to normal.
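If you'd like to see why that deviation number shrinks with more games, here's a rough Python sketch based on the deviation update in Glickman's published Glicko-2 formulas. Fair warning about the assumptions: this is the standard win/loss version, not the CoCo's spread-based adaptation; the volatility is held fixed at 0.06 instead of being re-estimated; and the function names and sample numbers are made up for illustration.

```python
import math

GLICKO2_SCALE = 173.7178  # converts between rating points and the Glicko-2 internal scale


def g(phi: float) -> float:
    """Discount factor: less weight for opponents whose own ratings are uncertain."""
    return 1.0 / math.sqrt(1.0 + 3.0 * phi ** 2 / math.pi ** 2)


def expected_score(mu: float, mu_j: float, phi_j: float) -> float:
    """Expected score against opponent j, on the Glicko-2 internal scale."""
    return 1.0 / (1.0 + math.exp(-g(phi_j) * (mu - mu_j)))


def new_deviation(rating, deviation, opponents, volatility=0.06):
    """Return the post-event rating deviation.

    opponents is a list of (rating, deviation) pairs, one per game played.
    Simplification: volatility stays fixed (an assumption of this sketch).
    """
    mu = (rating - 1500.0) / GLICKO2_SCALE
    phi = deviation / GLICKO2_SCALE
    # Information gained from the games (the inverse of the estimated variance v).
    information = 0.0
    for opp_rating, opp_deviation in opponents:
        mu_j = (opp_rating - 1500.0) / GLICKO2_SCALE
        phi_j = opp_deviation / GLICKO2_SCALE
        e = expected_score(mu, mu_j, phi_j)
        information += g(phi_j) ** 2 * e * (1.0 - e)
    # Uncertainty first grows a little with the passage of a rating period...
    phi_star = math.sqrt(phi ** 2 + volatility ** 2)
    # ...then shrinks according to how much was learned from the games.
    phi_new = 1.0 / math.sqrt(1.0 / phi_star ** 2 + information)
    return phi_new * GLICKO2_SCALE


if __name__ == "__main__":
    field = [(1500, 50)] * 8  # eight games against well-established 1500 players
    print(new_deviation(1500, 350, field[:2]))
    print(new_deviation(1500, 350, field))
```

The more games in the list, the smaller the number that comes back. With an empty list (a rating period with no games at all), the deviation creeps up slightly instead.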

Overall, though, my advice is not to worry about this stuff too much. After all, it's just a rating, and there's more to Scrabble than the number by your name. I hope you'll all show up, play our tournaments, and have fun! Don't sweat the math too much - it'll work itself out in the long run.