Kirill Kryukov wrote:
I got curious about one thing. As I understand it, BayesELO is currently tuned for WBEC. As far as I know, WBEC is a very sparse tournament: it has big round-robins for the leagues, and the leagues are connected by smaller promotion round-robins. Do you think that tuning BayesELO to such sparse data will also give good ratings for a more concentrated tournament, like a single large round-robin?
The sparsity of the tournament is not really what determines the best value of the prior. The prior indicates how close in strength we expect players to be. In a tournament where very weak players may play against very strong players, it might be better to use a smaller prior. In tournaments where most of the games are between players that are close in strength, a larger prior might be better.
Also, changing the prior should not change the order of players much. The effect of increasing the prior is mainly to reduce the scale of rating differences, as you have already noticed.
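This shrinkage effect is easy to see in a two-player case. As I understand it, bayeselo expresses the prior as a number of virtual draws added between players; the sketch below (function names are mine, and it ignores real draws for simplicity) inverts the standard logistic Elo formula to show how adding virtual draws pulls the rating difference toward zero:

```python
import math

def elo_diff(score):
    """Invert the logistic Elo model: expected score -> rating difference."""
    return 400.0 * math.log10(score / (1.0 - score))

def diff_with_prior(wins, losses, prior_draws):
    """Rating difference between two players after adding 'prior_draws'
    virtual draws to their head-to-head record (a draw = half a point)."""
    games = wins + losses + prior_draws
    points = wins + 0.5 * prior_draws
    return elo_diff(points / games)

# A beats B 8-2 in real games; watch the difference shrink as the prior grows.
for prior in (0, 2, 10):
    print(prior, round(diff_with_prior(8, 2, prior), 1))
# 0  -> 240.8
# 2  -> 190.8
# 10 -> 107.5
```

Note that the ordering of A and B never changes; only the scale of the difference does, which matches what you observed.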
Kirill Kryukov wrote:
Another thing: for a single round-robin, BayesELO and ELOstat are not very different. But for sparse data like WBEC's, they give very different ratings. I tried it for WBEC games, and the ratings are sometimes very different (one table is here
, just hit Esc after the rating table has opened, so that the huge pairwise tables below it do not load). So I wonder, how can they be so different? I could understand a few percent difference, but 158 vs -342 (Kiwi 0.5a), or 283 vs -132 (Delphil 1.5b), is quite large...
This is a very interesting example. The big rating differences you noticed revolve around "Promo D" of WBEC 10. Let us take the striking example of Natwarlal 0.12 and NullMover 0.25. Natwarlal 0.12 finished at the top of division 4 and won the promotion tournament. NullMover 0.25 finished at the bottom of division 3 and performed poorly in the promotion tournament. Here are the ratings we get:
- Natwarlal 0.12: 210 (bayeselo) and -267 (elostat)
- NullMover 0.25: 74 (bayeselo) and -148 (elostat)
I have a strong feeling that the ratings produced by bayeselo are much better than those produced by elostat in this situation. A fundamental problem of elostat is its assumption that a given winning percentage against a variety of opponents is equivalent to the same winning percentage against one single opponent whose rating is the average of the opponents' ratings. This assumption is very wrong, and it fails badly in this particular situation.
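The problem is that the expected score is a nonlinear (logistic) function of the rating difference, so the average score against several opponents is not the score against the average opponent. A minimal numerical sketch with hypothetical ratings (the logistic formula is the standard Elo model; the "shortcut" mimics elostat's averaging):

```python
def expected_score(diff):
    """Logistic Elo model: expected score for a player 'diff' points stronger."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

# A player rated 0 faces two opponents, rated -800 and 0 (hypothetical values).
opponents = [-800, 0]
true_avg = sum(expected_score(0 - r) for r in opponents) / len(opponents)

# Elostat-style shortcut: pretend all games were against one opponent
# at the average rating, here -400.
avg_opponent = sum(opponents) / len(opponents)
shortcut = expected_score(0 - avg_opponent)

print(round(true_avg, 3))   # about 0.745
print(round(shortcut, 3))   # about 0.909
```

The two numbers disagree badly: a 74.5% score against this mixed field is treated as if it were a 74.5% score against a single -400 opponent, which would imply a far weaker player than the truth. The more spread out the opponents' strengths are, as in WBEC's promotion groups, the worse this approximation gets.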