Moderator: Andres Valverde
1) Count the number of wins, losses, and draws
W = number of wins, L = number of losses, D = number of draws
n = number of games (W + L + D)
m = mean score per game, (W + 0.5*D) / n
2) Apply the following formulas to compute s
( SQRT: square root of. )
x = W*(1-m)*(1-m) + D*(0.5-m)*(0.5-m) + L*(0-m)*(0-m)
s = SQRT( x/(n-1) )
3) Compute error margin A (use 1.96 for 95% confidence)
A = 1.96 * s / SQRT(n)
4) State with 95% confidence:
The 'real' result should lie somewhere between m-A and m+A
5) Look up the ELO figures for the win% at m-A and m+A to get the lower and upper bounds of the error margin.
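Steps 1 to 5 above can be sketched in a few lines of Python. The function names `score_interval` and `elo_diff` are made up for this illustration, and the Elo lookup in step 5 is replaced by the usual logistic formula 400 * log10(p / (1 - p)), which is an assumption; a printed Elo table may give slightly different figures.

```python
import math

def score_interval(wins, losses, draws, z=1.96):
    """Return (m, m - A, m + A) following steps 1 to 4 above."""
    n = wins + losses + draws
    if n < 2:
        raise ValueError("need at least two games")
    m = (wins + 0.5 * draws) / n                    # mean score per game
    x = (wins * (1 - m) ** 2
         + draws * (0.5 - m) ** 2
         + losses * (0 - m) ** 2)
    s = math.sqrt(x / (n - 1))                      # sample standard deviation
    a = z * s / math.sqrt(n)                        # error margin A
    return m, m - a, m + a

def elo_diff(p):
    """Elo difference for a score fraction p (assumed logistic curve, not a table)."""
    if not 0.0 < p < 1.0:
        raise ValueError("score fraction must be strictly between 0 and 1")
    return 400.0 * math.log10(p / (1.0 - p))

# Example: 40 wins, 30 losses, 30 draws
m, low, high = score_interval(wins=40, losses=30, draws=30)
print(f"score {m:.3f}, 95% interval {low:.3f} to {high:.3f}")
print(f"Elo {elo_diff(m):+.0f}, interval {elo_diff(low):+.0f} to {elo_diff(high):+.0f}")
```

Note that the Elo conversion breaks down when m-A or m+A reaches 0 or 1, which can happen with very few games or very lopsided results.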
Dieter Bürßner wrote:I also just saw your question in the CSS forum. Did you receive any relevant article?
Dieter Bürßner wrote:Ok, I think I know it now. In a tournament, calculate the performance p_i of each player. Give every player an initial rating (the same for everybody). Calculate
E_i = E_average_of_opponents + 400 * log10(p_i/(1-p_i));
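As a rough illustration of the formula above, here is one evaluation of it in Python. The function name `performance_rating` and the sample ratings are invented for this example, and p_i is taken to be the fraction of points scored, which is an assumption about what "performance" means here.

```python
import math

def performance_rating(opponent_ratings, points):
    """E_i = average opponent rating + 400 * log10(p_i / (1 - p_i))."""
    games = len(opponent_ratings)
    p = points / games                      # p_i: fraction of points scored (assumed)
    if not 0.0 < p < 1.0:
        raise ValueError("undefined for a 0% or 100% score")
    average = sum(opponent_ratings) / games
    return average + 400.0 * math.log10(p / (1.0 - p))

# A 50% score against a 2500 average gives a 2500 performance.
print(performance_rating([2400, 2500, 2600], 1.5))
```

In a tournament where everybody starts from the same rating, this would presumably be repeated for all players until the values stop changing.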
Dieter Bürßner wrote:s_\pm = \sqrt{\frac{1}{N_\pm} \sum_{i=1}^{N_\pm} (x_i - x_{av})^2}
\sigma_\pm = s_\pm / \sqrt{N-1}
Dieter Bürßner wrote:For those who have time and interest: http://www.stat.psu.edu/~dhunter/papers/bt.pdf
Implement this model (perhaps preferably the one with ties and home-field = white advantage), and compare with Elostat. Perhaps statisticians here already have a library for it.
Yes, I think that a Bayesian approach is more sound. Using mixed Bayesian/likelihood approaches is the more modern way to go compared to the old frequentist Elo! I'm not sure, however, that it will make life better in practice. By not using the same approach for computer chess as for human chess, it will be even harder to compare the different pools. It is already hard...
Rémi Coulom wrote:Thanks Dieter,
Here are some preliminary results with Bayesian Elo:
http://remi.coulom.free.fr/Bayesian-Elo/
What do you think?
Rémi
Peter Fendrich wrote:After a quick look: Why shouldn't we assume that the strength is constant for chess programs?
Peter Fendrich wrote:Have you looked at how sensitive the result is on alpha=0.5? As I understand it, the impact of alpha varying between 0.4 and 0.6 can be ignored but I may have missed something.
Peter Fendrich wrote:The maximum-likelihood mentioned or a similar estimation seems to be doable. Is this maximum-likelihood standard these days?
Ok, I became a little bit suspicious...
Rémi Coulom wrote:After thinking a little more about it, I do not believe that the maximum likelihood is the best thing to compute. As the figure shows, there may be more than one maximum. Also, likelihood distributions are often asymmetric, so the expected rating is very different from the maximum-likelihood rating.
Great!
Here is how I would do it. It is closer to the ELOStat iterative approach:
- Start with all Elos at 500.
- Iterate a procedure that replaces the Elo of each player by the expected Elo obtained by Bayesian inference, assuming the Elos of opponents are their true Elos.
- Finally, compute a confidence interval from the likelihood of every player, assuming their opponents have true Elos.
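The three steps above could look roughly like the following Python sketch. It is not the code of any released program; the grid `GRID`, the uniform prior, the logistic expected-score curve, and the treatment of a draw as half a win plus half a loss are all assumptions made for the illustration, and the names `posterior`, `quantile` and `fit` are invented.

```python
import math

# Candidate Elo values with a uniform prior over them (assumption).
GRID = [float(r) for r in range(-1000, 2001, 10)]

def expected_score(diff):
    """Logistic expected score for a rating difference (assumed model)."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def posterior(player, games, elos):
    """Normalised posterior over GRID for one player, opponents held fixed."""
    log_post = []
    for r in GRID:
        ll = 0.0
        for a, b, result in games:          # result is a's score: 1, 0.5 or 0
            if player == a:
                f = expected_score(r - elos[b])
            elif player == b:
                f = expected_score(elos[a] - r)   # still a's expected score
            else:
                continue
            # Assumption: a draw counts as half a win plus half a loss.
            ll += result * math.log(f) + (1.0 - result) * math.log(1.0 - f)
        log_post.append(ll)
    top = max(log_post)
    weights = [math.exp(v - top) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]

def quantile(post, q):
    """Smallest grid value whose cumulative posterior mass reaches q."""
    acc = 0.0
    for r, w in zip(GRID, post):
        acc += w
        if acc >= q:
            return r
    return GRID[-1]

def fit(games, start=500.0, sweeps=50):
    players = sorted({p for a, b, _ in games for p in (a, b)})
    elos = {p: start for p in players}      # step 1: everybody starts at 500
    for _ in range(sweeps):                 # step 2: replace by expected Elo
        for p in players:
            post = posterior(p, games, elos)
            elos[p] = sum(r * w for r, w in zip(GRID, post))
    intervals = {}                          # step 3: 95% interval per player
    for p in players:
        post = posterior(p, games, elos)
        intervals[p] = (quantile(post, 0.025), quantile(post, 0.975))
    return elos, intervals

games = [("A", "B", 1)] * 6 + [("A", "B", 0)] * 3 + [("A", "B", 0.5)]
elos, intervals = fit(games)
for name in sorted(elos):
    print(name, round(elos[name]), intervals[name])
```

A player with only wins or only losses has no finite maximum here, so the expected Elo is pulled toward the edge of the grid; a real implementation would need a proper prior to handle that case.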
I will try this and make the program available.
Rémi
Rémi Coulom wrote:I will try this and make the program available.
Ulysses Omycron wrote:Hello, I've been running engine matches on my computer for a while and have always wanted to calculate the programs' ratings, but I didn't like ELOStat. If you run a tournament of 4 programs ("A", "B", "C" and "D") and another with 4 other programs ("E", "F", "G" and "H"), you may get a PGN with 32 games, and ELOStat handles that fine. But once you add to the PGN a game of program "I" beating program "C" and another of it losing to program "F", the ratings are messed up; you might need to use separate PGNs, and then the ratings are lost. The solution, of course, is to replace "all the programs start with the same rating" with "each program can be assigned its own rating, which may differ from the other programs' ratings, before the new rating calculation starts". It seems like a fancy feature, but I'm afraid it's beyond humanity ;).
I downloaded Bayeselo, but got "No command line parameter found". To solve this I can either run it from the MS-DOS prompt and enter the commands interactively; make a bayeselo.bat file containing the commands, so I can just double-click it and the work is done; or create a shortcut and add the commands in the Target box.
I still have to figure out which option is best and how to do it, but an online "How to use it" page would be nice :)
Best regards.
Rémi Coulom wrote:Dieter, if you understand the paper well, and can provide the formula, this would be very welcome.
In fact, the paper is not so hard to understand. I have understood the principle of the formulas he gives, so I should be able to figure out the formula to combine home-field advantage and ties by myself. I will probably do that right after the holidays.
Dieter Bürßner wrote:Sorry, no, I don't understand enough (yet?). I had the impression that I saw some Mathematica or MATLAB code supporting Hunter's paper. Perhaps you can find it.
Cheers,
Dieter
Rémi Coulom wrote:That feature would be very easy to add. I will do it.
Rémi Coulom wrote:Also, I wonder what you mean by "the ratings are messed up". The ratings of E, F, G, and H will go up a lot, and the ratings of A, B, C and D will go down a lot. Is this what you'd like to avoid?
Rémi Coulom wrote:I believe Leo Dijksman has the same problem with his divisions, and this is the reason why he does not use ELOStat. Assigning values to some players should fix the problem, but I will try to think about better solutions.