In reply to: Re: YABRL: Aristarch 4.37 scores slightly less than Aristarch 4.21, posted by Kurt Utzinger at 05 February 2004 07:25:49:
I am much more interested in knowing the difference
in playing strength at 40'/40 or 90m+30s or
120'/40 than at 5m+2s and would like to see
such comparisons.
Kurt
Me too!
The underlying question is: How can we obtain good estimates of the playing strength of engines with _limited_ computing resources? It is agreed that the ideal test would be a large number of games at longer time controls, but doing this properly within an acceptable timeframe would require five to ten identical computers, which not all of us can afford.
With these constraints in mind we have 3 choices:
1) Selective matches between engines (List vs. Ruffian, then Aristarch vs. Shredder etc.) at long time controls. This is certainly interesting, but gives only selective information, not an overall picture.
2) A rating list with relatively few games at long time controls, which results in a list with wide error margins.
3) A rating list at shorter time controls, but with a higher number of games, which results in a list with narrow error margins, although of course under Blitz conditions.
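The trade-off between choices 2 and 3 can be made concrete with a rough sketch of how the error margin shrinks with the number of games. This is only an illustration under stated assumptions, not how any particular rating tool computes its margins: it looks at a single match between two equally strong engines, assumes a 30% draw rate, uses a normal approximation for a 95% interval, and converts the score to a rating difference with the usual logistic Elo formula. The function names are made up for this example.

```python
import math

def elo_from_score(p):
    """Rating difference implied by an expected score p (logistic Elo formula)."""
    return -400.0 * math.log10(1.0 / p - 1.0)

def elo_margin(games, score=0.5, draw_rate=0.3, z=1.96):
    """Approximate +/- margin (in rating points) of a match result.

    Assumptions (not from the original post): a fixed expected score,
    a fixed draw rate, independent games, normal approximation.
    """
    win = score - draw_rate / 2.0
    loss = 1.0 - win - draw_rate
    if win < 0 or loss < 0:
        raise ValueError("score and draw_rate are inconsistent")
    # Variance of a single game result (1, 0.5 or 0 points) around the mean.
    var = (win * (1.0 - score) ** 2
           + draw_rate * (0.5 - score) ** 2
           + loss * (0.0 - score) ** 2)
    se = math.sqrt(var / games)
    lo = max(score - z * se, 1e-6)
    hi = min(score + z * se, 1.0 - 1e-6)
    # Half the width of the interval, converted to rating points.
    return (elo_from_score(hi) - elo_from_score(lo)) / 2.0

for n in (100, 400, 1000, 4000):
    print(f"{n:5d} games: about +/- {elo_margin(n):.0f} points")
```

The margin falls roughly with the square root of the number of games, so halving it takes about four times as many games. That is why a list at long time controls stays wide for months on a single machine.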
All approaches are OK and interesting; I went for number 3. I tried number 2 a while ago (rating list not posted), but I realised that still having error margins of 60 to 70 points after half a year is not what I want and does not help me much in measuring the real playing strength.
I do believe that there is a correlation between Blitz performance and performance at longer time controls. In fact, my rating list looks quite similar to other lists at longer time controls (with some notable exceptions), which supports this assumption. It is also no coincidence that I test with an increment of 2 seconds, because this eliminates some of the typical Blitz factors: the big influence of time management, the high number of wins on time, and the instant moves at the end of a long game.
There are of course exceptions: the Blitz experts (Delfi, Pepito, Yace, Amyan, Knightdreamer etc.) go down in rating lists with longer time controls. But such tendencies are either known already or can be spotted with a few selective additional matches at longer time controls, leading overall to a pretty good picture of where each engine fits.
Take El Chinito 3.25, which I currently have running: at the moment probably not many of us know where El Chinito 3.25 really fits; after the Blitz results we will have some idea. A few additional matches at longer time controls against some of the more balanced engines, such as Ruffian or Aristarch, and a comparison with El Chinito's performance against the same opponents in Blitz, would then give a pretty good picture of El Chinito's characteristics.
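The comparison described above could be sketched roughly as follows. For each opponent played at both time controls, the match score is converted to an implied rating difference, and the long-time-control value is compared with the Blitz value. The function names and the input layout (opponent name mapped to points scored and games played) are assumptions made for this illustration, and no real results are included.

```python
import math

def elo_diff(points, games):
    """Rating difference implied by a match score (logistic Elo formula).

    The score is clamped to [1%, 99%] so that a clean sweep does not
    produce an infinite value.
    """
    p = min(max(points / games, 0.01), 0.99)
    return -400.0 * math.log10(1.0 / p - 1.0)

def time_control_shift(blitz, long_tc):
    """Per-opponent change in implied rating difference, long minus Blitz.

    Both arguments map an opponent name to (points, games). Only opponents
    present in both dictionaries are compared. A negative shift suggests the
    engine does relatively worse at the longer time control.
    """
    shifts = {}
    for opponent in blitz.keys() & long_tc.keys():
        shifts[opponent] = elo_diff(*long_tc[opponent]) - elo_diff(*blitz[opponent])
    return shifts
```

With only a few long games per opponent, each shift carries a wide error margin of its own, so it is a hint about an engine's character rather than a measurement.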
Robert