YABRL: SoS 4 not stronger than SoS 3

Archive of the old Parsimony forum. Some messages couldn't be restored.

Post by Robert Allgeuer » 23 Jan 2004, 20:08

Posted by: Robert Allgeuer at 23 January 2004 20:08:55

Even though SoS 4 won the direct match against SoS 3 (60% in 20 games), overall it turned out not to be stronger than SoS 3, at least in blitz (300+2). In fact, after 639 unique games SoS 4 scored 5 Elo points less than its predecessor.
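Why a 60% head-to-head result over 20 games settles little: the sketch below assumes the standard logistic Elo model and a simple normal approximation (neither is stated in the post), and uses the direct-match result +9 =6 -5 from the per-opponent table. The function names are illustrative only.

```python
import math

def match_score(wins: int, draws: int, losses: int) -> float:
    """Fraction of points scored: a win is 1, a draw 1/2, a loss 0."""
    games = wins + draws + losses
    return (wins + 0.5 * draws) / games

def elo_from_score(p: float) -> float:
    """Elo difference implied by score p under the logistic Elo model."""
    return -400.0 * math.log10(1.0 / p - 1.0)

def score_interval(wins: int, draws: int, losses: int, z: float = 1.96):
    """Approximate 95% normal interval for the mean score per game."""
    n = wins + draws + losses
    p = match_score(wins, draws, losses)
    # Per-game variance, treating each result (1, 0.5 or 0) as one sample.
    var = (wins * (1.0 - p) ** 2
           + draws * (0.5 - p) ** 2
           + losses * (0.0 - p) ** 2) / n
    half_width = z * math.sqrt(var / n)
    return p - half_width, p + half_width

# SoS 4 vs SoS 3 head-to-head: +9 =6 -5 (20 games)
p = match_score(9, 6, 5)
lo, hi = score_interval(9, 6, 5)
print(f"score {p:.1%}, about {elo_from_score(p):+.0f} Elo")
print(f"approx. 95% interval: {lo:.1%} to {hi:.1%}")
```

Under these assumptions 60% corresponds to roughly +70 Elo, but the approximate interval for the score runs from about 42% to 78%, which includes an even 50%. The direct match on its own therefore cannot separate the two versions.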
For details on platform, conditions, tools etc. please refer to the link below.
The next engine to be tested will be SmarThink 0.17a.
Robert


    Program                     Elo    +   -   Games   Score   Av.Op.  Draws
 01 Ruffian v2.0.0            : 2682   18  34   680    75.1 %   2491   22.8 %
 02 Ruffian v1.0.1            : 2650   17  27   836    71.3 %   2492   25.6 %
 03 Ktulu v4.2                : 2590   22  28   654    64.1 %   2489   22.2 %
 04 Crafty v17.14DC           : 2583   20  22   800    63.3 %   2488   31.1 %
 05 Crafty v19.06DCntb        : 2581   21  23   721    61.7 %   2498   29.7 %
 06 Aristarch v4.21           : 2579   19  22   877    61.6 %   2497   23.1 %
 07 Crafty-MPC v18.15DC       : 2561   21  22   784    59.6 %   2494   26.3 %
 08 Delfi v4.3                : 2559   21  22   800    59.1 %   2495   23.9 %
 09 Delfi v4.2                : 2555   25  25   580    58.1 %   2499   27.2 %
 10 SmarThink v0.16b++        : 2554   21  22   816    58.3 %   2496   23.7 %
 11 Little Goliath 2000 v3.9  : 2551   21  20   880    57.7 %   2497   25.8 %
 12 Crafty v18.15DC           : 2551   22  22   741    59.0 %   2487   29.0 %
 13 Pepito v1.59 profile      : 2549   21  20   880    57.3 %   2497   25.8 %
 14 Yace Paderborn            : 2546   21  20   880    57.0 %   2497   25.6 %
 15 SoS 3                     : 2546   21  21   879    56.9 %   2498   21.8 %
 16 Aristarch v4.4            : 2542   36  34   319    54.1 %   2513   20.4 %
 17 SoS 4                     : 2541   24  25   639    57.2 %   2490   22.1 %
 18 Yace v0.99.56             : 2533   34  30   360    54.7 %   2500   25.6 %
 19 Green Light Chess v3.00   : 2531   22  19   880    54.8 %   2498   25.1 %
 20 Little Goliath 2000 v3.5  : 2531   31  25   440    53.6 %   2506   30.9 %
 21 Amyan v1.59               : 2509   24  20   772    52.0 %   2495   25.5 %
 22 Pharaon v2.62             : 2500   23  18   879    50.3 %   2498   23.9 %
 23 Crafty v19.01DC           : 2493   24  19   815    50.4 %   2490   25.5 %
 24 LambChop v10.99           : 2491   19  23   878    48.9 %   2499   23.2 %
 25 Gromit v3.8.2             : 2489   19  23   857    48.7 %   2498   23.0 %
 26 Ktulu v3.9                : 2486   19  24   779    48.6 %   2496   26.1 %
 27 SlowChess v2.89b          : 2480   20  24   759    47.6 %   2497   24.8 %
 28 KnightDreamer v3.2        : 2480   20  24   780    47.8 %   2495   25.3 %
 29 Anmon v5.22               : 2476   19  23   819    46.9 %   2498   25.9 %
 30 Amy v0.8.3                : 2474   21  22   871    46.6 %   2498   18.7 %
 31 Comet B44-2               : 2474   19  23   800    46.6 %   2497   27.5 %
 32 SoS v11-99                : 2471   33  34   359    46.0 %   2499   17.3 %
 33 Tao v5.4                  : 2470   20  22   879    45.8 %   2499   19.9 %
 34 Dragon v4.4.3             : 2456   21  23   747    44.5 %   2494   26.1 %
 35 Comet B62-3               : 2454   21  22   780    44.0 %   2496   26.2 %
 36 PostModernist v1.007      : 2437   22  21   800    41.3 %   2498   25.4 %
 37 Francesca M.0.0.9         : 2434   21  20   879    40.6 %   2500   25.6 %
 38 Comet B60                 : 2428   22  21   780    41.2 %   2491   25.6 %
 39 Leila v0.53h              : 2417   24  20   799    38.4 %   2499   21.3 %
 40 Tcb v0045                 : 2414   23  20   799    38.1 %   2499   24.7 %
 41 Resp v0.19                : 2397   25  20   780    35.9 %   2498   22.3 %
 42 Nejmet v3.07              : 2378   27  19   776    33.4 %   2498   22.2 %
 43 SlowChess v2.78           : 2372   27  19   790    33.5 %   2492   19.6 %
 44 Exchess v4.03             : 2324   31  17   779    26.8 %   2499   22.5 %
 45 Beowulf v2.2              : 2302   34  16   860    24.0 %   2502   18.0 %
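The Elo column can be roughly checked against the Score and Av.Op. columns. This sketch assumes a simple performance-rating formula under the logistic Elo model; the rating tool behind this list may compute ratings differently (for example iteratively over all games), so treat it as an approximation. It reproduces SoS 3 and SoS 4 to within about a point and shows why SoS 4 ends up lower despite its slightly higher score: its average opposition was 8 points weaker (2490 vs 2498).

```python
import math

def performance_rating(av_opp: float, score: float) -> float:
    """Average opponent rating plus the logistic Elo offset for the score."""
    return av_opp + 400.0 * math.log10(score / (1.0 - score))

# (Av.Op., Score) as listed in the table
sos3 = performance_rating(2498, 0.569)
sos4 = performance_rating(2490, 0.572)
print(f"SoS 3: about {sos3:.0f}   SoS 4: about {sos4:.0f}")
```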

Games        :  17131 (finished)
White Wins   :   7022 (41.0 %)
Black Wins   :   5944 (34.7 %)
Draws        :   4165 (24.3 %)
Unfinished   :      0
White Perf.  : 53.1 %
Black Perf.  : 46.9 %
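The summary percentages follow directly from the counts. This sketch assumes "White Perf." means the share of points won by White with each draw counted as half a point; the figures above are consistent with that reading.

```python
GAMES = 17131
WHITE_WINS, BLACK_WINS, DRAWS = 7022, 5944, 4165

# Every game is finished, so the three outcomes account for all of them.
assert WHITE_WINS + BLACK_WINS + DRAWS == GAMES

def pct(points: float) -> float:
    """Percentage of all games, rounded to one decimal like the summary."""
    return round(100.0 * points / GAMES, 1)

# A draw gives each side half a point.
white_perf = pct(WHITE_WINS + DRAWS / 2)
black_perf = pct(BLACK_WINS + DRAWS / 2)

print(pct(WHITE_WINS), pct(BLACK_WINS), pct(DRAWS))
print(white_perf, black_perf)
```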

(17) SoS 4                     : 639 (+295,=141,-203), 57.2 %
SoS 3                         :  20 (+  9,=  6,-  5), 60.0 %
Ruffian v2.0.0                :  20 (+  1,=  7,- 12), 22.5 %
Ktulu v4.2                    :  20 (+  6,=  7,-  7), 47.5 %
Crafty v19.06DCntb            :  19 (+  9,=  6,-  4), 63.2 %
Aristarch v4.21               :  20 (+  9,=  3,-  8), 52.5 %
Crafty-MPC v18.15DC           :  20 (+  8,=  4,-  8), 50.0 %
Delfi v4.3                    :  20 (+  5,=  5,- 10), 37.5 %
SmarThink v0.16b++            :  20 (+  6,=  4,- 10), 40.0 %
Little Goliath 2000 v3.9      :  20 (+  9,=  4,-  7), 55.0 %
Yace Paderborn                :  20 (+ 12,=  3,-  5), 67.5 %
Pepito v1.59 profile          :  20 (+  6,=  6,-  8), 45.0 %
Green Light Chess v3.00       :  20 (+  7,=  3,- 10), 42.5 %
Amyan v1.59                   :  20 (+  5,=  7,-  8), 42.5 %
Pharaon v2.62                 :  20 (+ 12,=  3,-  5), 67.5 %
Gromit v3.8.2                 :  20 (+ 12,=  2,-  6), 65.0 %
LambChop v10.99               :  20 (+  6,=  6,-  8), 45.0 %
SlowChess v2.89b              :  20 (+ 11,=  4,-  5), 65.0 %
KnightDreamer v3.2            :  20 (+ 10,=  4,-  6), 60.0 %
Amy v0.8.3                    :  20 (+ 13,=  4,-  3), 75.0 %
Anmon v5.22                   :  20 (+ 10,=  4,-  6), 60.0 %
Comet B44-2                   :  20 (+  8,=  6,-  6), 55.0 %
Tao v5.4                      :  20 (+ 11,=  5,-  4), 67.5 %
Comet B62-3                   :  20 (+ 14,=  5,-  1), 82.5 %
Dragon v4.4.3                 :  20 (+  4,=  3,- 13), 27.5 %
PostModernist v1.007          :  20 (+  9,=  5,-  6), 57.5 %
Francesca M.0.0.9             :  20 (+  9,=  4,-  7), 55.0 %
Tcb v0045                     :  20 (+ 14,=  3,-  3), 77.5 %
Leila v0.53h                  :  20 (+  7,=  5,-  8), 47.5 %
Resp v0.19                    :  20 (+ 12,=  2,-  6), 65.0 %
Nejmet v3.07                  :  20 (+ 11,=  5,-  4), 67.5 %
Exchess v4.03                 :  20 (+ 17,=  3,-  0), 92.5 %
Beowulf v2.2                  :  20 (+ 13,=  3,-  4), 72.5 %
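The overall line for SoS 4 is the sum of the per-opponent rows, with each win worth 1 point and each draw half a point. A minimal sketch that re-adds the rows above (the names are just for illustration):

```python
# SoS 4's per-opponent results from the table above: (wins, draws, losses)
RESULTS = {
    "SoS 3": (9, 6, 5),
    "Ruffian v2.0.0": (1, 7, 12),
    "Ktulu v4.2": (6, 7, 7),
    "Crafty v19.06DCntb": (9, 6, 4),
    "Aristarch v4.21": (9, 3, 8),
    "Crafty-MPC v18.15DC": (8, 4, 8),
    "Delfi v4.3": (5, 5, 10),
    "SmarThink v0.16b++": (6, 4, 10),
    "Little Goliath 2000 v3.9": (9, 4, 7),
    "Yace Paderborn": (12, 3, 5),
    "Pepito v1.59 profile": (6, 6, 8),
    "Green Light Chess v3.00": (7, 3, 10),
    "Amyan v1.59": (5, 7, 8),
    "Pharaon v2.62": (12, 3, 5),
    "Gromit v3.8.2": (12, 2, 6),
    "LambChop v10.99": (6, 6, 8),
    "SlowChess v2.89b": (11, 4, 5),
    "KnightDreamer v3.2": (10, 4, 6),
    "Amy v0.8.3": (13, 4, 3),
    "Anmon v5.22": (10, 4, 6),
    "Comet B44-2": (8, 6, 6),
    "Tao v5.4": (11, 5, 4),
    "Comet B62-3": (14, 5, 1),
    "Dragon v4.4.3": (4, 3, 13),
    "PostModernist v1.007": (9, 5, 6),
    "Francesca M.0.0.9": (9, 4, 7),
    "Tcb v0045": (14, 3, 3),
    "Leila v0.53h": (7, 5, 8),
    "Resp v0.19": (12, 2, 6),
    "Nejmet v3.07": (11, 5, 4),
    "Exchess v4.03": (17, 3, 0),
    "Beowulf v2.2": (13, 3, 4),
}

def score(wins: int, draws: int, losses: int) -> float:
    """Points per game: win = 1, draw = 1/2, loss = 0."""
    return (wins + 0.5 * draws) / (wins + draws + losses)

# Add up the rows column by column.
wins = sum(w for w, _, _ in RESULTS.values())
draws = sum(d for _, d, _ in RESULTS.values())
losses = sum(l for _, _, l in RESULTS.values())
games = wins + draws + losses

print(f"{games} (+{wins},={draws},-{losses}), {100 * score(wins, draws, losses):.1f} %")
```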





YABRL (Yet Another Blitz Rating List)
Robert Allgeuer
 

Re: YABRL: SoS 4 not stronger than SoS 3

Post by Dann Corbit » 23 Jan 2004, 21:29

Posted by: Dann Corbit at 23 January 2004 21:29:54
In reply to: YABRL: SoS 4 not stronger than SoS 3, posted by Robert Allgeuer at 23 January 2004 20:08:55
> Even though SoS 4 won the direct match against SoS 3 (60% in 20 games), overall it turned out not to be stronger than SoS 3, at least in blitz (300+2). In fact, after 639 unique games SoS 4 scored 5 Elo points less than its predecessor.
> For details on platform, conditions, tools etc. please refer to the link below.
> The next engine to be tested will be SmarThink 0.17a.
> Robert


>    Program                     Elo    +   -   Games   Score   Av.Op.  Draws
> 01 Ruffian v2.0.0            : 2682   18  34   680    75.1 %   2491   22.8 %
> 02 Ruffian v1.0.1            : 2650   17  27   836    71.3 %   2492   25.6 %
> 03 Ktulu v4.2                : 2590   22  28   654    64.1 %   2489   22.2 %
> 04 Crafty v17.14DC           : 2583   20  22   800    63.3 %   2488   31.1 %
> 05 Crafty v19.06DCntb        : 2581   21  23   721    61.7 %   2498   29.7 %
> 06 Aristarch v4.21           : 2579   19  22   877    61.6 %   2497   23.1 %
> 07 Crafty-MPC v18.15DC       : 2561   21  22   784    59.6 %   2494   26.3 %
> 08 Delfi v4.3                : 2559   21  22   800    59.1 %   2495   23.9 %
> 09 Delfi v4.2                : 2555   25  25   580    58.1 %   2499   27.2 %
> 10 SmarThink v0.16b++        : 2554   21  22   816    58.3 %   2496   23.7 %
> 11 Little Goliath 2000 v3.9  : 2551   21  20   880    57.7 %   2497   25.8 %
> 12 Crafty v18.15DC           : 2551   22  22   741    59.0 %   2487   29.0 %
> 13 Pepito v1.59 profile      : 2549   21  20   880    57.3 %   2497   25.8 %
> 14 Yace Paderborn            : 2546   21  20   880    57.0 %   2497   25.6 %
> 15 SoS 3                     : 2546   21  21   879    56.9 %   2498   21.8 %
> 16 Aristarch v4.4            : 2542   36  34   319    54.1 %   2513   20.4 %
> 17 SoS 4                     : 2541   24  25   639    57.2 %   2490   22.1 %
[snip]
>
Considering the standard deviations, the only thing you can say about the relative strengths of the two versions of SoS is that you don't know whether one is stronger than the other.
The conclusion that SoS 4 is 5 Elo weaker is not statistically significant; given the error bands, there is no indication that this is the case.
It may be 5 Elo weaker, or 35 Elo stronger. The data do not justify a clear decision as to strength.
It would be prudent to say that SoS 4 is probably not significantly stronger than SoS 3 at this time control and under the conditions of your experiment.
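A rough check of the error bands for the two SoS versions, using the Elo, "+" and "-" columns from the list. This sketch assumes "+" and "-" are the upward and downward margins around each rating; the list does not state their confidence level, and because both versions played largely the same opponents their errors are not independent, so this is only a crude illustration, not a formal significance test.

```python
# Columns from the rating list: (Elo, "+" margin, "-" margin)
SOS3 = (2546, 21, 21)
SOS4 = (2541, 24, 25)

def interval(entry):
    """Return (lower, upper) bounds, assuming '+' is up and '-' is down."""
    elo, plus, minus = entry
    return elo - minus, elo + plus

lo3, hi3 = interval(SOS3)  # 2525 .. 2567
lo4, hi4 = interval(SOS4)  # 2516 .. 2565

# Do the two rating intervals share any common range?
overlap = max(lo3, lo4) <= min(hi3, hi4)
point_diff = SOS4[0] - SOS3[0]
# Crude worst-case range for (SoS 4 minus SoS 3), taking both margins at face value.
diff_range = (lo4 - hi3, hi4 - lo3)
print(overlap, point_diff, diff_range)
```

The two intervals overlap (2525 to 2565 is common to both), and the crude range for the difference runs from about 51 points weaker to 40 points stronger, so the observed 5-point gap says nothing definite either way.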



my ftp site {remove http:// unless you like error messages}
Dann Corbit
 

Re: YABRL: SoS 4 not stronger than SoS 3

Post by Robert Allgeuer » 23 Jan 2004, 21:45

Posted by: Robert Allgeuer at 23 January 2004 21:45:56
In reply to: Re: YABRL: SoS 4 not stronger than SoS 3, posted by Dann Corbit at 23 January 2004 21:29:54
> Considering the standard deviations, the only thing you can say about the relative strengths of the two versions of SoS is that you don't know whether one is stronger than the other.
> The conclusion that SoS 4 is 5 Elo weaker is not statistically significant; given the error bands, there is no indication that this is the case.
> It may be 5 Elo weaker, or 35 Elo stronger. The data do not justify a clear decision as to strength.
> It would be prudent to say that SoS 4 is probably not significantly stronger than SoS 3 at this time control and under the conditions of your experiment.

I know. What I wanted to say is not that SoS 4 _is_ 5 Elo points weaker; I said it _scored_ 5 Elo points less.
Anyway, what I meant was that these two versions are roughly equally strong: SoS 4 is, with high probability, not much stronger than SoS 3, but it may also be weaker.
Robert
Robert Allgeuer
 

Re: YABRL: SoS 4 not stronger than SoS 3

Post by CL iebert » 23 Jan 2004, 22:19

Posted by: CL iebert at 23 January 2004 22:19:18
In reply to: Re: YABRL: SoS 4 not stronger than SoS 3, posted by Robert Allgeuer at 23 January 2004 21:45:56
>> Considering the standard deviations, the only thing you can say about the relative strengths of the two versions of SoS is that you don't know whether one is stronger than the other.
>> The conclusion that SoS 4 is 5 Elo weaker is not statistically significant; given the error bands, there is no indication that this is the case.
>> It may be 5 Elo weaker, or 35 Elo stronger. The data do not justify a clear decision as to strength.
>> It would be prudent to say that SoS 4 is probably not significantly stronger than SoS 3 at this time control and under the conditions of your experiment.
>
> I know. What I wanted to say is not that SoS 4 _is_ 5 Elo points weaker; I said it _scored_ 5 Elo points less.
> Anyway, what I meant was that these two versions are roughly equally strong: SoS 4 is, with high probability, not much stronger than SoS 3, but it may also be weaker.
> Robert

I totally agree; concrete figures from hundreds of games are coming soon...
Christian


BfF-List (Update soon)
CL iebert
 

