YABRL: Thinker 4.5b in the "pack behind Ruffian"

Archive of the old Parsimony forum. Some messages couldn't be restored.

Posted by: Robert Allgeuer at 26 February 2004 11:07:04:

My tests (716 unique Blitz games) suggest that Thinker is now one of the strongest free engines available, at least in Blitz. In the YABRL rating list, Thinker has established itself in the group of the strongest free engines behind Ruffian 1.0.x (i.e. Crafty, Ktulu, Thinker, SmarThink and Aristarch), which are all very close to each other.
Observations:
- The search depth that Thinker outputs in the PV is never very deep; it is usually one to two plies shallower than what most other engines report. Thinker must extend a lot ...
- The Thinker executable is unbelievably small - just 77K - yet so strong.
- There was a bug in claiming draws (see post below), now fixed in Thinker 4.5c.
- At least in my configuration, Thinker clearly does not display its score in pawn units; one displayed unit seems to be worth closer to three pawns. For example, when Thinker 4.5b is a rook up, it may display a score of 1.8 or so.
- When engines play against so many different opponents there are sometimes surprises, e.g. one can spot specific "weaknesses". Thinker's "weakness" seems to be Francesca, which achieved a score against it (50%) that not many other engines have achieved.
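The rook example in the observations above can be turned into a rough estimate of the display unit. This is only a sketch: the rook value of 5 pawns is a conventional assumption, not a number taken from Thinker, and the function name is made up for illustration.

```python
# Assumption: a rook is worth about 5 pawns (conventional value, not Thinker's own).
ROOK_IN_PAWNS = 5.0


def pawns_per_display_unit(displayed_score: float, material_in_pawns: float) -> float:
    """Return how many conventional pawns one displayed score unit represents."""
    if displayed_score == 0:
        raise ValueError("displayed score must be non-zero")
    return material_in_pawns / displayed_score


# Thinker 4.5b reportedly shows about 1.8 when it is a rook up.
unit = pawns_per_display_unit(1.8, ROOK_IN_PAWNS)
print(f"one displayed unit is roughly {unit:.1f} pawns")
```

Under that assumption one displayed unit comes out a little below three pawns, which fits the "something like three pawns" impression.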
To Lance:
What are the exact changes between Thinker 4.5b and 4.5c? Are they only the two bug fixes (setboard and draw claim)? If so, I would replace Thinker 4.5b with 4.5c without replaying all the games.
Secondly, do you think Thinker is a "Blitz expert", or should it be just as strong at longer time controls?
Time control is 300+2; for conditions, platform and tools please refer to the link below.
Next will be a Ruffian qualification tournament. The aim is to determine whether Ruffian 2.0.2 or 2.1.0 is stronger under the given conditions. The stronger of the two will then be the next engine to be added to YABRL.
Robert



    Program                     Elo    +   -   Games   Score   Av.Op.  Draws
 01 Ruffian v2.0.0            : 2671   17  29   800    72.6 %   2502   24.6 %
 02 Ruffian v1.0.1            : 2650   17  27   836    71.3 %   2492   25.6 %
 03 Crafty v17.14DC           : 2584   19  19   960    61.4 %   2504   32.2 %
 04 Ktulu v4.2                : 2584   21  24   775    61.7 %   2501   23.6 %
 05 Thinker v4.5b             : 2582   21  22   716    61.5 %   2501   32.5 %
 06 SmarThink v0.17a          : 2581   21  23   759    61.3 %   2501   25.2 %
 07 Crafty v19.06DCntb        : 2576   21  20   821    59.9 %   2506   30.5 %
 08 Aristarch v4.21           : 2575   19  20   977    60.2 %   2504   24.6 %
 09 Aristarch v4.37           : 2566   22  20   739    59.4 %   2500   36.8 %
 10 Crafty-MPC v18.15DC       : 2562   20  20   884    58.5 %   2502   27.7 %
 11 Delfi v4.3                : 2561   20  21   900    58.3 %   2503   24.2 %
 12 El Chinito v3.25          : 2559   22  22   740    58.3 %   2500   27.7 %
 13 Delfi v4.2                : 2556   25  25   580    58.1 %   2499   27.2 %
 14 SmarThink v0.16b++        : 2554   21  21   836    58.0 %   2498   24.3 %
 15 Little Goliath 2000 v3.9  : 2552   20  19   980    56.8 %   2504   26.3 %
 16 Crafty v18.15DC           : 2552   22  22   741    59.0 %   2488   29.0 %
 17 SoS 3                     : 2547   20  19   979    56.1 %   2504   22.7 %
 18 Pepito v1.59 profile      : 2547   20  19   980    56.1 %   2504   25.4 %
 19 Yace Paderborn            : 2546   20  19   980    55.9 %   2504   26.3 %
 20 Aristarch v4.4            : 2542   36  34   319    54.1 %   2514   20.4 %
 21 SoS 4                     : 2540   23  21   759    55.5 %   2502   24.5 %
 22 Yace v0.99.56             : 2533   34  30   360    54.7 %   2500   25.6 %
 23 Little Goliath 2000 v3.5  : 2532   31  25   440    53.6 %   2507   30.9 %
 24 Green Light Chess v3.00   : 2531   21  18   980    53.8 %   2505   25.2 %
 25 Anmon v5.30               : 2513   25  20   740    51.6 %   2502   27.0 %
 26 Amyan v1.59               : 2510   23  18   872    51.0 %   2503   26.3 %
 27 Pharaon v2.62             : 2502   18  22   979    49.6 %   2505   24.0 %
 28 Crafty v19.01DC           : 2494   24  19   815    50.4 %   2491   25.5 %
 29 LambChop v10.99           : 2491   18  21   978    47.9 %   2505   22.8 %
 30 Ktulu v3.9                : 2486   19  24   779    48.6 %   2496   26.1 %
 31 Gromit v3.8.2             : 2486   18  21   957    47.3 %   2505   23.4 %
 32 SlowChess v2.89b          : 2482   19  22   858    46.8 %   2505   24.1 %
 33 Anmon v5.22               : 2478   19  22   899    46.4 %   2503   26.7 %
 34 KnightDreamer v3.2        : 2478   19  22   880    46.3 %   2503   25.1 %
 35 Amy v0.8.3                : 2474   20  21   969    45.6 %   2505   19.5 %
 36 Tao v5.4                  : 2473   19  20   979    45.3 %   2506   20.8 %
 37 Comet B44-2               : 2472   19  21   900    45.4 %   2505   27.2 %
 38 SoS v11-99                : 2472   33  34   359    46.0 %   2500   17.3 %
 39 Dragon v4.4.3             : 2460   20  22   846    44.0 %   2502   25.2 %
 40 Comet B62-3               : 2451   20  21   880    42.5 %   2504   25.9 %
 41 Francesca M.0.0.9         : 2438   20  19   979    40.2 %   2506   25.7 %
 42 PostModernist v1.007      : 2435   21  20   900    39.9 %   2505   25.4 %
 43 Comet B60                 : 2429   22  21   780    41.2 %   2491   25.6 %
 44 Leila v0.53h              : 2420   23  19   898    38.0 %   2506   21.4 %
 45 Tcb v0045                 : 2418   22  19   899    37.7 %   2506   25.0 %
 46 Resp v0.19                : 2400   24  18   880    35.3 %   2505   23.8 %
 47 Nejmet v3.07              : 2384   25  18   876    33.2 %   2505   22.3 %
 48 SlowChess v2.78           : 2373   27  19   790    33.5 %   2493   19.6 %
 49 Exchess v4.03             : 2325   29  16   879    26.1 %   2507   22.5 %
 50 Beowulf v2.2              : 2302   33  15   960    23.3 %   2509   18.1 %

Games        :  20686 (finished)
White Wins   :   8427 (40.7 %)
Black Wins   :   7056 (34.1 %)
Draws        :   5203 (25.2 %)
Unfinished   :      0
White Perf.  : 53.3 %
Black Perf.  : 46.7 %

(5) Thinker v4.5b             : 716 (+324,=233,-159), 61.5 %
Ruffian v2.0.0                :  20 (+  5,=  6,-  9), 40.0 %
Ktulu v4.2                    :  20 (+  7,= 10,-  3), 60.0 %
Crafty v17.14DC               :  20 (+  6,=  9,-  5), 52.5 %
SmarThink v0.17a              :  20 (+  7,= 10,-  3), 60.0 %
Aristarch v4.21               :  20 (+  9,=  7,-  4), 62.5 %
Crafty v19.06DCntb            :  20 (+  3,=  8,-  9), 35.0 %
Aristarch v4.37               :  20 (+  6,=  8,-  6), 50.0 %
Delfi v4.3                    :  20 (+  8,=  5,-  7), 52.5 %
Crafty-MPC v18.15DC           :  20 (+  2,= 14,-  4), 45.0 %
El Chinito v3.25              :  20 (+  7,= 11,-  2), 62.5 %
Little Goliath 2000 v3.9      :  20 (+  4,= 10,-  6), 45.0 %
SoS 3                         :  20 (+  7,=  7,-  6), 52.5 %
Pepito v1.59 profile          :  20 (+  6,=  5,-  9), 42.5 %
Yace Paderborn                :  20 (+  7,=  5,-  8), 47.5 %
SoS 4                         :  20 (+ 10,=  5,-  5), 62.5 %
Green Light Chess v3.00       :  20 (+  9,=  6,-  5), 60.0 %
Anmon v5.30                   :  20 (+ 13,=  4,-  3), 75.0 %
Amyan v1.59                   :  20 (+ 10,=  7,-  3), 67.5 %
Pharaon v2.62                 :  20 (+ 12,=  5,-  3), 72.5 %
LambChop v10.99               :  20 (+  9,=  5,-  6), 57.5 %
Gromit v3.8.2                 :  20 (+  8,=  9,-  3), 62.5 %
SlowChess v2.89b              :  19 (+ 11,=  3,-  5), 65.8 %
KnightDreamer v3.2            :  20 (+ 13,=  4,-  3), 75.0 %
Amy v0.8.3                    :  18 (+ 12,=  2,-  4), 72.2 %
Comet B44-2                   :  20 (+ 10,=  7,-  3), 67.5 %
Tao v5.4                      :  20 (+  7,= 10,-  3), 60.0 %
Dragon v4.4.3                 :  20 (+ 11,=  4,-  5), 65.0 %
Comet B62-3                   :  20 (+  9,=  7,-  4), 62.5 %
PostModernist v1.007          :  20 (+ 13,=  6,-  1), 80.0 %
Francesca M.0.0.9             :  20 (+  9,=  2,-  9), 50.0 %
Tcb v0045                     :  20 (+ 15,=  3,-  2), 82.5 %
Leila v0.53h                  :  19 (+ 10,=  4,-  5), 63.2 %
Resp v0.19                    :  20 (+ 12,=  7,-  1), 77.5 %
Nejmet v3.07                  :  20 (+ 11,=  6,-  3), 70.0 %
Exchess v4.03                 :  20 (+ 14,=  6,-  0), 85.0 %
Beowulf v2.2                  :  20 (+ 12,=  6,-  2), 75.0 %
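As a cross-check of Thinker's summary line, the score percentage and a performance rating can be recomputed from the win/draw/loss counts. The thread does not say which tool or formula produced the list, so the sketch below assumes the standard logistic Elo formula; the function names are illustrative only.

```python
import math


def score_fraction(wins: int, draws: int, losses: int) -> float:
    """Fraction of points scored, counting a draw as half a point."""
    games = wins + draws + losses
    return (wins + 0.5 * draws) / games


def elo_difference(score: float) -> float:
    """Rating difference implied by a score under the logistic Elo model (assumed here)."""
    return 400.0 * math.log10(score / (1.0 - score))


# Thinker v4.5b: 716 games (+324, =233, -159), average opponent 2501.
score = score_fraction(324, 233, 159)
performance = 2501 + elo_difference(score)

print(f"score: {100 * score:.1f} %")
print(f"performance: {performance:.0f}")
```

With these inputs the score matches the 61.5 % in the table, and the performance estimate lands within about a point of the listed 2582 (the average opponent rating is itself rounded).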





YABRL (Yet Another Blitz Rating List)
Robert Allgeuer
 

Re: YABRL: Thinker 4.5b in the "pack behind Ruffian"

Posted by: Lance Perkins at 26 February 2004 20:06:15:
In reply to: YABRL: Thinker 4.5b in the "pack behind Ruffian", posted by: Robert Allgeuer at 26 February 2004 11:07:04:

The only changes in 4.5c are fixes for the early draw claim bug and a bug in setboard.
The score printed in the PV looks low because the material scores are scaled down to match my position score tables. The two did not agree well in my tests, and the only way to stabilize the scores was to scale down the material score. Scaling up the position score tables could probably yield the same result, but would require me to fix more code.
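The remark that scaling up the position tables "could probably yield the same result" can be illustrated with a toy evaluation. Nothing below comes from Thinker: the scale factor, the candidate positions and the function names are all hypothetical, and exact fractions are used so that rounding plays no role.

```python
from fractions import Fraction

# Hypothetical scale factor and scores, for illustration only.
SCALE = Fraction(1, 3)

# name -> (material balance in pawns, positional table score)
candidates = {
    "a": (Fraction(1), Fraction(-1, 5)),
    "b": (Fraction(0), Fraction(2, 5)),
    "c": (Fraction(-1), Fraction(9, 10)),
}


def eval_scaled_material(material, positional):
    """Variant 1: shrink the material term, leave the positional term alone."""
    return SCALE * material + positional


def eval_scaled_positional(material, positional):
    """Variant 2: leave the material term alone, grow the positional term."""
    return material + positional / SCALE


def ranking(evaluate):
    """Candidate names ordered from best to worst under the given evaluation."""
    return sorted(candidates, key=lambda name: evaluate(*candidates[name]), reverse=True)


print(ranking(eval_scaled_material))
print(ranking(eval_scaled_positional))
```

The two variants differ only by a constant factor, so they order positions identically; only the displayed magnitude changes. In a real engine, rounding to integers and any thresholds expressed in absolute score units would also have to be adjusted, which is one plausible reading of "fix more code".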
I’m hoping that Thinker will also perform decently at longer time controls (as it did in CCT6).
As far as size goes, you will notice that Thinker actually grew from 70K (4.4k) to 77K (4.5a). I hope all that 7K of new eval and new search code is worth it.
Lance Perkins
 

Re: YABRL: Thinker 4.5b in the "pack behind Ruffian"

Posted by: Dann Corbit at 26 February 2004 20:44:30:
In reply to: Re: YABRL: Thinker 4.5b in the "pack behind Ruffian", posted by: Lance Perkins at 26 February 2004 20:06:15:

>The only change made in 4.5c is to fix the bug in early draw claim and a bug in setboard.
>The score printed on the PV seems to be lower because the material scores get scaled down to match my position score tables (they don’t seem to agree well in my tests and the only way to stabilize the scores was to scale down the material score; scaling-up the position score tables could probably yield the same result but requires me to fix more code).
>I’m hoping that Thinker would also perform decently on longer time controls (like it did in CCT6).
>As far as size goes, you will notice that Thinker actually grew from 70K (4.4k) to 77K (4.5a). All that 7K of new eval and new search code I hope is worth it.
>[snip]
Pretty ridiculous that a 77K program can knock the stuffings out of programs weighing in at 1 MB.
Sort of like having a flyweight jump into the ring with the heavyweight world champion and knock the stuffings out of him.
Couldn't you at least put in some large, empty data tables to puff your program up to 300K or so? That way, when the big boys get their teeth knocked in, they won't feel so bad.



my ftp site {remove http:// unless you like error messages}
Dann Corbit
 

Re: YABRL: Thinker 4.5b in the "pack behind Ruffian"

Posted by: Lance Perkins at 27 February 2004 04:33:24:
In reply to: Re: YABRL: Thinker 4.5b in the "pack behind Ruffian", posted by: Dann Corbit at 26 February 2004 20:44:30:

No need to code anything just to make a bigger binary. Something like this would do:
copy /b Thinker.exe+Ruffian.exe LargeThinker.exe
Thinker of course has no EGTB code. I'm still waiting for someone to write an EGTB DLL and I'll just use it.
Years of developing embedded systems have conditioned me to think of small code all the time. Years ago, code that did a lot had to fit in a 16K EPROM!
As for Thinker, I believe the small code helps too, in that everything fits in the L1 cache or is almost always available in the instruction cache.
Lance Perkins
 

