Fruit 1.5 parameter test

Archive of the old Parsimony forum. Some messages couldn't be restored. Limitations: Search for authors does not work, Parsimony specific formats do not work, threaded view does not work properly. Posting is disabled.

Fruit 1.5 parameter test

Postby Robert Allgeuer » 05 Aug 2004, 19:46

Posted by: Robert Allgeuer at 05 August 2004 20:46:03

I have run a test with Fruit 1.5 to determine which of its parameters have a positive and which a negative impact on Fruit's playing strength.

Method:
=======
The test consisted of a round robin tournament of several configurations of Fruit 1.5 and a set of reference engines. I chose this approach because I did not want to limit the test to mere self-play between the different Fruit configurations: self-play results may not be representative of playing strength against other opponents.
The Nunn 1 starting positions were used; for each pairing each engine had to play both sides, resulting in 20 games for each pairing and 3800 games overall.
The tournament results have been analysed with Elostat and a corresponding rating table has been calculated.
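The game counts above follow from simple round-robin arithmetic. A minimal sketch, assuming 10 Nunn 1 positions (implied by 20 games per pairing with colours reversed); the function name is illustrative only:

```python
from math import comb


def round_robin_games(engines: int, positions: int) -> tuple[int, int, int]:
    """Return (pairings, total games, games per engine) for a round robin in
    which every pairing plays each start position once with each colour."""
    games_per_pairing = 2 * positions          # both sides of every position
    pairings = comb(engines, 2)                 # every engine meets every other once
    total = pairings * games_per_pairing
    per_engine = (engines - 1) * games_per_pairing
    return pairings, total, per_engine


# 7 Fruit configurations + 13 other engines, 10 start positions
print(round_robin_games(engines=20, positions=10))  # (190, 3800, 380)
```

The 380 games per engine is the figure that appears in the Games column of the results table.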

Platform, Tools and Settings:
=============================
Athlon XP 2400+
1.1 GB RAM
Windows XP
Elostat 1.1b
Arena 1.08
Time Control: 5min + 2sec
Ponder off
EGTBs enabled when supported
64MB Hash

Participants:
=============
Seven configurations of Fruit 1.5: the default settings plus six settings, each with exactly one UCI parameter changed:
Fruit v1.5def: Fruit 1.5 with the default parameter setting
Fruit v1.5nmalways: nullmove search is always tried (instead of only in the fail-high case)
Fruit v1.5noetc: ETC disabled
Fruit v1.5ppushext: pawn push extension (7th rank) enabled
Fruit v1.5nosinglerep: single reply extension disabled
Fruit v1.5noqchecks: quiescence search does not include checking moves
Fruit v1.5nmR2: nullmove reduction set to 2 instead of the default 3
plus 13 other engines.
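The two null-move parameters varied here (v1.5nmalways and v1.5nmR2) can be illustrated with a hypothetical sketch. This is not Fruit's code: it assumes that the "fail-high" condition means a null move is only tried when the static evaluation already reaches beta, and that the null-move search is run at depth - 1 - R. All names are made up for illustration.

```python
from enum import Enum


class NullMoveMode(Enum):
    ALWAYS = "always"
    FAIL_HIGH = "fail high"  # assumed meaning: only when static eval >= beta
    NEVER = "never"


def try_null_move(mode: NullMoveMode, in_check: bool, static_eval: int, beta: int) -> bool:
    """Hypothetical helper: decide whether to try a null move at a non-PV node."""
    if mode is NullMoveMode.NEVER or in_check:  # passing while in check is illegal
        return False
    if mode is NullMoveMode.FAIL_HIGH:
        return static_eval >= beta
    return True


def null_search_depth(depth: int, reduction: int) -> int:
    """Remaining depth for the null-move search: one ply for the pass itself
    plus the reduction R (0 would typically mean going straight to quiescence)."""
    return max(depth - 1 - reduction, 0)


# A node 8 plies from the horizon whose static eval is below beta:
print(try_null_move(NullMoveMode.FAIL_HIGH, False, static_eval=-20, beta=0))  # False
print(try_null_move(NullMoveMode.ALWAYS, False, static_eval=-20, beta=0))     # True
print(null_search_depth(8, reduction=3), null_search_depth(8, reduction=2))  # 4 5
```

Under these assumptions, "Always" simply removes the eval >= beta gate, while R=2 makes every null-move search one ply deeper (and therefore more expensive) than R=3.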

Results:
========


    Program                     Elo    +   -   Games   Score   Av.Op.  Draws
 01 Ruffian v1.01             : 2695   26  42   380    70.7 %   2543   21.3 %
 02 List v5.12                : 2664   28  37   380    66.6 %   2544   23.7 %
 03 El Chinito v3.25          : 2643   29  35   380    63.7 %   2545   23.7 %
 04 Gothmog v0.4.8            : 2604   31  33   380    58.2 %   2547   19.5 %
 05 Fruit v1.5nmalways        : 2596   32  31   380    57.0 %   2548   23.9 %
 06 Fruit v1.5noetc           : 2572   34  30   380    53.3 %   2549   20.8 %
 07 Fruit v1.5ppushext        : 2571   34  29   380    53.2 %   2549   24.2 %
 08 Fruit v1.5def             : 2568   34  29   380    52.8 %   2549   22.9 %
 09 Fruit v1.5nosinglerep     : 2560   35  27   380    51.4 %   2550   28.2 %
 10 Fruit v1.5noqchecks       : 2554   35  30   380    50.5 %   2550   19.5 %
 11 Ktulu v5.0                : 2554   35  27   380    50.5 %   2550   29.5 %
 12 AnMon v5.21               : 2552   36  28   380    50.3 %   2550   24.7 %
 13 SoS4                      : 2547   28  35   380    49.5 %   2550   24.2 %
 14 Amyan v1.592              : 2537   29  35   380    48.0 %   2551   22.4 %
 15 Fruit v1.5nmR2            : 2534   29  34   380    47.5 %   2551   24.5 %
 16 Yace Paderborn            : 2509   33  32   380    43.8 %   2552   18.2 %
 17 Ufim v5.00                : 2460   35  29   380    36.7 %   2555   22.4 %
 18 Frenzee v1.59             : 2439   39  28   380    33.8 %   2556   18.7 %
 19 Patzer v3.61              : 2424   40  27   380    31.7 %   2557   19.7 %
 20 Sjeng v12.13              : 2417   42  27   380    30.9 %   2557   19.2 %
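The Elo column relates to the Score and Av.Op. columns through the usual logistic Elo model. The sketch below applies the closed-form performance formula to a few rows copied from the table. It is only an illustration of that relationship, not Elostat's own fitting procedure, which may differ in detail; for these rows it lands within about a point of the Elostat figures.

```python
import math


def performance_elo(score: float, avg_opponent: float) -> float:
    """Logistic Elo model: the rating at which `score` (0 < score < 1) is the
    expected result against opponents rated `avg_opponent`."""
    return avg_opponent + 400 * math.log10(score / (1 - score))


# Rows copied from the table: (name, score, average opponent, Elostat Elo)
rows = [
    ("Ruffian v1.01", 0.707, 2543, 2695),
    ("Fruit v1.5nmalways", 0.570, 2548, 2596),
    ("Fruit v1.5def", 0.528, 2549, 2568),
    ("Fruit v1.5nmR2", 0.475, 2551, 2534),
    ("Sjeng v12.13", 0.309, 2557, 2417),
]
for name, score, avg_op, elostat in rows:
    print(f"{name:20s} {performance_elo(score, avg_op):7.1f}  (Elostat: {elostat})")
```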



Not surprisingly, the differences in playing strength between the parameter settings are not statistically significant, even after 3800 games: each configuration played only 380 of them, and the error margins in the table are around 30 Elo in each direction. Nevertheless, I would venture the following interpretation:
Parameter settings that probably increase Fruit's playing strength:
- Always trying nullmoves: it seems that the fail-high condition is a bit too aggressive and skips nullmove searches that would in fact have failed high
Parameter settings that are probably performance-neutral:
- Disabling ETC (although I reckon that at longer time controls and deeper search depths ETC should give a better return and could yield an increase in playing strength)
- Enabling pawn push extensions
- Disabling single reply extensions
Parameter settings that probably decrease playing strength slightly:
- Disabling checks in quiescence search
Parameter settings that probably decrease playing strength:
- Reducing the nullmove reduction to 2
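A rough check of the significance statement. This sketch assumes each configuration's 380 games are independent and uses a normal approximation: for score p and draw rate d the per-game score variance is p(1-p) - d/4, which is then linearised into Elo around the observed score. It ignores that the Fruit configurations also played one another, so the numbers are approximate.

```python
import math


def score_std_error(score: float, draw_rate: float, games: int) -> float:
    """Standard error of the mean per-game score (win=1, draw=0.5, loss=0).
    Per-game variance is w + d/4 - p^2; with w = p - d/2 this is p(1-p) - d/4."""
    variance = score * (1 - score) - draw_rate / 4
    return math.sqrt(variance / games)


def elo_per_score_unit(score: float) -> float:
    """Slope d(Elo)/d(score) of the logistic Elo curve at `score`."""
    return 400 / (math.log(10) * score * (1 - score))


def margin_elo(score: float, draw_rate: float, games: int, z: float = 1.96) -> float:
    """Approximate 95% margin in Elo (normal approximation, linearised)."""
    return z * score_std_error(score, draw_rate, games) * elo_per_score_unit(score)


margin_def = margin_elo(0.528, 0.229, 380)  # Fruit v1.5def
margin_nm = margin_elo(0.570, 0.239, 380)   # Fruit v1.5nmalways
# Margin for the difference of two (assumed independent) estimates
diff_margin = math.hypot(margin_def, margin_nm)
print(f"def: +/-{margin_def:.0f}  nmalways: +/-{margin_nm:.0f}  difference: +/-{diff_margin:.0f}")
print("observed difference:", 2596 - 2568)
```

With the table's figures this gives roughly 31 Elo for each configuration and roughly 43 Elo for their difference, more than the observed 28-point gap between Fruit v1.5nmalways and Fruit v1.5def.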

Conclusion:
===========
Generally, the impact of the different parameter settings on Fruit's playing strength is comparatively small.
Personally, I am a bit surprised that enabling or disabling the extensions makes practically no difference, and I would be interested in views on why this might be the case.
I would also have expected not searching checking moves in the quiescence search to have a bigger (negative) impact than measured here.
I am currently extending the test with two further parameter settings:
- checks in quiescence search only after a nullmove
- the alternative material piece values as proposed by J. Rang
Eventually I also plan to test a combination of the best parameters, to see whether the improvements add up or not.
Robert
Robert Allgeuer
 

Re: Fruit 1.5 parameter test

Postby Joachim Rang » 07 Aug 2004, 11:58

Posted by: Joachim Rang at 07 August 2004 12:58:33
In reply to: Fruit 1.5 parameter test posted by: Robert Allgeuer at 05 August 2004 20:46:03

Hi Robert,
Great test, and a great testing procedure. As a beta tester for Fabien I am basically doing the same, so I can comment on the following:
First of all, the results look pretty good. I'm surprised that Fruit 1.5 scored better than SOS4, Anmon and Amyan. Ktulu has problems with blitz, so its result is not surprising.
On the different parameters: disabling checks in the quiescence search was also slightly worse in my tests, but not by as much as I expected. I think the reason is that without checks in the quiescence search Fruit reaches a significantly greater depth (almost half a ply) and can therefore compensate for much of the tactical weakness. I even tested with check extensions disabled, and the result was again only slightly worse.
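The "almost half a ply" figure can be related to tree size. Assuming the node count grows by a constant effective branching factor b per ply and that search time is proportional to nodes, gaining d plies in the same time requires a tree b^d times smaller, so half a ply corresponds to a factor of sqrt(b). The branching factors below are illustrative only, not measured values for Fruit.

```python
import math


def extra_depth(node_ratio: float, branching_factor: float) -> float:
    """Extra plies reachable in the same time when the tree is `node_ratio`
    times smaller, assuming a constant effective branching factor and a
    time cost proportional to the node count."""
    return math.log(node_ratio) / math.log(branching_factor)


def node_ratio_for(depth_gain: float, branching_factor: float) -> float:
    """Inverse of extra_depth: tree-size reduction needed to gain `depth_gain` plies."""
    return branching_factor ** depth_gain


# Illustrative branching factors only; no measured values for Fruit are implied
for b in (2.0, 3.0, 4.0):
    print(f"b = {b}: half a ply needs a {node_ratio_for(0.5, b):.2f}x smaller tree")
```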
PawnPush and Singlereply: as in your tests, both parameters produced only small changes, with single_reply slightly better and PawnPush slightly worse. I agree that these parameters probably have no significant effect on Fruit's playing strength.
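For readers less familiar with the extensions being discussed, here is a hypothetical helper that decides whether a move is searched one ply deeper. The defaults mirror what the test implies for Fruit 1.5 (check extension on, single reply on, pawn push off); the field names, the one-ply cap and the exact trigger conditions are assumptions for illustration, not Fruit's code.

```python
from dataclasses import dataclass


@dataclass
class MoveInfo:
    # Hypothetical per-move facts; a real engine would compute these from its board
    gives_check: bool
    only_legal_move: bool  # the side to move had exactly one legal move
    pawn_to_7th: bool      # a pawn move arriving on the mover's 7th rank


def extension(move: MoveInfo, use_check: bool = True,
              use_single_reply: bool = True, use_pawn_push: bool = False) -> int:
    """Return how many plies to extend the search of `move` (0 or 1)."""
    if use_check and move.gives_check:
        return 1
    if use_single_reply and move.only_legal_move:
        return 1
    if use_pawn_push and move.pawn_to_7th:
        return 1
    return 0


push = MoveInfo(gives_check=False, only_legal_move=False, pawn_to_7th=True)
print(extension(push), extension(push, use_pawn_push=True))  # 0 1
```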
I didn't test ETC since Fabien said it would probably not lead to any difference.
It is interesting that Nullmove=Always scored better even at a blitz time control. I got similar results (slightly better, but not significant). Taken together with your test, though, it seems that Nullmove=Always is a bit better (I think this is even more true at longer time controls).
I really appreciate your test and look forward to seeing your further testing, especially of the material settings I proposed. In my test the improvement was significant, but later results gave a mixed picture, with my settings even scoring slightly worse (in the Heinz Nunn-Rating-List, for example). So I am very interested to see another verification or falsification.
If you like, you can test experimental versions of Fruit too: Fabien has added a lot of endgame stuff and other things, and it would be great to get another independent test result to see the improvement. If you are interested, drop me an email (joachim@iwanuschka.de). Unfortunately I am on vacation right now, so I cannot send you an experimental version before August 20th. I would be very happy to see you continue your Fruit testing.
regards Joachim
Joachim Rang
 

Re: Fruit 1.5 parameter test

Postby Günther Simon » 07 Aug 2004, 13:03

Posted by: Günther Simon at 07 August 2004 14:03:19
In reply to: Re: Fruit 1.5 parameter test posted by: Joachim Rang at 07 August 2004 12:58:33
> Ktulu has problems with blitz so this result is not surprising.
I have had a completely different experience with Ktulu 5.1 and own book,
under WB of course. That could have been just luck, as I have only run
one small fast test tourney with it so far. (->RWBC -> Test Tourneys ->Test7)
(No idea which books Robert tests with, or whether Ktulu played as WB or UCI
under Arena, though...)
At least from what I have read so far, I think Ktulu 5.1 is considerably
better than the first release some months ago.
Best regards,
Günther
Günther Simon
 

Re: Fruit 1.5 parameter test

Postby Robert Allgeuer » 07 Aug 2004, 17:33

Posted by: Robert Allgeuer at 07 August 2004 18:33:35
In reply to: Re: Fruit 1.5 parameter test posted by: Günther Simon at 07 August 2004 14:03:19
> I have completely different experience with Ktulu 5.1 and own book
> under WB of course. [...]
> (No idea with what books Robert tests and if Ktulu played as WB or UCI
> under Arena though...)
I actually included Ktulu 5.0, not 5.1, and 5.0 appears to be no improvement over the free 4.2. Incidentally, in my YABRL rating list, where I tested Ktulu completely independently (on another computer, with a different testing method and under Winboard), it also scored lower than 4.2.
Ktulu played as a UCI engine, like all the other engines in the test except El Chinito (Winboard). I did not disable Ktulu's standard book; however, this test started from the Nunn positions, so opening books do not play an important role. A Nunn starting position may have been in the book of one engine or another (not Fruit's, of course), but that does not matter in this specific test, because the focus was NOT the strength of the other engines, nor how Fruit compares to them, but exclusively the relative strength of the different Fruit settings. The other engines served only as "sparring partners".
I generally think the overall results for the other engines are biased: engines for which Fruit is a tough opponent will have scored comparatively low here, while those that do well against Fruit will have scored very well.
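The bias comes from the schedule: every non-Fruit engine met the seven Fruit configurations as well as the twelve other reference engines. A quick count using the tournament parameters from the first post (the function name is illustrative):

```python
def games_vs_fruit(fruit_configs: int, engines: int,
                   games_per_pairing: int) -> tuple[int, int, float]:
    """For one non-Fruit engine: (games vs Fruit configurations, total games, share)."""
    total = (engines - 1) * games_per_pairing      # it meets every other engine once
    vs_fruit = fruit_configs * games_per_pairing   # of which these are Fruit settings
    return vs_fruit, total, vs_fruit / total


vs_fruit, total, share = games_vs_fruit(fruit_configs=7, engines=20, games_per_pairing=20)
print(f"{vs_fruit} of {total} games ({share:.1%}) against Fruit configurations")
```

That is 140 of 380 games, about 37%, so how well an engine happens to match up against Fruit weighs heavily on its overall rating here.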
Robert
Robert Allgeuer
 

Re: Fruit 1.5 parameter test

Postby Günther Simon » 07 Aug 2004, 17:40

Posted by: Günther Simon at 07 August 2004 18:40:41
In reply to: Re: Fruit 1.5 parameter test posted by: Robert Allgeuer at 07 August 2004 18:33:35
> It was actually Ktulu 5.0 and not 5.1 that I have included
I know, I had read the table ;) That's why I wrote 'I have .......'
and 'I think ....'. As I mentioned, 5.0 came out several months
earlier, and I believe 5.1 is much stronger.
I mentioned all this because Joachim generalized about Ktulu by naming
it without a version number.
> I generally think that the overall results for the other engines are biased, because the ones for whom Fruit is a tough opponent will have scored comparatively low here, others who do well against Fruit will have scored very well.
Yes, agreed.
Regards,
Günther
Günther Simon
 

Re: Fruit 1.5 parameter test

Postby Robert Allgeuer » 07 Aug 2004, 21:02

Posted by: Robert Allgeuer at 07 August 2004 22:02:46
In reply to: Re: Fruit 1.5 parameter test posted by: Joachim Rang at 07 August 2004 12:58:33

> I'm wondering that Fruit 1.5 scored better than SOS4, Anmon and Amyan. [...]
> I tested even with checkextensions disabled and the result was also only slightly worse.
> I didn't test ETC since Fabien said it would probably not lead to any difference.
> [...] look forward to see your further testing especially the material settings proposed by me.
The Anmon version used here was the old 5.21, which I chose deliberately to get a good spread of opponents expected to be stronger than, about equal to, and weaker than Fruit. Incidentally, both Amyan and Anmon (all versions) rank behind Fruit in my rating list. I expected SoS4 at blitz to be at practically the same level as Fruit (as it is in my rating list), so I was also a bit surprised that it did relatively badly here; but I checked memory consumption, UCI settings etc., and I am quite sure everything was OK.

Is there a way to disable check extensions in Fruit?
I guess ETC would need to be tested at several time controls to see whether there is a trend (e.g. more return from ETC at deeper search depths), but unfortunately such tests would be very time-consuming.
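ETC (enhanced transposition cutoff) checks the transposition table for every child position before searching any move. A generic negamax sketch of the idea follows; the table layout, bound names and position keys are hypothetical, and Fruit's actual implementation may differ.

```python
from collections.abc import Hashable, Iterable
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Bound(Enum):
    EXACT = 0
    LOWER = 1  # true value >= stored value (search failed high)
    UPPER = 2  # true value <= stored value (search failed low)


@dataclass
class TTEntry:
    depth: int   # remaining depth the stored search used
    value: int   # score from the point of view of the side to move there
    bound: Bound


def etc_cutoff(child_keys: Iterable[Hashable], tt: dict[Hashable, TTEntry],
               depth: int, beta: int) -> Optional[int]:
    """Before searching any move at a node with remaining depth `depth`, probe
    the table for each child. If a child searched at least depth-1 plies is
    worth at most v to the opponent, the move is worth at least -v to us;
    if -v >= beta the node fails high without a search."""
    for key in child_keys:
        entry = tt.get(key)
        if entry is None or entry.depth < depth - 1:
            continue  # no entry, or too shallow to be trusted at this depth
        if entry.bound in (Bound.EXACT, Bound.UPPER) and -entry.value >= beta:
            return -entry.value
    return None


# Hypothetical table entries for two children of the current node
tt = {
    "after_e4": TTEntry(depth=6, value=-150, bound=Bound.UPPER),
    "after_d4": TTEntry(depth=3, value=-400, bound=Bound.UPPER),  # too shallow
}
print(etc_cutoff(["after_d4", "after_e4"], tt, depth=7, beta=100))  # 150
```

If no child produces a cutoff, the function returns None and the node is searched normally.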
Currently I have the checks-after-nullmove version running (apparently not too hot, this one); after that I will run your new material values.
Robert
Robert Allgeuer
 

Re: Fruit 1.5 parameter test

Postby Joachim Rang » 09 Aug 2004, 22:33

Posted by: Joachim Rang at 09 August 2004 23:33:30
In reply to: Re: Fruit 1.5 parameter test posted by: Robert Allgeuer at 07 August 2004 22:02:46
> Is there a way to disable check extensions in Fruit?
> I guess ETC testing would be needed at different time controls and to see whether there is a trend (e.g. more return from ETC at deeper search depths), but unfortunately this would be very time consuming tests.
Maybe Fabien didn't make it an option in the public release, but there are several experimental versions with many more options.
Fabien wrote that ETC is probably better at longer time controls, but I think it is an option of minor importance anyway, so I don't think it is worth testing.
regards Joachim
Joachim Rang
 

