Averno 0.70 successful in tournament test

Archive of the old Parsimony forum. Some messages couldn't be restored.

Postby Heinz van Kempen » 02 Feb 2004, 20:31:29

Hi all :-),
Averno 0.70 was tested in advance, before running Nunn2 C, which I will also do for all untested versions that arrive in time.
No bugs found in Averno. No time forfeits seen.
Conditions:
Athlon 2600+
64 MB hash
4 minutes + 2 seconds (Fischer)
Nunn 2 positions 1-20 (each position played once with White and once with Black against each opponent) = 80 games
5-men EGTB
Results:
Averno 0.70 - Movei 00_8_158 18.5 - 21.5
Averno 0.70 - Movei 00_8_163 14.0 - 26.0

So far there seems to be a big improvement over Averno 0.52, considering that the old version is rated about 200 points below the best Movei versions. I do not want to be too enthusiastic after so few games, but look at the score against Movei 00_8_158, which will presumably win Nunn2 A.
As for Movei, I draw no conclusions here when comparing the two versions, because the number of games is statistically insignificant; still, the results are astonishing.
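To put a rough number on "statistically insignificant", the Python sketch below turns a match score into an Elo difference with the standard logistic formula and adds an approximate 95% interval. The post gives only total scores, not win/draw/loss counts, so the sketch assumes the worst case for the variance (no draws); the function names are illustrative, not from any particular tool.

```python
import math

Z95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def elo_from_fraction(p: float) -> float:
    """Elo difference implied by an expected score p (logistic Elo model)."""
    if not 0.0 < p < 1.0:
        raise ValueError("score fraction must lie strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / p - 1.0)


def elo_diff(points: float, games: int) -> float:
    """Elo difference of the first engine relative to the second."""
    return elo_from_fraction(points / games)


def elo_interval(points: float, games: int, z: float = Z95) -> tuple:
    """Approximate confidence interval for the Elo difference.

    Only the total score is known, so the per-game variance is bounded by
    p * (1 - p), its value when no games are drawn; draws would narrow it.
    """
    p = points / games
    se = math.sqrt(p * (1.0 - p) / games)
    eps = 1e-9  # keep the bounds strictly inside (0, 1) before taking logs
    lo = min(max(p - z * se, eps), 1.0 - eps)
    hi = min(max(p + z * se, eps), 1.0 - eps)
    return elo_from_fraction(lo), elo_from_fraction(hi)


if __name__ == "__main__":
    for opponent, points in [("Movei 00_8_158", 18.5), ("Movei 00_8_163", 14.0)]:
        lo, hi = elo_interval(points, 40)
        print(f"Averno 0.70 vs {opponent}: {elo_diff(points, 40):+.0f} Elo "
              f"(95% interval roughly {lo:+.0f} to {hi:+.0f})")
```

For 18.5-21.5 over 40 games this reports about -26 Elo, with an interval more than 200 Elo wide that includes zero.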
The new Averno seems to be much improved in certain endgames and in tactics: two crushing short wins in Nunn pos. 1, where both Movei versions played identically (which usually does not happen), and a fantastic move 21.Nb5 in Nunn pos. 7. Time management is a bit daring: it uses a lot of time for the first 30-40 moves and is afterwards forced to play rapidly, since from move 40 onwards it mostly has only the increment to work with. This might be due to the Fischer clock. It works quite well in most games, but in some games the advantage gained in the early middlegame was spoiled this way later on.
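One simple way to keep a reserve under a Fischer clock is to give each move a fixed share of the remaining time plus most of the increment. The sketch below is a generic heuristic, not Averno's time manager; MOVES_TO_GO, SAFETY_MS and the 3/4 share of the increment are assumptions chosen for illustration.

```python
SAFETY_MS = 50    # hypothetical reserve for communication lag
MOVES_TO_GO = 30  # hypothetical guess at how many moves are still to be played


def move_budget(remaining_ms: int, increment_ms: int) -> int:
    """Milliseconds to spend on the next move under an increment clock."""
    usable = max(remaining_ms - SAFETY_MS, 0)
    # Spread the clock over the assumed remaining moves and add most of the
    # increment, which is credited back after every move.
    budget = usable // MOVES_TO_GO + (3 * increment_ms) // 4
    # Never plan to spend more than a quarter of the clock on a single move.
    return min(budget, usable // 4)


def simulate(start_ms: int, increment_ms: int, moves: int) -> list:
    """Remaining clock after each move if every move uses exactly its budget."""
    remaining = start_ms
    history = []
    for _ in range(moves):
        remaining -= move_budget(remaining, increment_ms)
        remaining += increment_ms  # Fischer increment credited after the move
        history.append(remaining)
    return history
```

With 4 minutes + 2 seconds the first move gets about 9.5 seconds, and because each move takes only a fraction of what is left, well over half a minute is still on the clock after move 40 in this model.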
If Jose and Uri are interested in the games, they can send me a short message.
The next test match, already in progress, is Terra 3.3b5 vs. The Baron 1.2.0b8a and The Baron CCT6 (single), as Terra should get a new chance without the time management bug. Peter is on holiday, but might read this. So far the new version shows good time management, no other bugs have been detected, and it is scoring very well. The new Frenzee 151 will also be tested before the tournaments start.
Best Regards
Heinz

Re: Averno 0.70 successful in tournament test

Postby Uri Blass » 02 Feb 2004, 21:36:02

In reply to: Averno 0.70 successful in tournament test, posted by Heinz van Kempen at 02 February 2004 20:31:29
> The new Averno seems to be much improved in certain endgames and in tactics. Two crushing short wins in Nunn pos. 1, where both Movei versions played identically (which usually does not happen)
Note also that 00_8_163 is the CCT version of Movei.
I am not surprised that they played identically, because all the changes I made are only small ones (I do not know how effective they are against other programs, but the result is encouraging).
Uri

Thanks and a beta testing note

Postby Jose Carlos » 02 Feb 2004, 22:09:41

In reply to: Averno 0.70 successful in tournament test, posted by Heinz van Kempen at 02 February 2004 20:31:29
Thanks Heinz. Those results are in line with my tests. But for some reason I failed badly in CCT6.
I'm going to take a rest in the next few days, and later I'll release a new version, most probably the 0.70 that played in CCT6.
Later I'll ask for some beta testers, as I have many experimental versions I need to test and only one computer. If you or someone else reading this post is interested in beta testing, just drop me an email.
But remember that beta testing is hard work: comparing very similar versions over many games and test suites, looking for bugs in the engine and book (get ready for unexpected crashes!), looking for bad moves in games, keeping all log files in case they're needed, trying to guess what kind of positions the program can't play well...
On the other hand, I don't require testers to follow a specific testing methodology, because I want diversity: many different ways of looking at the program.
José C.

Re: Averno 0.70 successful in tournament test

Postby Uri Blass » 02 Feb 2004, 22:17:08

In reply to: Re: Averno 0.70 successful in tournament test, posted by Tom Likens at 02 February 2004 22:01:36
> I'm curious Uri, have you held off adding EGTB support because you eventually want to go
> commercial (and not use Nalimov's and Andrew's code) or is it on your TODO list and you
> just haven't gotten around to it yet?
> BTW, I thought Movei did rather well in this tournament.
> It would have been nice if you could have gotten the 1st round win against Junior, since I think you deserved it.
> regards,
> --tom
> P.S. So *who* is the Israeli champion??

I think that it is better to add knowledge about endgames first, and to add tablebases only after the program knows how to evaluate endgames better.
Tablebases are not a replacement for knowledge, because it is impossible to probe them at every node without a significant reduction in speed, and having tablebases too early may hide weaknesses.

I agree.
Thanks, but I think I did not deserve it, because Movei did not have the knowledge to win the game.
What champion?
I guess that Junior is the best Israeli program.
Uri

Re: Thanks and a beta testing note

Postby Heinz van Kempen » 02 Feb 2004, 22:30:54

In reply to: Thanks and a beta testing note, posted by Jose Carlos at 02 February 2004 22:09:41
Hello Jose,
thanks :-), I had a lot of fun watching those games; there were some really nice ones.
As I want to push these Nunn tournaments forward and get everything in first, for the moment I lack the CPU time to do exhaustive beta testing for anyone. But I am sure you will find good beta testers. I can only underline that your new version seems to be improved by at least 100 points; we will need more games in the tournament to confirm this, but I am very optimistic from what I have seen so far.
What I can offer to you, and to everyone, is that from time to time a promising new version will be tested under the same conditions, just to have something comparable based on a lot of games, although only at blitz level.
So in a few months, I hope, all this will be more useful than it appears now.
Best Regards
Heinz

Re: Averno 0.70 successful in tournament test

Postby Tord Romstad » 03 Feb 2004, 08:04:39

In reply to: Re: Averno 0.70 successful in tournament test, posted by Uri Blass at 02 February 2004 22:17:08
> I think that it is better to add knowledge about endgames first, and to add
> tablebases only after the program knows how to evaluate endgames better.
> Tablebases are not a replacement for knowledge, because it is impossible to
> probe them at every node without a significant reduction in speed, and having
> tablebases too early may hide weaknesses.
I agree entirely. You already know my opinions about this, of course. After
all, we have discussed this in private e-mail as well as on CCC during the
last week. But for the benefit of other interested readers, this is what I
think:
Probing tablebases is really slow. As far as I know, nobody probes them in
the qsearch. I have even found that it is too slow to probe them in the
last few plies of the main search. In Gothmog, I don't probe at all if the
remaining depth is less than 4 plies. This means that by correctly
evaluating some of the non-trivial tablebase positions, I can sometimes
find wins or draws 4 plies (or more, if the final exchanges take place
in the qsearch) earlier.
For this reason, writing code to handle basic endgames is well worth
the effort. The danger of implementing tablebase support too early
is that it can easily cause laziness. Your engine will play KRPKR
endgames perfectly when they appear on the board, and you might never
notice that it horribly misevaluates KRPKR positions when they appear
near the leaves. You are much more likely to fix your KRPKR eval if
you repeatedly have painful experiences of watching your engine get
outplayed in this endgame.
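A minimal Python sketch of the depth cutoff described above. The Node class, the probe counter and the score scale are hypothetical stand-ins for a real engine's position and EGTB interface; PROBE_MIN_DEPTH = 4 follows the Gothmog setting from the post, and MAX_TB_PIECES = 5 matches the 5-men tablebases in the test setup at the top of the thread.

```python
from dataclasses import dataclass, field
from typing import List, Optional

PROBE_MIN_DEPTH = 4  # no probes with fewer than 4 plies left, as in the post
MAX_TB_PIECES = 5    # assumes 5-men tablebases
TB_WIN = 20000       # hypothetical score for a tablebase win


@dataclass
class Node:
    """Toy stand-in for a chess position (hypothetical, for illustration only)."""
    pieces: int
    static_eval: int                 # heuristic score, side to move's view
    tb_result: Optional[int] = None  # +1 win, 0 draw, -1 loss if in a tablebase
    children: List["Node"] = field(default_factory=list)


class Searcher:
    def __init__(self) -> None:
        self.tb_probes = 0

    def probe_tb(self, node: Node) -> Optional[int]:
        """Stand-in for a real, slow tablebase lookup; counts calls."""
        self.tb_probes += 1
        return node.tb_result

    def search(self, node: Node, depth: int, alpha: int, beta: int) -> int:
        # Probe only with enough remaining depth; near the leaves (and in the
        # qsearch, which this toy omits) the static eval has to do the job.
        if depth >= PROBE_MIN_DEPTH and node.pieces <= MAX_TB_PIECES:
            result = self.probe_tb(node)
            if result is not None:
                return result * TB_WIN
        if depth <= 0 or not node.children:
            return node.static_eval
        for child in node.children:
            score = -self.search(child, depth - 1, -beta, -alpha)
            if score >= beta:
                return beta  # beta cutoff
            alpha = max(alpha, score)
        return alpha
```

With a node whose static eval says +1.20 but whose tablebase result is a draw, a 3-ply search returns the eval's 120 without probing, while a 4-ply search probes and returns the exact 0. This is why the eval has to be right near the leaves.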
Recent versions of Gothmog do have tablebase support, but I usually
leave it disabled when testing the engine.
Tord

Re: Averno 0.70 successful in tournament test

Postby Jose Carlos » 03 Feb 2004, 10:31:00

In reply to: Re: Averno 0.70 successful in tournament test, posted by Tord Romstad at 03 February 2004 08:04:39
While I agree that more knowledge is generally good, it's not necessarily true that probing EGTBs is too slow to be worth it. It needs a lot of tuning. For example, you might want to save the result of an EGTB probe to your hash table with a big draft. Then probing it in the next iterations won't slow you down at all (assuming you probe the hash table before the EGTBs). Aging criteria are important here, to avoid a hash table full of useless tablebase hits.
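The hash-table idea can be sketched like this in Python. TB_DRAFT, the table layout and the replacement rule are assumptions, and real engines also adjust tablebase mate scores by ply, which is omitted. The hash table is probed before the tablebase, a tablebase hit is stored with a very large draft so later iterations find it in the table, and entries from earlier searches are always replaceable so that old tablebase hits do not fill the table.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

TB_DRAFT = 255  # hypothetical "very large" draft for exact tablebase scores
EXACT = 0       # bound type: exact score (the only one used in this sketch)


@dataclass
class TTEntry:
    key: int
    depth: int
    score: int
    flag: int
    age: int


class TranspositionTable:
    def __init__(self, size: int) -> None:
        self.size = size
        self.slots: List[Optional[TTEntry]] = [None] * size
        self.age = 0  # bumped once per new search (i.e. per move played)

    def new_search(self) -> None:
        self.age += 1

    def probe(self, key: int) -> Optional[TTEntry]:
        entry = self.slots[key % self.size]
        return entry if entry is not None and entry.key == key else None

    def store(self, key: int, depth: int, score: int, flag: int) -> None:
        i = key % self.size
        old = self.slots[i]
        # Aging: entries left over from earlier searches may always be replaced,
        # so deep tablebase entries cannot clog the table forever.
        if old is None or old.age != self.age or depth >= old.depth:
            self.slots[i] = TTEntry(key, depth, score, flag, self.age)


class TablebaseProber:
    """Stand-in for a slow EGTB lookup (hypothetical); counts real probes."""

    def __init__(self, results: Dict[int, int]) -> None:
        self.results = results
        self.probes = 0

    def probe(self, key: int) -> Optional[int]:
        self.probes += 1
        return self.results.get(key)


def exact_score(key: int, depth: int, tt: TranspositionTable,
                tb: TablebaseProber) -> Optional[int]:
    """Check the hash table first and the tablebase only on a miss."""
    entry = tt.probe(key)
    if entry is not None and entry.flag == EXACT and entry.depth >= depth:
        return entry.score
    result = tb.probe(key)
    if result is not None:
        tt.store(key, TB_DRAFT, result, EXACT)  # big draft: reused at any depth
    return result
```

Within one search a shallow entry cannot evict the tablebase hit, but after `new_search()` the stale entry is fair game.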
You might also want to decide when to probe EGTBs based on other considerations. You do internal node evaluation, so you might want to probe EGTBs only when the static eval tells you something like "I'm not 100% sure", or avoid probing when the static eval is way above beta and the opponent has no threats, etc.
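The probe filter could look something like the Python sketch below. The `certain` flag stands for an endgame recognizer that knows when its score is reliable, and PROBE_MARGIN is a made-up threshold for "way above beta"; both are assumptions, not part of any specific engine.

```python
from typing import NamedTuple

PROBE_MARGIN = 300  # hypothetical centipawn margin for "way above beta"


class StaticEval(NamedTuple):
    score: int     # centipawns, side to move's point of view
    certain: bool  # True if an endgame recognizer vouches for the score


def should_probe(ev: StaticEval, beta: int, opponent_has_threats: bool) -> bool:
    """Decide whether a tablebase probe is worth its cost at this node."""
    # Way above beta and the opponent has no threats: a cutoff is near certain.
    if ev.score >= beta + PROBE_MARGIN and not opponent_has_threats:
        return False
    # Otherwise probe only when the static eval admits it is "not 100% sure".
    return not ev.certain
```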
José C.

Re: Averno 0.70 successful in tournament test

Postby Sune Fischer » 03 Feb 2004, 11:59:46

In reply to: Re: Averno 0.70 successful in tournament test, posted by Tord Romstad at 03 February 2004 08:04:39
I definitely agree with that: TBs won't replace good knowledge of the endgame
at the leaves.
The debate is a little more complex than that, though.
IIRC one of Uri's statements was that TBs can even be bad for an engine that is not knowledgeable.
That is the point I disagree with; I think TBs are generally good as long as you don't
probe them so aggressively that you slow down too much.
It is true that a few examples have been posted where the position got
misevaluated as, e.g., better, and then the rook refused to take a pawn because it
saw that this would be a draw.
However, not taking the pawn would actually lose.
The problem here is the abrupt change in scores. As you've also talked about
before, there is a problem with having multiple evaluation functions, because
discontinuities will produce odd results.
If this is still Uri's POV, then of course I agree there is a point to that;
I just think this problem is completely dwarfed by the good TBs can do for your engine.
There is also another "problem", namely that of swindle mode.
The stronger side of a drawn endgame will stop playing for a win, i.e. stop
pushing the opponent to the corner, or start giving away excess material.
I think the problem here is at the core of alpha-beta: we assume the opponent will play
as perfectly as we do, and in some cases that is just a bad assumption.
This is particularly obvious when playing humans.
E.g. when choosing between paths with nearly equal scores, you should choose the
one where there is the highest probability that your opponent will blunder.
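One concrete (hypothetical) way to act on this at the root, for example in a tablebase draw where every move scores 0, is to break ties by how many of the opponent's replies would lose. The Python sketch assumes the search already supplies each root move's score and a count of losing replies; the share of losing replies is only a crude proxy for "probability of a blunder", and the move names in the test are illustrative.

```python
from typing import List, NamedTuple


class RootMove(NamedTuple):
    move: str            # move in SAN, illustrative only
    score: int           # search score in centipawns (0 for a tablebase draw)
    losing_replies: int  # opponent replies the tablebase scores as lost for them
    legal_replies: int   # all legal opponent replies


def blunder_chance(m: RootMove) -> float:
    """Crude proxy: the share of the opponent's replies that lose."""
    return m.losing_replies / m.legal_replies if m.legal_replies else 0.0


def pick_swindle_move(moves: List[RootMove], window: int = 0) -> RootMove:
    """Among moves within `window` centipawns of the best score, prefer the one
    that leaves the opponent the most ways to go wrong."""
    best = max(m.score for m in moves)
    candidates = [m for m in moves if m.score >= best - window]
    return max(candidates, key=blunder_chance)
```

A wider `window` trades a little of the theoretical score for more practical chances; with `window=0` the choice never leaves the set of best-scoring moves.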
We can see how, e.g., Crafty tries to solve this by using asymmetric
evaluation, to make Crafty play open positions against humans.
This is an attempt to reach positions that are tactically complicated, where
humans are more likely to blunder.
I do the same.
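A toy Python illustration of what an asymmetric evaluation term means (hypothetical term and weight, not Crafty's actual code). A symmetric evaluation gives the same position the same score whichever side the engine plays; here an open-files bonus is credited only to the engine's side when it plays a human. Under minimax the opponent then tries to keep lines closed, while the engine steers toward open positions.

```python
OPEN_FILE_BONUS = 20  # hypothetical centipawns per open file, engine's side only


def evaluate(material: int, open_files: int, engine_is_white: bool,
             vs_human: bool) -> int:
    """Toy evaluation from White's point of view, in centipawns.

    The open-files term depends on which side the engine plays, so the same
    position gets different scores depending on the engine's color.
    """
    score = material
    if vs_human:
        bonus = OPEN_FILE_BONUS * open_files
        score += bonus if engine_is_white else -bonus
    return score
```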
Testing the EGTB implementation is a separate job, best done by self-play
matches against a version without TBs, IMO.
-S.

