Fast(er) bitboard move generator

Programming Topics (Computer Chess) and technical aspects as test techniques, book building, program tuning etc

Moderator: Andres Valverde

Re: Fast(er) bitboard move generator

Postby Lasse Hansen » 20 Jun 2006, 23:28

Gerd Isenberg wrote:Congrats! What squares were the most difficult one to find? Edges or border - near 0,7,56 or 63?

The worst squares seems to be the edges.

About hash collisions, I already accept those if they give the same pattern.

About the addressing; on square A1 I have to 'see' A2-A7 and B1-G1 = 12 bits, giving 4096 different addresses. But, on A1 there is atmost 7*7=49 different patterns to lookup, and, if I have enough 'good' collisions (around 2000 or more), it would be teoretically possible to find a key that map those 49 patterns to 2048 elements. Time will tell. It's easier for squares inside B2-B7-G7-G2, because one have only 2*5=10 bits giving 1024 different addresses (8 minus edges minus piece). For D4,E4,D5,E5 one have 12*12=144 different patterns to lookup. It seems like the number of addresses is the heaviest part to deal with.

Pradu wrote:Mabe it isn't possible to find the keys needed in RmovesIncrement in reasonable time using a brute force method considering the trouble you had generating keys for a 1 to 1 minimal perfect hash (the 4906 one)...does anyone know of more efficient ways to find these keys or to find out if such keys even exist?

I do not have a minimal perfect hash, and I dont think I can make it either. When I found keys to map 4096 addresses to max 144 patterns many of them are probably 'good' collisions.

Lasse
Lasse Hansen
 
Posts: 41
Joined: 07 Jun 2006, 21:25
Location: Porsgrunn, Norway

Re: Fast(er) bitboard move generator

Postby Pradu » 20 Jun 2006, 23:34

Gerd Isenberg wrote:That hashing is possible with an additional indirection. Thus reducing the [64][4096] tables from bitboard to 16-bit short. Thus from 2MB each to 512KB. 16-bit index for all enumerated attack sets. But getting 64- or 128-byte cachelines from 1MB or 4MB to get two or eight byte doesn't matter much as long you rarely profit from sharing cachelines for different attack pattern.

Sorry. What I wanted to say was a hashing function that uses a multiplication with a key and a shift so that it might look like the following without using any additional indirection databases.

Code: Select all
#define Rmoves(pos,square) *(RmovesPointer[square]+RmovesIncrement(pos,square))
#define Bmoves(pos,square) *(BmovesPointer[square]+BmovesIncrement(pos,square))


RmovesIncrement might look something like:

Code: Select all
#define RmovesIncrement(pos, square) ((((pos).AllPieces & rookMask[square])*magickey[square])>>shift[square])


And it will return a number from 0 to n to increment to RmovesPointer. So you need to generate a magickey that will cause all the relavent keys to hash to a single index.
User avatar
Pradu
 
Posts: 343
Joined: 12 Jan 2005, 19:17
Location: Chandler, Arizona, USA

Re: Fast(er) bitboard move generator

Postby Pradu » 20 Jun 2006, 23:39

Lasse Hansen wrote:I do not have a minimal perfect hash, and I dont think I can make it either. When I found keys to map 4096 addresses to max 144 patterns many of them are probably 'good' collisions.
Oh..
User avatar
Pradu
 
Posts: 343
Joined: 12 Jan 2005, 19:17
Location: Chandler, Arizona, USA

Re: Fast(er) bitboard move generator

Postby Gerd Isenberg » 21 Jun 2006, 00:05

Lasse Hansen wrote:About the addressing; on square A1 I have to 'see' A2-A7 and B1-G1 = 12 bits, giving 4096 different addresses. But, on A1 there is atmost 7*7=49 different patterns to lookup, and, if I have enough 'good' collisions (around 2000 or more), it would be teoretically possible to find a key that map those 49 patterns to 2048 elements. Time will tell. It's easier for squares inside B2-B7-G7-G2, because one have only 2*5=10 bits giving 1024 different addresses (8 minus edges minus piece). For D4,E4,D5,E5 one have 12*12=144 different patterns to lookup. It seems like the number of addresses is the heaviest part to deal with.
Lasse

Yes, the problem might be, if you have six occupied bits of one edge-ray with one bit of the second edge-ray already in the upper bits. Thus 7 of 11 upper bits are already used, decreasing "degree of freedom" to shift further left to get a lot of 'good' collisions.
Gerd Isenberg
 
Posts: 285
Joined: 31 Jan 2005, 20:31
Location: Hattingen, Germany

Re: Fast(er) bitboard move generator

Postby Gerd Isenberg » 21 Jun 2006, 08:00

Pradu wrote:
Gerd Isenberg wrote:That hashing is possible with an additional indirection. Thus reducing the [64][4096] tables from bitboard to 16-bit short. Thus from 2MB each to 512KB. 16-bit index for all enumerated attack sets. But getting 64- or 128-byte cachelines from 1MB or 4MB to get two or eight byte doesn't matter much as long you rarely profit from sharing cachelines for different attack pattern.

Sorry. What I wanted to say was a hashing function that uses a multiplication with a key and a shift so that it might look like the following without using any additional indirection databases.

Code: Select all
#define Rmoves(pos,square) *(RmovesPointer[square]+RmovesIncrement(pos,square))
#define Bmoves(pos,square) *(BmovesPointer[square]+BmovesIncrement(pos,square))


RmovesIncrement might look something like:

Code: Select all
#define RmovesIncrement(pos, square) ((((pos).AllPieces & rookMask[square])*magickey[square])>>shift[square])


And it will return a number from 0 to n to increment to RmovesPointer. So you need to generate a magickey that will cause all the relavent keys to hash to a single index.


Problem is to somehow map up to 12 occupied bits. The biggest bit-distance in the mask between ls1b and ms1b is usually 40 but up to 55 for the edges. Thus even a u64->double conversion with 52 significant mantissa bits may lose some bits - so that even some fourier-series with some sums of sinus does not work - despite such double tricks would be much too expensive. The idea is to do only and, mul, shift, lookup. I fear to hash directly to that low range is not possible with a few cheap instructions and without a second cascaded lookup.
Gerd Isenberg
 
Posts: 285
Joined: 31 Jan 2005, 20:31
Location: Hattingen, Germany

Re: Fast(er) bitboard move generator

Postby Lasse Hansen » 21 Jun 2006, 13:11

Hi again!

Now I have found new keys, so that the sizes of the attack tables are:
Code: Select all
TBitBoard RookAttacks[64][4096];
TBitBoard BishopAttacks[64][1024];


I have alse made mobility tables that compares to those crafty uses.
(Its not a PopCount substitute)
I also made mobility tables that only counts non-captures,
but these tables got pretty much larger. Sizes so far:
Code: Select all
char BishopMobilityExt[64][1024];   // captures+non-captures
char RookMobilityExt[64][4096];
char BishopMobility[64][32768];    // non-captures
char RookMobility[64][32768];


The use is like:
Code: Select all
   Pcs=WRooks | BRooks;
   while (Pcs) {
      From=FirstOne(Pcs);
      Pattern = (AllWPieces | AllBPieces) & MaskRook[From];
      Address = (Pattern*HashKeyRookMobilityExt[From])>>(64-RMTE_BITS);
      n+=RookMobilityExt[From][Address];
      Pcs &= ClrBit(From);
   }

It seems to me that crafty is even more efficient than this approach.

So, to test the speed I modified Perft to count and accumulate the mobility for all sliding pieces
in all end-leaf positions. Using:
"r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w QKqk -\n"
5 plies deep I got
Code: Select all
N=197876242 time=46.27 Mobility=0x3167821d3 4.277MNps   // capt+non-capts
N=197876242 time=47.56 Mobility=0x2160bd8cc 4.161MNps   // non-capts only

where I had 17.5-18MNps without mobility. One interesting factor is the slowdown when
using non-capturing tables. Since the code is exactly the same, only difference is
the size of the lookup-tables, this might be due to cache pollution.

Edit: Instead of calculating mobility in Eval, I did it in MovgenCapts instead. Of course one looses some information then, but, the speed drop when finding quasi-legal moves in perft was only 4.2%, and 0.x% when finding legal moves.
Code: Select all
N=200180422 time=3.55 Mobility=0x0861c6f 56.389MNps
N=200180422 time=3.40 Mobility=0x00 58.877MNps


Lasse
Lasse Hansen
 
Posts: 41
Joined: 07 Jun 2006, 21:25
Location: Porsgrunn, Norway

Re: Fast(er) bitboard move generator

Postby Gerd Isenberg » 21 Jun 2006, 20:21

Code: Select all
TBitBoard RookAttacks[64][4096];
TBitBoard BishopAttacks[64][1024];


Hi Lasse,

wow, bishops only 1/4. That is what i almost get while hashing enumerated moves (1/2) or movelist of either from/to or vectored attack sets in one of 8 sliding directions.

With such huge but rare populated tables, you seldom share cachelines by different attack or even occupied pattern anyway. Thus most often you fetch a 64-byte cacheline from your tables to L1, you need only eight bytes for the attacks or even one byte for the popcounts from one cacheline. To get attacks and popcounts you'll also need two different cachelines which implies bigger cache pollution - even more if you start to go with the real search with transposition table and a lot of other eval related data and much greater stack usage due to more local variables and probably explicite stacks and much deeper search with greater stack usage and traffic.

I suggest to combine tables and to make arrays of structs, instead of seperate attack-bitboards and popcount bytes. Of course we don't take 9-byte entries - i suggest 32-byte or even 64-byte entries, properly aligned, with some additional usefull information like several popcounts with and without considering center weights and probably distance (0..6 as index) to opposite king, keeping disjoint popcounts for the four "conditional" squares etc. Even keeping a complete piece-square-scored movelist of that piece including perhaps a separate array for up to four "conditional" moves might be a funny option to avoid bitscanning and to go with an almost branchless movegen by some memcopy.

This 40++MByte approach might be more efficient than your current one, because the ratio of usefull information per cacheline becomes almost 100% instead of 12.5% or less. Of course there are huge gaps in memory and page- and tlb-issues. Just brainstorming...

Gerd
Gerd Isenberg
 
Posts: 285
Joined: 31 Jan 2005, 20:31
Location: Hattingen, Germany

Re: Fast(er) bitboard move generator

Postby Lasse Hansen » 22 Jun 2006, 02:18

Gerd Isenberg wrote:wow, bishops only 1/4. That is what i almost get while hashing enumerated moves (1/2) or movelist of either from/to or vectored attack sets in one of 8 sliding directions.

For those interested, I've found keys for a 512 elements table for BishopAttacks by now. When I look at both BishopAttacks and RookAttacks, I see that the worst case for rooks is A8,H8,H1,A1, because those squares need 6+6=12 bits of address. And, for bishops, the worst squares are D4,D5,E4,E5, and the addresses needed there are 4+5=9 bits. And this is what I have keys for so far. But I still hope there is a chance to get keys for a 2048 RookAttacks table :)

Lasse
Lasse Hansen
 
Posts: 41
Joined: 07 Jun 2006, 21:25
Location: Porsgrunn, Norway

Re: Fast(er) bitboard move generator

Postby Grant Osborne » 22 Jun 2006, 10:37

Lasse

Am I thinking along the right lines here?
If you hoping to generate 2048 keys for the rook, then for squares H8, H1, A8 and A1 you will need to check the collisions.
If RookAttacks[sqr][Address] != -1, check that the first set bit in both directions from sqr of AttacksFile[sqr][i&63] and AttacksRank[sqr][i>>6] are the same as the contents of RookAttacks[sqr][Address] with the correct rank and file mask applied.

Regards
Grant
User avatar
Grant Osborne
 
Posts: 69
Joined: 16 Jun 2006, 14:05

Re: Fast(er) bitboard move generator

Postby Lasse Hansen » 22 Jun 2006, 14:09

Grant Osborne wrote:If you hoping to generate 2048 keys for the rook, then for squares H8, H1, A8 and A1 you will need to check the collisions.
If RookAttacks[sqr][Address] != -1, check that the first set bit in both directions from sqr of AttacksFile[sqr][i&63] and AttacksRank[sqr][i>>6] are the same as the contents of RookAttacks[sqr][Address] with the correct rank and file mask applied.


I'm not shure I understand what you mean.

- The first bits in -all- directions inside the board, from the square, are set by definition (it's an attack table).
- I'm making a perfect hash (not minimal), that means that all collisions will have to be 'good', which means exact attack pattern matches for all relevant piece patterns.

One reason for me to believe that it's possible to make a 2048 rook table, is what I found for knights. If you have a knight at E4, it's attack table hits 8 squares. If you want to count the number of knight moves you will get a pattern that have from 0-8 bits set. Then, the mobility of the knight could be found with PopCount.

So, I made a hash table that returns the PopCount of all possible knight patterns. Since there are 8 bits that can change independently, you get a 8-bit address, suggesting the size
KnightMobility[64][256]
that returns a value in the range [0,8]=9 different possibilities.

What I got keys for was
KnightMobility[64][64]
and that means there have to be a -lot- of 'good' collisions. Compared to rooks:

Knights: 256 addresses to return 9 unique values perfectly : Ideal with no collisions
I Found: 64 addresses to return 9 unique values perfectly : Practical with 'good' collisions
Rooks: 4096 addresses to return 49 unique patterns perfectly : Ideal
I Want: 2048 addresses to return 49 unique patterns perfectly

The number 49 comes from the 7*7 possible rook attack patterns at square A1,A8,H1,H8. All other squares have 2048 (borders) or 1024 addresses.

Lasse
Lasse Hansen
 
Posts: 41
Joined: 07 Jun 2006, 21:25
Location: Porsgrunn, Norway

Re: Fast(er) bitboard move generator

Postby Grant Osborne » 22 Jun 2006, 15:41

Lasse

Maybe I am mising the point completely but to explain my last post.
On square A1 you have to 'see' A2-A7 and B1-G1 = 12 bits, giving 4096 different addresses. Your perfect hash maps the masked board position into one of the RookAttacks[A1][4096]. However, you want 2048 addresses (or less) to return 49 unique rook patterns perfectly, therefore you must increase your 'good' collisions. To do this you would need to check the 'bad' collisions when RookAttacks[sqr][Address] != -1. For example, rook on A1, piece on B1, piece on A2, any other piece in the rank or file does not matter therefore more than one attack pattern with B1 and A2 set can 'collide' with the bitboard held in RookAttacks[A1][Address] if bits B1 and A2 are set here also. If my thinking is correct it may be possible that supermagics exist for RookAttacks[64][64].
The knight is slightly different as it is not a sliding piece.

Grant
User avatar
Grant Osborne
 
Posts: 69
Joined: 16 Jun 2006, 14:05

Re: Fast(er) bitboard move generator

Postby Gerd Isenberg » 22 Jun 2006, 19:31

Grant Osborne wrote:Lasse

Maybe I am mising the point completely but to explain my last post.
On square A1 you have to 'see' A2-A7 and B1-G1 = 12 bits, giving 4096 different addresses. Your perfect hash maps the masked board position into one of the RookAttacks[A1][4096]. However, you want 2048 addresses (or less) to return 49 unique rook patterns perfectly, therefore you must increase your 'good' collisions. To do this you would need to check the 'bad' collisions when RookAttacks[sqr][Address] != -1. For example, rook on A1, piece on B1, piece on A2, any other piece in the rank or file does not matter therefore more than one attack pattern with B1 and A2 set can 'collide' with the bitboard held in RookAttacks[A1][Address] if bits B1 and A2 are set here also. If my thinking is correct it may be possible that supermagics exist for RookAttacks[64][64].
The knight is slightly different as it is not a sliding piece.

Grant

If that is true, we have a very cheap leading or trailing zero count without masking single isolated most or least significant one bits.

Assume we have no problems with 'good' collisions and only a subset of occupied pattern - only max one bit per ray from a2_a7 and b1_g1, thus exactly 49 pattern for the 49 unique attacks of rook on a1. Can you provide a magic factor to get unique upper six bits? That would be the first step - then we may look for 'good' collisions ;-)

Gerd
Gerd Isenberg
 
Posts: 285
Joined: 31 Jan 2005, 20:31
Location: Hattingen, Germany

Re: Fast(er) bitboard move generator

Postby Lasse Hansen » 23 Jun 2006, 00:08

Grant Osborne wrote:Lasse
To do this you would need to check the 'bad' collisions when RookAttacks[sqr][Address] != -1. For example, rook on A1, piece on B1, piece on A2, any other piece in the rank or file does not matter therefore more than one attack pattern with B1 and A2 set can 'collide' with the bitboard held in RookAttacks[A1][Address] if bits B1 and A2 are set here also.

I already check for 'bad' collisions. I only accept a collisions if, and only if, the wanted pattern match exactly.

The knight is slightly different as it is not a sliding piece.

Yes, the knight behavior is different from sliding pieces, but, the 'ehrmm' magic of hashing is the same, you have one complete set of patterns that make a set of addresses, and by looking up those addresses you get an answer (wanted result), that you are satisfied with (exact or not).

Lasse
Lasse Hansen
 
Posts: 41
Joined: 07 Jun 2006, 21:25
Location: Porsgrunn, Norway

Re: Fast(er) bitboard move generator

Postby Edsel Apostol » 27 Jun 2006, 03:15

Hi Lasse,

Is it possible just to use this:
Code: Select all
Pattern = AllWPieces | AllBPieces;

instead of this?
Code: Select all
Pattern = (AllWPieces | AllBPieces) & MaskRook[From];



Hi Gerd,

Have you compared the speed of your own minimalist bitboard move generator against that of Lasse? How do you initialize this arrays?
Code: Select all
extern u8 CACHE_ALIGN firstRankAttacks[64][8];
extern u64 CACHE_ALIGN diagonalMask[64];
extern u64 CACHE_ALIGN antiDiagonalMask[64];

I have found this on this link:
http://www.talkchess.com/forum/viewtopi ... ght=#17161

Edsel
User avatar
Edsel Apostol
 
Posts: 73
Joined: 01 Aug 2005, 05:27
Location: Antique, Philippines

Re: Fast(er) bitboard move generator

Postby Lasse Hansen » 27 Jun 2006, 08:46

Edsel Apostol wrote:Is it possible just to use this:
Code:

Pattern = AllWPieces | AllBPieces;

instead of this?
Code:

Pattern = (AllWPieces | AllBPieces) & MaskRook[From];

No, I dont't think that is possible with this approach (imul+shift) (maybe it is teoretically). You would have 63 bits (9x10^18 possibilities) that make an address with some key, and you want some 2000+ patterns to be looked up. But I guess there are other ways to use hashing than what I use (single mask + multiplication + shift -> lookup).

About my move generator speed: I have now reached 68.5MNps quasi-legal moves and 18.5MNps legal moves for depth 5 in position "r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w QKqk -\n" using perft. (One more improvement would be to delete Zobrist key stuff, but I wont do that).

Lasse
Lasse Hansen
 
Posts: 41
Joined: 07 Jun 2006, 21:25
Location: Porsgrunn, Norway

Re: Fast(er) bitboard move generator

Postby Edsel Apostol » 27 Jun 2006, 09:20

How about this? Is this possible?
Code: Select all
AttackRook = (AllWpieces|AllBpieces)  * RookMagicTable[64];

If this is possible, the move generation would almost cost nothing.
User avatar
Edsel Apostol
 
Posts: 73
Joined: 01 Aug 2005, 05:27
Location: Antique, Philippines

Re: Fast(er) bitboard move generator

Postby Gerd Isenberg » 27 Jun 2006, 10:13

Edsel Apostol wrote:Hi Lasse,
Hi Gerd,

Have you compared the speed of your own minimalist bitboard move generator against that of Lasse? How do you initialize this arrays?
Code: Select all
extern u8 CACHE_ALIGN firstRankAttacks[64][8];
extern u64 CACHE_ALIGN diagonalMask[64];
extern u64 CACHE_ALIGN antiDiagonalMask[64];



Hi Edsel,
since my current focus is on attack generation with Kogge-Stone-Fillstuff, i don't know and i don't tried Lasse's approach yet. It depends on so many "chaotical" things inside your program, on current and future hardware, cache- and tlb-issues etc.. Considering the memory- and cache- "footprint" of a program, each programmer with a rotated like bitboard framework may try the zillions of lookup-variations aka memory versus computation, to find what is actually faster - without changing the public interface of the attack-getters.

For initialization - one may use Kogge-Stone-routines to initialize the 512 byte of firstRankAttacks. A const initialization is a bit error-prone, but still possible with 64*8 byte according to your bitset-square mapping.

Regards,
Gerd
Gerd Isenberg
 
Posts: 285
Joined: 31 Jan 2005, 20:31
Location: Hattingen, Germany

Re: Fast(er) bitboard move generator

Postby Grant Osborne » 27 Jun 2006, 22:01

Lasse

I now believe that RookAttacks[64][2048] is possible. I have just found the key for H1 0xEA87FFFECBFEA5AE :D

Grant
User avatar
Grant Osborne
 
Posts: 69
Joined: 16 Jun 2006, 14:05

Re: Fast(er) bitboard move generator

Postby Lasse Hansen » 28 Jun 2006, 00:01

Grant Osborne wrote:Lasse

I now believe that RookAttacks[64][2048] is possible. I have just found the key for H1 0xEA87FFFECBFEA5AE :D

Grant

Thanks!
For me it worked for A1 though, not H1, but those squares share the same problem. For me I now miss 2048 keys for H8, H1,A8,A7,A6,A5,A4,A3. It's nice to see that the pattern for the key you found is very dense, compared to the other rook attack keys :)

Lasse
Lasse Hansen
 
Posts: 41
Joined: 07 Jun 2006, 21:25
Location: Porsgrunn, Norway

Re: Fast(er) bitboard move generator

Postby Grant Osborne » 29 Jun 2006, 20:14

Lasse

Found:
Key for rooks 2048 square A1 for me (maybe A8 for you) A9FFFEFF4DFF63D6

Square B1 for me (A7 for you) 67FFFF5CFF65D94E

Grant 8-)
User avatar
Grant Osborne
 
Posts: 69
Joined: 16 Jun 2006, 14:05

PreviousNext

Return to Programming and Technical Discussions

Who is online

Users browsing this forum: Baidu [Spider], Google [Bot] and 2 guests