Moderator: Andres Valverde
mine:
time1 in sec = 14.400 runs 7624512*64 ~ 2.951 ns
MINIMIZE_MAGIC:
time2 in sec = 14.190 runs 7624512*64 ~ 2.908 ns
PERFECT_HASH:
time1 in sec = 14.440 runs 7624512*64 ~ 2.959 ns
neither:
time2 in sec = 18.800 runs 7624512*64 ~ 3.853 nsbob wrote:Some data from Crafty...
I have been playing around with this stuff since you, Gerd, etc started discussing it. I have taken bits and pieces posted here, and slowly found the time to migrate this approach into Crafty. I completely removed the rotated bitboard stuff, including the updates to the rotated boards in make/unmake, to make this a "real magic implementation."
Here's the results:
log.001: time=1:48 mat=0 n=93585150 fh=90% nps=866K
log.002: time=1:47 mat=0 n=93585150 fh=90% nps=869K
The second line is the magic version. Which appears to be about 1% faster overall, not very significant. I will add that I did not try any of the "smaller" implementations, this is the full-sized magic table approach with 64 x 512 entries for bishops and 64 x 4096 entries for rooks...
As I had thought, the extra size of the table seems to hurt my 32 bit version (running on my usual xeon). I then tested on our 64 bit xeon boxes to see what happens when the 64 bit multiply becomes less expensive. Here's the result there:
log.001: time=1:19 mat=0 n=93585150 fh=90% nps=1.2M
log.002: time=1:19 mat=0 n=93585150 fh=90% nps=1.2M
So on the 64 bit processors, the advantage I thought might show up is not there at all and there appears to be no speed advantage whatsoever.
I have not tried the reduced-table approaches although I guess they would be easy enough to merge in. Does anyone have any data suggesting that the reduced-array versions performs any better? Otherwise the advantage over rotated bitboards seems to be pretty well zero...
bob wrote:Otherwise the advantage over rotated bitboards seems to be pretty well zero...
Pradu wrote:bob wrote:Otherwise the advantage over rotated bitboards seems to be pretty well zero...
There is one advantage, even if it isn't in speed. Magic move bitboard generators are independant of the position and this allows you to do some really neat tricks such as creating xray-attacks and finding pins in all directions simultaneously.
return (
(GameBoard.p[side][pawn] & MoveArray[ptype[side^1]][tsq]) |
(Knight[tsq] & GameBoard.p[side][knight]) |
(Bmagic(tsq^63, GameBoard.Occupied) & (GameBoard.p[side][bishop]|GameBoard.p[side][queen])) |
(Rmagic(tsq^63, GameBoard.Occupied) & (GameBoard.p[side][rook]|GameBoard.p[side][queen])) |
(King[tsq] & GameBoard.p[side][king])
);
smcracraft wrote:Pradu wrote:bob wrote:Otherwise the advantage over rotated bitboards seems to be pretty well zero...
There is one advantage, even if it isn't in speed. Magic move bitboard generators are independant of the position and this allows you to do some really neat tricks such as creating xray-attacks and finding pins in all directions simultaneously.
How would I do that with this?
Stuart
attacks=Rmoves(square,occupancy);
xrayattacks=Rmoves(square,(occupancy&~attacks)) & ~attacksattacks=Bmoves(square,occupancy);
batteryattacksB=Bmoves(square,(occupancy&~Queens[myside])) & ~attacksFinding diagonal pinners
//Diagonal pins
possiblePinned=Bmoves(pos->KingPos[side],pos->AllPieces)&pos->PiecesSide[side];
pinners=Bmoves(pos->KingPos[side],pos->AllPieces^possiblePinned)
& ((pos->Pieces[B]|pos->Pieces[Q])&pos->PiecesSide[!side]);Pradu wrote:bob wrote:Otherwise the advantage over rotated bitboards seems to be pretty well zero...
There is one advantage, even if it isn't in speed. Magic move bitboard generators are independant of the position and this allows you to do some really neat tricks such as creating xray-attacks and finding pins in all directions simultaneously.
Pradu wrote:smcracraft wrote:Pradu wrote:bob wrote:Otherwise the advantage over rotated bitboards seems to be pretty well zero...
There is one advantage, even if it isn't in speed. Magic move bitboard generators are independant of the position and this allows you to do some really neat tricks such as creating xray-attacks and finding pins in all directions simultaneously.
How would I do that with this?
Stuart
- Code: Select all
attacks=Rmoves(square,occupancy);
xrayattacks=Rmoves(square,(occupancy&~attacks)) & ~attacks
- Code: Select all
attacks=Bmoves(square,occupancy);
batteryattacksB=Bmoves(square,(occupancy&~Queens[myside])) & ~attacks
- Code: Select all
Finding diagonal pinners
//Diagonal pins
possiblePinned=Bmoves(pos->KingPos[side],pos->AllPieces)&pos->PiecesSide[side];
pinners=Bmoves(pos->KingPos[side],pos->AllPieces^possiblePinned)
& ((pos->Pieces[B]|pos->Pieces[Q])&pos->PiecesSide[!side]);
bob wrote:I'm not sure I follow. I've always done x-ray attacks using rotated bitboards with no effort. Ditto for pins, where I need that in GenerateCheckEvasions()... And even doing all that, I got a zero speed increase...
possiblePinned=Bmoves(pos->KingPos[side],pos->AllPieces)&pos->PiecesSide[side];
pinners=Bmoves(pos->KingPos[side],pos->AllPieces^possiblePinned)
& ((pos->Pieces[B]|pos->Pieces[Q])&pos->PiecesSide[!side]);//Battery attacks including regular attacks
batteryattacksB=Bmoves(square,occupancy&~diagonalSliders[myside])Pradu wrote:bob wrote:I'm not sure I follow. I've always done x-ray attacks using rotated bitboards with no effort. Ditto for pins, where I need that in GenerateCheckEvasions()... And even doing all that, I got a zero speed increase...
Well it is atleast as fast as rotated according to your tests. Can you post a quick profile of Crafty with magics here? It could be that generating move bitboards isn't done much timewise in Crafty. As for comparing versitality of rotated and magic approaches:
Let me guess that you have extra rotated xray-databases (RmovesXray[64][64], BmovesXray[64][64]) to do fast xrays? Even if one can do xrays with rotated, it isn't versitle. One can probably get away with it for checking pins but for anything complex it won't work because you can't xray selectively.
Some things you can do faster with magics are like finding pinners to your pieces (not opponents pieces as well).
- Code: Select all
possiblePinned=Bmoves(pos->KingPos[side],pos->AllPieces)&pos->PiecesSide[side];
pinners=Bmoves(pos->KingPos[side],pos->AllPieces^possiblePinned)
& ((pos->Pieces[B]|pos->Pieces[Q])&pos->PiecesSide[!side]);
Take the example of battery attacks which you can't do with rotated at all.
- Code: Select all
//Battery attacks including regular attacks
batteryattacksB=Bmoves(square,occupancy&~diagonalSliders[myside])
Tell me how one would do battery with rotated bitboards without any expensive LastOne/FirstOne calls and without 6 additional rotated bitboards for bishops+queens and rooks+queens. (You'll need to add 3 extra rotated bitboards for every case to generate selective xray attacks).
Or how about multiple xrays which might be required in SEE?
The thing is, one can't avoid FirstOne/LastOne calls completely to do some basic things with rotated bitboards...unless you can show me otherwise.
In any case, anything you can do with rotated, you can do with magics -- and then some.
But the thing is, you don't need to do this, because you don't need to know the actual square of the attacker. You know the piece type, so you can just do an x ^= x & -x. Actually it is a little more complicated because you don't necessarily strip the LSB: y = x & piece_bb; x ^= y & -y. But this is still going to be faster than an LSB.bob wrote:At least in my SEE stuff I don't need any MSB/LSB stuff to find batteries at all... I do use MSB/LSB to pick off each individual piece as it participates in the capture "party"...
I agree, and this is one of the reasons I don't use magic. However anti-rotated is the best of both worlds, easily modified occupied sets and separated directions.There are complications. Sometimes I only want a specific diagonal. With rotated I can look that up. With magic I have to compute both and then remove one, which is not so easy with diagonals. I also use rank attacks or file attacks for things like asking if two rooks are connected on the 7th... Again I have to and off 1/2 the data the magic stuff gives me...
That's rather strange. I've never seen results where the large tables are faster. Perhaps you might want to run the tests multiple times, as there is some variation, at least in the 32 bit (1:45 and 1:47 for non-magic).Here's my latest testing going back and modifying things to use your smaller tables:
32bit:
log.001: time=1:45 mat=0 n=93585150 fh=90% nps=883K
log.002: time=1:47 mat=0 n=93585150 fh=90% nps=869K
so there magic is actually slower, probably those 64 bit multiples that are killers on a 32 bit machine...
64bit:
log.001: time=1:19 mat=0 n=93585150 fh=90% nps=1.2M
log.002: time=1:19 mat=0 n=93585150 fh=90% nps=1.2M
No difference there at all. Looks like the 2mb version is actually faster in 32 bit mode and about equal on the 64 bit xeons we have. I'll try to run some opteron tests later...
Pradu wrote:smcracraft wrote:Pradu wrote:bob wrote:Otherwise the advantage over rotated bitboards seems to be pretty well zero...
There is one advantage, even if it isn't in speed. Magic move bitboard generators are independant of the position and this allows you to do some really neat tricks such as creating xray-attacks and finding pins in all directions simultaneously.
How would I do that with this?
Stuart
- Code: Select all
attacks=Rmoves(square,occupancy);
xrayattacks=Rmoves(square,(occupancy&~attacks)) & ~attacks
- Code: Select all
attacks=Bmoves(square,occupancy);
batteryattacksB=Bmoves(square,(occupancy&~Queens[myside])) & ~attacks
- Code: Select all
Finding diagonal pinners
//Diagonal pins
possiblePinned=Bmoves(pos->KingPos[side],pos->AllPieces)&pos->PiecesSide[side];
pinners=Bmoves(pos->KingPos[side],pos->AllPieces^possiblePinned)
& ((pos->Pieces[B]|pos->Pieces[Q])&pos->PiecesSide[!side]);
ShowXrayBatteryPins(void)
{
int square, side;
BitBoard attacks, xrayattacks,batteryattacksB,pinners,possiblePinned, x;
for (side = white; side <= black; side++)
{
x = GameBoard.AllPieces[side];
while (x)
{
square = NEXTBIT(x);
CLEARBIT(x,square);
if (square == c6 || square == c3 || square == f3 || square == f6)
{
printf("woops\n");
}
attacks=Rmagic(square^63,GameBoard.Occupied);
xrayattacks=Rmagic(square^63,(GameBoard.Occupied&~attacks)) & ~attacks;
attacks=Bmagic(square^63,GameBoard.Occupied);
batteryattacksB=Bmagic(square^63,(GameBoard.Occupied&~GameBoard.p[side][queen])) & ~attacks;
possiblePinned=Bmagic(GameBoard.kingloc[side],GameBoard.AllPieces[side]);
pinners=Bmagic(GameBoard.kingloc[side],GameBoard.Occupied^possiblePinned)
& ((GameBoard.p[side][bishop]|GameBoard.p[side][queen]|
GameBoard.p[side^1][bishop]|GameBoard.p[side^1][queen])&GameBoard.AllPieces[side^1]);
if (xrayattacks)
{
printf("xrayattacks:\n");
ShowBitBoard(&xrayattacks);
}
if (batteryattacksB)
{
printf("batteryattacksB:\n");
ShowBitBoard(&batteryattacksB);
}
if (pinners)
{
printf("pinners:\n");
ShowBitBoard(&pinners);
}
}
}
}
Zach Wegner wrote:But the thing is, you don't need to do this, because you don't need to know the actual square of the attacker. You know the piece type, so you can just do an x ^= x & -x. Actually it is a little more complicated because you don't necessarily strip the LSB: y = x & piece_bb; x ^= y & -y. But this is still going to be faster than an LSB.bob wrote:At least in my SEE stuff I don't need any MSB/LSB stuff to find batteries at all... I do use MSB/LSB to pick off each individual piece as it participates in the capture "party"...
I agree, and this is one of the reasons I don't use magic. However anti-rotated is the best of both worlds, easily modified occupied sets and separated directions.There are complications. Sometimes I only want a specific diagonal. With rotated I can look that up. With magic I have to compute both and then remove one, which is not so easy with diagonals. I also use rank attacks or file attacks for things like asking if two rooks are connected on the 7th... Again I have to and off 1/2 the data the magic stuff gives me...That's rather strange. I've never seen results where the large tables are faster. Perhaps you might want to run the tests multiple times, as there is some variation, at least in the 32 bit (1:45 and 1:47 for non-magic).Here's my latest testing going back and modifying things to use your smaller tables:
32bit:
log.001: time=1:45 mat=0 n=93585150 fh=90% nps=883K
log.002: time=1:47 mat=0 n=93585150 fh=90% nps=869K
so there magic is actually slower, probably those 64 bit multiples that are killers on a 32 bit machine...
64bit:
log.001: time=1:19 mat=0 n=93585150 fh=90% nps=1.2M
log.002: time=1:19 mat=0 n=93585150 fh=90% nps=1.2M
No difference there at all. Looks like the 2mb version is actually faster in 32 bit mode and about equal on the 64 bit xeons we have. I'll try to run some opteron tests later...
void
/*
* Show x-ray, battery, and pins
*/
ShowXrayBatteryPins(void)
{
int square, side;
BitBoard attacks, xrayattacks,batteryattacksB,pinners,possiblePinned, x;
for (side = white; side <= black; side++)
{
xrayattacks = batteryattacksB = pinners = NULLBITBOARD;
x = GameBoard.p[side][bishop]|GameBoard.p[side][rook];
while (x)
{
square = NEXTBIT(x);
CLEARBIT(x,square);
if (BitPosArray[square] & GameBoard.p[side][bishop])
{
attacks=Bmagic(square^63,GameBoard.Occupied);
batteryattacksB=Bmagic(square^63,(GameBoard.Occupied&~GameBoard.p[side][queen])) & ~attacks;
}
else if (BitPosArray[square] & GameBoard.p[side][rook])
{
attacks=Rmagic(square^63,GameBoard.Occupied);
xrayattacks=Rmagic(square^63,(GameBoard.Occupied&~attacks)) & ~attacks;
}
if (xrayattacks)
{
printf("xrayattacks:\n");
ShowBitBoard(&xrayattacks);
}
if (batteryattacksB)
{
printf("batteryattacksB:\n");
ShowBitBoard(&batteryattacksB);
}
}
possiblePinned=Bmagic(GameBoard.kingloc[side]^63,GameBoard.AllPieces[side]);
pinners=Bmagic(GameBoard.kingloc[side]^63,GameBoard.Occupied^possiblePinned)
& ((GameBoard.p[side][bishop]|GameBoard.p[side][queen]|
GameBoard.p[side^1][bishop]|GameBoard.p[side^1][queen])&GameBoard.AllPieces[side^1]);
if (pinners)
{
printf("pinners:\n");
ShowBitBoard(&pinners);
}
}
}
I don't follow. Looking at the Crafty Swap() function, you can replace square with a bitboard that is set to WhitePawns & attacks, for example, and set attacked_piece to pawn or whichever piece you find. Then later you can find the LSB of this bitboard and clear it from attacks. You keep a bitboard with these LSBs ored into it, to represent what pieces have been used. The SwapXray function then uses this bitboard to calculate new attackers: Bmagic(square, Occupied & ~used) & (BishopsQueens & ~used), and likewise for rook directions (not sure if these names are the same as in Crafty). You can pass in the piece type to find which directions need to be recalculated (pawns, bishops, queens=diagonals; rooks, queens=orthogonals).bob wrote:Actually you do. Take the case of a bishop with a queen behind it and on another diagonal you have something like a pawn with a bishop behind it. As I use a piece in the Swap() calculation, I need to replace that piece with the piece that is behind it. Therefore I need to know _specifically_ which piece is in back. Knowing that there are lots of pieces in battery with other attackers is not important right now, I just need to know what is behind the piece I just used to make a capture...
I did run it multiple times with multiple positions. It caught my eye too... Remember my test is not just move generation, it is a complete search, where those 64 bit multiplies might get pipelined OK, but the extra memory reference can be a killer since cache is getting romped on pretty hard by everything and an extra L2 miss can be expensive... And with random accessing into the tables, fetching 128 bytes at a chunk (my L2) that extra reference might hurt more than expected.
bob wrote:There are complications. Sometimes I only want a specific diagonal. With rotated I can look that up. With magic I have to compute both and then remove one, which is not so easy with diagonals. I also use rank attacks or file attacks for things like asking if two rooks are connected on the 7th... Again I have to and off 1/2 the data the magic stuff gives me...
Here's my latest testing going back and modifying things to use your smaller tables:
32bit:
log.001: time=1:45 mat=0 n=93585150 fh=90% nps=883K
log.002: time=1:47 mat=0 n=93585150 fh=90% nps=869K
so there magic is actually slower, probably those 64 bit multiples that are killers on a 32 bit machine...
64bit:
log.001: time=1:19 mat=0 n=93585150 fh=90% nps=1.2M
log.002: time=1:19 mat=0 n=93585150 fh=90% nps=1.2M
No difference there at all. Looks like the 2mb version is actually faster in 32 bit mode and about equal on the 64 bit xeons we have. I'll try to run some opteron tests later...
+- -+ +- -+ +- -+
| h 0 0 0 0 0 0 0 | | 0 0 0 0 0 0 0 0 | | 0 0 b c d e f g |
| g 0 0 0 0 0 0 0 | | 0 0 1 0 0 0 0 0 | | 0 0 a b c d e f |
| f 0 0 0 0 0 0 0 | | 0 0 0 1 0 0 0 0 | | 0 0 0 a b c d e |
| e 0 0 0 0 0 0 0 | * | 0 0 0 0 1 0 0 0 | = | 0 0 0 0 a b c d |
| d 0 0 0 0 0 0 0 | | 0 0 0 0 0 1 0 0 | | 0 0 0 0 0 a b c |
| c 0 0 0 0 0 0 0 | | 0 0 0 0 0 0 1 0 | | 0 0 0 0 0 0 a b |
| b 0 0 0 0 0 0 0 | | 0 0 0 0 0 0 0 1 | | 0 0 0 0 0 0 0 a |
| a 0 0 0 0 0 0 0 | | 0 0 0 0 0 0 0 0 | | 0 0 0 0 0 0 0 0 |
+- -+ +- -+ +- -+
+- -+ +- -+ +- -+
| 0 1 0 0 0 0 0 0 | | 0 0 0 0 0 0 0 0 | | 0 0 b c d e f g |
| 0 1 0 0 0 0 0 0 | | 0 b 0 0 0 0 0 0 | | 0 0 b c d e f g |
| 0 1 0 0 0 0 0 0 | | 0 0 c 0 0 0 0 0 | | 0 0 0 c d e f g |
| 0 1 0 0 0 0 0 0 | * | 0 0 0 d 0 0 0 0 | = | 0 0 0 0 d e f g |
| 0 1 0 0 0 0 0 0 | | 0 0 0 0 e 0 0 0 | | 0 0 0 0 0 e f g |
| 0 1 0 0 0 0 0 0 | | 0 0 0 0 0 f 0 0 | | 0 0 0 0 0 0 f g |
| 0 1 0 0 0 0 0 0 | | 0 0 0 0 0 0 g 0 | | 0 0 0 0 0 0 0 g |
| 0 1 0 0 0 0 0 0 | | 0 0 0 0 0 0 0 0 | | 0 0 0 0 0 0 0 0 |
+- -+ +- -+ +- -+
Pradu wrote:I did run it multiple times with multiple positions. It caught my eye too... Remember my test is not just move generation, it is a complete search, where those 64 bit multiplies might get pipelined OK, but the extra memory reference can be a killer since cache is getting romped on pretty hard by everything and an extra L2 miss can be expensive... And with random accessing into the tables, fetching 128 bytes at a chunk (my L2) that extra reference might hurt more than expected.
You are using MINIMIZE_MAGIC right? It only makes 1 L2 reference but the PERFECT_HASH thing uses 2 L2 references. I think most of us determined that PERFECT_HASH was a slower method, try MINIMIZE_MAGIC instead.
bob wrote:#define AttacksRook(square) *(magic_rook_indices[square]+((((tree->pos.w_occupied|tree->pos.b_occupied)&magic_rook_mask[square])*magic_rook[square])>>magic_rook_shift[square]))
Is that the version you are thinking about???
Return to Programming and Technical Discussions
Users browsing this forum: No registered users and 4 guests