Saturday, February 21, 2009

Details about the way it works

Here are some detailled information about the realisation of this project. The program has been developped using Visual C++ Express, and I can share source code with anyone interested.

Description of a player. 

Every player consists of :
  • A 19x19 array of vector of input neural adresses, telling where each value of each case on the board goes. Every case goes to at least one neural input.
  • A 19x19 array of output neural adress, telling from where information should be retrived do decide the move of the player.
  • A Neural network processing the input information and providing a value for each case of the goban. The higher value decides the chosen move.

Description of the neural network.

The neural network consists in a map of genes ; each gene is a map of neurones. Each neurone is conceived as a classical artificial neuron and is defined by :
  • It's type : linear, sigmoid, fully linear or piecwise linear.
  • Two parameters defining its transfer function : threshold and global weight.
  • A vector of output adresses telling where the inputs are connected. 
  • An associated vector of weights defining the weight of each input.
  • A vector of input adresses telling where the output of the neuron goes.
  • A mutation probability for every one of these values, ranging from 1/10 to 1/32000.
  • A species tag.
  • An historical log telling where the player's ascendants have been.
The Tournament

The tournament is organized as follow : each player 0-49 plays against a randomly chosen player 50-99, playing black twice and white twice. A win earns 3 points, a tie earns 1 point and a loss 0 points. By the end of the tournaments, players are sorted , players with as many points are sorted according to their file size.

The 50 highest ranked survive, and the other are killed.

Reproduction

Players who are in the top half of the population get the right to reproduce. Every one of them follow this procedure .
  • First, they try to mate with another winner : they search the population for another member with the same species tag. If found, the achive sexual reproduction twice. Sexual reproduction consists in swapping random genes from the two reproductors. Generally, the results is a non valid neural networks. When its the case, the reproducing player is chosen a new species tag and proceeds as if it had not found a suitable partner. 
  • If sexual reproduction failed, a simple cloning is achieved.
Mutation

After reproduction is achieved, every player suffers a random mutation :
  • Every neural properties can randomly mutate.
  • Every adresses can mutate.
  • Genes can split, merge, grow, shrink, duplicate or disappear.
  • Every mutation probability can mutate, according to a driving metamutation probability factor arbitrarilly set at 0.002 for the moment.
Journey

By the end of every tournament, some genetic mixing is assured with the other ecosystems through the transfer of players via a web hosted ftp server.

  • The hishest ranked in the tournament is sent to the ftp server, and assigned a new number ther from 0 to 29. 
  • One of the 30 players available on the ftp server is randomly chosen and replaces one of the 100 players in the local ecosystem.

7 comments:

  1. Drap ( :o )

    This is an interesting project, but I'm afraid the decision procedure you describe is way too simplistic (at least in the way you put it) to lead to interesting results. I tried doing something similar with a 'Puissance 4' a while ago, but it turned out that neural nets are simply not good at classifying goban-like data, they miss out the fine details and fail at generalizing properly, because the data to generalize from is not the goban itself, but the logic behind each move.

    Of course you could in theory approximate a good classification function using enough time and suitable parameters, but practically speaking this is a dead-end.

    Besides, what's with 'having neural nets learn using mutations' ? You have perfectly good ways of making neural nets learn without having to rely on slow, chance mutations.

    But hey, I would be glad if you proved me wrong. Good luck.

    ReplyDelete
  2. There is much evolution to be done before we reach a "rather good" go player indeed.

    Nonetheless, I want to point out that I did not implement Artificial Neural Network to do what they have been proved to be good at. I chosed this method because Natural neural network are the best today at go, they provide an elegant method to process information, thats all I want. I dont want to design (anyway, I can't) a clever way to make them learn as it is usually made with ANN.

    ReplyDelete
  3. I've thought about this kind of "naive" / brute force approach before but it's great to see an implementation that just works and people can try out.

    Can you confirm that the selection procedure ensures that each network plays the same number of games? Otherwise it would skew the scores and rankings.

    Also, can you confirm that the highest ranked network is never replaced by a random one from the server? While not fatal, this is probably something that should be avoided to ensure that the local population can't get worse as a result of the mixing.

    Finally, just wondering if you do something to stop infinite loops in the games? Is there a point where you simply stop and assign a draw? Do you check for superko? It would be nice to specify exactly which rules you are using.

    Oliver

    ReplyDelete
  4. - Every player plays twice as white and twice as black.
    - There is 1/100 chance that the winner is replaced by a newcomer. But this does not mean that he dies, because he already left to the ftp server, where he has a 1/30 chance of going back to an actual ecosystem for each tournament.
    - Your last interrogation are very pertinent, I think this is where there is the more room for improvement in my program. For the moment, my player play 300 moves and then pass, there is no super-ko check, and the scoring is very straightforward (see the opengo documentation : http://www.inventivity.com/OpenGo/).

    ReplyDelete
  5. I looked briefly at the opengo documentation and it does seem to handle superko. The games on my machine have slowed to a crawl, which suggests that they are too long.

    From what I read above, I assume you are doing a 1-ply search with the network providing an evaluation for each move you search. Are you excluding "probable eyes" from these searches? this is standard in MC playouts to prevent random or weak players from filling their own eyes, at which point large blocks die and the game takes much longer to finish. You could require the networks to learn about this themselves but (as suggested by Dave on the maiiling list) it's cheap and sensible to give them this as prior knowledge.

    ReplyDelete
  6. A thought about players on the FTP server: in theory these are supposed to be some of the best players around. In practice, if many people try this out a few times but don't persist, most of the uploads to the FTP server will consist of the "best" player from a very new population - therefore very weak. Can you impose some kind of quality control to avoid "premature" uploads?

    ReplyDelete
  7. What do you mean precisely by "a crawl" ? It is perfectly normal foir the first ten generations to be gradually slower, while the new random players that were created on your disk are quickly eliminated from it and replaced by better, and much heavier players from the ftp server. Each game takes about 1-3 s on my computers.

    I will maybe add a simple feature to prevent my players from playing in their own eyes, but that's the single feat they achieve right now :'(

    Concerning the pemature upload of bad players, its no big deal really, the whole ftp population is repopulated every 4-6 hr, and there is no chance a newly created players coming in an ancient player's pool could caus any damage to it (hey, if it were the case, that would mean they don't deserve to live anyway). This would indeed slow the whole process very little. That's not as if I were in a hurry anyway :)

    ReplyDelete