Using Darwinism to Train Software

Imagine the goal of training a computer program to play a platform game, like Super Mario, all on its own. This program will be allowed to see the screen, the blocks, the goombas – just as a human player could. And it can input commands – run, jump, and such – as if it had a controller in hand.

But we can’t tell the program where to run and when to jump. This isn’t a player piano, and there’s no perforated paper roll. The solution to this puzzle is a program written to teach itself how to play the game.

That doesn’t seem possible

Thanks to the efforts of the computer science community over the past few years, this is now feasible even on consumer-grade computer hardware. And the principle behind most solutions in use today is little more than the Darwinian pattern of survival of the fittest.

Consider again a level of Super Mario. Any particular game action – whether it’s jumping off a Goomba’s head or jumping over a hole in the ground – can be described by only a handful of numeric values. Where was Mario when the player hit ‘jump’? How fast was Mario running at the time? How tall was the pipe Mario needed to clear?

Note down that data, and one has a record of that action. Record the data for every action throughout the level, and one ends up with a mathematical representation of the entire playthrough.
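As a sketch of what such a record might look like, here is one possible representation in Python. The field names and values are invented for illustration; they aren't taken from any real Mario-playing framework.

```python
from dataclasses import dataclass

@dataclass
class Action:
    """One recorded player action, described by a handful of numbers."""
    x_position: float  # where Mario was when the button was pressed
    speed: float       # how fast Mario was running at the time
    button: str        # which command was input, e.g. "jump" or "run"

# A full playthrough is then just a list of these records.
playthrough = [
    Action(x_position=12.0, speed=3.5, button="run"),
    Action(x_position=40.5, speed=4.0, button="jump"),  # clearing a pipe
]

print(len(playthrough))  # 2
```

Nothing about the game itself lives in this data; it is purely a numeric trace of what the player did, which is exactly the form a program can manipulate.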

When humans play a video game, they (typically!) don’t think of it in terms of statistics. But a computer program can. Mathematics is its language, and all a program ultimately needs to beat a level is to discover a series of values describing player actions that carry it through to the end.

Monkeys at typewriters

Any particular level of Super Mario might consist of hundreds of player actions from start to finish. Each of those actions would consist of several metrics (direction, speed, and location on screen, for example). That’s a lot of data. For the sake of illustrating the principle of machine learning, let’s assume a level that can be navigated in just sixteen moves.

All the learning software will do is play the level over and over again, hundreds or thousands of times – until it makes it through. If the program were to do so completely at random, eventually it would succeed. But it would take a long time – even with only sixteen moves to overcome.

With a real level requiring hundreds of moves, blind chance becomes unwieldy.
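The arithmetic here is easy to check. A minimal sketch, assuming a toy level where each of the sixteen moves is chosen from just four possible commands (an invented move set, for illustration only):

```python
MOVES = ["left", "right", "run", "jump"]  # hypothetical move set
LEVEL_LENGTH = 16

# With 4 choices per move, a purely random 16-move attempt hits the one
# winning sequence with probability (1/4) ** 16 -- so on average it takes
# this many tries:
expected_attempts = len(MOVES) ** LEVEL_LENGTH
print(expected_attempts)  # 4294967296
```

Over four billion attempts for a sixteen-move level; with hundreds of moves, the exponent makes pure guessing hopeless.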

Here’s where biology takes over

In Darwinian theory, life is a continuum of surviving organisms, whose ability to survive and reproduce is the aggregate of their genetic traits. Traits that promote survival are passed to the next generation; other traits are phased out by their better-adapted counterparts.

Our program will do the same – as it plays, the software will keep track of what actions it takes and measure how far it progresses through the level.

But how does the program decide what actions to take in the first place? It’s completely random. Well – the absolute first run-through is. But each subsequent run changes only one or two actions of the most successful run so far, ever so slightly. It’s the same pattern observed in random mutations within genetic data, through which life ‘discovers’ more advantageous survival traits. Through minor, random changes to its successful runs, the software can tweak an action here and there to find a new run that gets even further.

Repeat for a few thousand generations.

With patience and diligence, a program of this design will eventually converge its random values into a collection that successfully navigates the level in its entirety.
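The whole loop can be sketched in a few lines. This is a minimal mutate-and-select version on the same kind of toy level as before – one fixed winning sequence of sixteen moves, with ‘progress’ measured as how far into the level a run gets before failing. All names and the move set are invented for illustration.

```python
import random

MOVES = ["left", "right", "run", "jump"]  # hypothetical move set
LEVEL_LENGTH = 16

random.seed(42)
# The toy level: one fixed sequence of moves that clears it.
winning = [random.choice(MOVES) for _ in range(LEVEL_LENGTH)]

def progress(run):
    """How far this run gets before failing: length of the correct prefix."""
    for i, (move, needed) in enumerate(zip(run, winning)):
        if move != needed:
            return i
    return LEVEL_LENGTH

# Generation zero is completely random.
best = [random.choice(MOVES) for _ in range(LEVEL_LENGTH)]

generations = 0
while progress(best) < LEVEL_LENGTH:
    # Mutate: copy the best run so far and change one action slightly.
    child = best[:]
    child[random.randrange(LEVEL_LENGTH)] = random.choice(MOVES)
    # Select: keep the child only if it gets at least as far.
    if progress(child) >= progress(best):
        best = child
    generations += 1

print(progress(best) == LEVEL_LENGTH)  # prints True
```

Because a mutation that shortens the correct prefix is always discarded, progress never decreases – and the occasional lucky mutation extends it, so the loop typically finishes in on the order of a thousand generations rather than billions of blind guesses.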

But is this really evolution?

The strategy outlined above follows a survival-based selection process as observed in nature, but the comparison breaks down in a few key respects:

1. Biological evolution is non-observant

In machine learning, an agent monitors the results and actively selects the best of them. Darwinism, on the other hand, works perfectly well independent of any management.

2. Biological evolution does not converge to an end

As far as we know. Horses and zebras are just getting more different, folks.

3. Biological evolution is (typically) not unilateral

In the example above, only the player’s actions change over time. Everything else stays the same. To better simulate in vivo evolution, one could introduce a predator (i.e., enable the level to adapt to the player’s increasing competency by increasing its difficulty). Naturally – several tinkerers have already pitted one machine-learned program against another.

This is just scratching the surface

The above scenario is largely illustrative, but a practical example of this learning pattern has been built by Ivan Seidel. Engineering-inclined folks might enjoy a look at the source code to see the specific mechanisms employed.

Liked this blog? Then check out our others!
