
Informed Search: Genetic Algorithms

1. Informed Methods: Genetic Algorithms

Let's learn about another informed search method: genetic hyperparameter tuning.

2. A lesson on genetics

To understand genetic algorithms in machine learning, we should first understand how their inspiration, biological evolution, works. Genetic evolution in the real world follows this process:

1. To begin, there are many creatures, the 'offspring', in existence.
2. The strongest creatures survive the tough environment and pair off.
3. There is some 'crossover' as they produce offspring of their own.
4. Random mutations occur in some of the offspring, and these mutations sometimes give an offspring an advantage.
5. Finally, go back to step 1!

This is how evolution works in nature.

3. Genetics in Machine Learning

We can apply the same idea to hyperparameter tuning. First, we create some models (each with its own hyperparameter settings). Second, we pick the best according to our scoring function; these are the ones that 'survive'. We then create new models that are similar to the best ones, adding in some randomness so we don't get stuck in a local optimum. Finally, we continue this cycle until we are happy with the result!
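To make this concrete, here is a minimal, hand-rolled sketch of that loop (not TPOT) for tuning a random forest. The population size, hyperparameter ranges, and mutation logic are illustrative assumptions, not a recommended setup.

```python
import random
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

def random_params():
    # One 'creature': a random hyperparameter setting
    return {"n_estimators": random.randint(10, 200),
            "max_depth": random.randint(2, 12)}

def fitness(params):
    # Score a model with these hyperparameters using cross-validation
    model = RandomForestClassifier(**params, random_state=0)
    return cross_val_score(model, X, y, cv=3).mean()

# Step 1: create some models with random hyperparameter settings
population = [random_params() for _ in range(8)]

for generation in range(5):
    # Step 2: pick the best (the ones that 'survive')
    survivors = sorted(population, key=fitness, reverse=True)[:4]

    # Step 3: create new models similar to the best ones, with some randomness
    children = []
    for parent in survivors:
        child = dict(parent)
        child["n_estimators"] = max(10, child["n_estimators"] + random.randint(-20, 20))
        child["max_depth"] = max(2, child["max_depth"] + random.randint(-1, 1))
        children.append(child)

    population = survivors + children

print("Best hyperparameters found:", max(population, key=fitness))
```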

4. Why does this work well?

This is an informed search that has a number of advantages. It allows us to learn from previous iterations, just like Bayesian hyperparameter tuning. It has the additional advantage of some randomness. This randomness is important because it means we won't just keep finding similar models and going down a single path; we have a chance to move to a completely different area of the hyperparameter search space, which may be better. Finally, it takes care of many tedious aspects of machine learning, like algorithm and hyperparameter choice.

5. Introducing TPOT

A useful library for genetic hyperparameter tuning is TPOT. The documentation on the website explains its vision: it aims to be "your Data Science Assistant, automatically optimizing pipelines of models using genetic programming." This is great because pipelines not only include the model (or multiple models) but also cover feature preprocessing and other aspects of the process. Plus, it returns the Python code of the pipeline for you!

6. TPOT components

The key arguments to a TPOT classifier are:

- generations: the number of cycles we undertake of creating offspring models, mutating and crossing over, picking the best, and continuing.
- population_size: how many models we keep in each iteration; the strongest 'offspring'.
- offspring_size: the number of offspring (for us, models) that we create in each iteration.
- mutation_rate: the proportion of pipelines (between 0 and 1) that we apply randomness to.
- crossover_rate: the proportion of pipelines that we crossover, or 'breed' together, in each iteration to find similar ones.
- scoring: the objective function used to determine the strongest models or offspring, for example accuracy.
- cv: the cross-validation strategy to use, which will be familiar from classic machine learning modeling you have done.
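As a quick sketch, here is how these arguments might appear when constructing the estimator; the specific values are illustrative only, not recommendations.

```python
from tpot import TPOTClassifier

# Illustrative values only; the right settings depend on your problem
tpot_clf = TPOTClassifier(
    generations=3,        # number of evolution cycles
    population_size=4,    # models kept in each iteration
    offspring_size=3,     # models created in each iteration
    mutation_rate=0.9,    # proportion of pipelines to randomly mutate
    crossover_rate=0.1,   # proportion of pipelines to 'breed' together
    scoring="accuracy",   # objective function for picking the strongest
    cv=5,                 # cross-validation strategy
    verbosity=2,
    random_state=42,
)
```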

7. A simple example

Here we have a super simple example. You will notice the similarities to the way you have built models with Scikit Learn. First, we create the estimator with its hyperparameters; here the TPOTClassifier is the estimator. Then we use the dot-fit method and a scoring method, just like in Scikit Learn. We will keep the default values for mutation_rate and crossover_rate, as they are best left alone without deeper knowledge of genetic programming. The verbosity parameter will print out the process as it goes. Notice how we are not even selecting algorithms or hyperparameters? TPOT does it all!
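Since the slide's code isn't reproduced here, a minimal sketch along the lines described might look like this; the digits dataset and the train/test split are assumptions for illustration.

```python
from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Create the estimator; mutation_rate and crossover_rate stay at their defaults
tpot = TPOTClassifier(generations=3, population_size=5,
                      scoring="accuracy", cv=5,
                      verbosity=2, random_state=42)

# Fit and score, just like a Scikit Learn estimator
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))

# Export the Python code of the winning pipeline
tpot.export("best_pipeline.py")
```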

8. Let's practice!

Let's practice building our own genetic hyperparameter optimizer with TPOT!