Week 8: Gradient-free optimisation
In the following exercises you will need to develop some (simple) code to implement gradient-free optimisation algorithms. You are free to use any resources available to develop your code, but I recommend trying to understand the details of how the algorithms work to build up intuition for later in the course.
The purpose of all these exercises is to help you understand the strengths and weaknesses of different approaches. In optimisation, there is no “one size fits all”; different problems are best solved using different methods.
Almost all optimisation packages will implement Nelder-Mead as a simple, low-dimensional, gradient-free optimisation algorithm. However, population-based methods are less commonly found in existing packages because they require much more tuning towards the specific problem of interest.
Supplementary material
- Algorithms for Optimisation - Direct Methods (Nelder-Mead)
- Algorithms for Optimisation - Stochastic Methods (Simulated Annealing)
- Algorithms for Optimisation - Population Methods (Genetic Algorithms and Particle Swarm Optimisation)
- Various optimisation algorithm implementations
- Python - scipy.optimize
- Julia - Optim.jl
- MATLAB - Optimization Toolbox
- Functional programming in NumPy - specifically the apply_along_axis and apply_over_axes functions (a short example follows this list)
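As an illustration of the last item, here is a minimal sketch of evaluating an objective over a grid without explicit loops. It assumes the ackley function defined under Exercise 1 below; the grid resolution of 200 points per axis is an arbitrary choice.

import numpy as np

# Build a 2D grid of (x, y) points and stack the coordinates along the
# last axis, so that each grid cell holds one 2-vector.
xs = np.linspace(-30, 30, 200)
X, Y = np.meshgrid(xs, xs)
points = np.stack([X, Y], axis=-1)            # shape (200, 200, 2)

# Apply the objective to each 2-vector along the last axis; Z has shape
# (200, 200) and can be passed directly to a contour or surface plot.
Z = np.apply_along_axis(ackley, -1, points)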
Notes from the demo session
Notation
Throughout this unit I use superscripts in brackets to denote iteration number. Superscripts without brackets denote powers. For example, \(x^{(3)}\) is the value of \(x\) on the third iteration, whereas \(x^3\) is the third power of \(x\).
Exercise 1
Ackley’s function is a commonly used optimisation test function with many local minima. It is given by \[ f(\mathbf{x}) = -a\exp\left(-b\sqrt{\frac{1}{d}\sum_{i=1}^d x_i^2}\right) - \exp\left(\frac{1}{d}\sum_{i=1}^d \cos(c x_i)\right) + a + \exp(1). \] Typically, \(a=20\), \(b=0.2\), \(c=2\pi\). Code for Ackley’s function is below.
- Visualise Ackley’s function in two dimensions for \(-30\le x\le 30\). What do you notice about this function? (Also see different ways to generate the objective function landscape.)
- Implement Particle Swarm Optimisation on Ackley’s function in 2D using a population of 10 randomly placed particles. Visualise the step-by-step movement of the particles (e.g., draw lines between the points each particle visits to show its time history). A minimal sketch of the algorithm appears after this list.
- Change the hyperparameters of the PSO algorithm (e.g., momentum, number of particles, etc.). How do these changes affect the convergence of the method and its computational cost?
- Increase the number of dimensions to 10. How does the performance of the algorithm change?
- [Extension] Break the overall population into sub-swarms, so that you now track a global minimum, a sub-swarm minimum, and a personal minimum, with corresponding forces on each particle in those three directions. How does this change the behaviour of the algorithm?
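The following is a minimal PSO sketch, assuming the ackley function defined below; the hyperparameter values (the inertia/momentum weight w and the cognitive and social weights c1 and c2) are illustrative starting points rather than tuned choices.

import numpy as np

rng = np.random.default_rng(0)

def pso(f, d=2, n_particles=10, n_iters=100, w=0.7, c1=1.5, c2=1.5,
        lo=-30.0, hi=30.0):
    # Random initial positions; zero initial velocities.
    x = rng.uniform(lo, hi, size=(n_particles, d))
    v = np.zeros((n_particles, d))

    # Personal best positions/values and the global best position.
    p_best = x.copy()
    p_val = np.apply_along_axis(f, 1, x)
    g_best = p_best[np.argmin(p_val)].copy()

    history = [x.copy()]  # positions at each iteration, for plotting
    for _ in range(n_iters):
        r1 = rng.uniform(size=(n_particles, d))
        r2 = rng.uniform(size=(n_particles, d))
        # Velocity update: inertia plus random pulls towards the
        # personal and global best positions.
        v = w*v + c1*r1*(p_best - x) + c2*r2*(g_best - x)
        x = x + v

        # Update the personal and global bests.
        vals = np.apply_along_axis(f, 1, x)
        improved = vals < p_val
        p_best[improved] = x[improved]
        p_val[improved] = vals[improved]
        g_best = p_best[np.argmin(p_val)].copy()
        history.append(x.copy())

    return g_best, p_val.min(), np.array(history)

Plotting history[:, i, 0] against history[:, i, 1] for each particle i gives the time histories requested in part 2.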
Python code for Ackley’s function
import numpy as np

def ackley(x, a=20, b=0.2, c=2*np.pi):
    d = len(x)
    return -a * np.exp(-b * np.sqrt(np.sum(np.square(x))/d)) - \
        np.exp(np.sum(np.cos(c * x))/d) + a + np.exp(1)

Julia code for Ackley’s function
function ackley(x, a=20, b=0.2, c=2π)
    d = length(x)
    return -a * exp(-b * sqrt(sum(x.^2)/d)) -
        exp(sum(cos.(c .* x))/d) + a + exp(1)
end

Exercise 2
The Travelling Salesman Problem is a combinatorial problem where you need to choose the optimal ordering of cities to visit (and sell your goods in) such that you never visit the same city twice and you minimise the total distance travelled.
We can find good (near optimal) solutions using a genetic algorithm.
The first question is how to represent a journey. The easiest way is to use a vector of integers, where each integer represents a city. (E.g., 0 = London, 1 = Birmingham, etc.) The objective function is then the sum of the Euclidean distances between consecutive cities, in the order given by the solution vector.
The constraints on the problem will be satisfied if each city is only in the vector once.
Assuming that the start and end points can be different (i.e., you don’t need to return to your starting position), for a list of 20 cities there are \(20! = 2,432,902,008,176,640,000\) possible combinations. Exhaustive search is not feasible.
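A minimal sketch of this objective function, treating the coordinates as planar points; the names tour_length and coords are illustrative, with coords assumed to be an (n, 2) array of city coordinates.

import numpy as np

def tour_length(tour, coords):
    # tour is a permutation of city indices; coords is an (n, 2) array.
    # Sum the Euclidean distances between consecutive cities in the
    # order given by the tour, without returning to the start.
    path = coords[tour]
    return np.sum(np.linalg.norm(np.diff(path, axis=0), axis=1))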
- What might a suitable crossover function be? (I.e., combining two parents to generate a child vector.) Remember that the constraints must still be satisfied.
- What might a suitable mutation function be? (I.e., modifying a child vector to give a slightly different vector.)
- Write code for each element of the genetic algorithm (one possible crossover/mutation pair is sketched after this list):
  1. Generate the initial population.
  2. Calculate the fitness of each solution.
  3. Select solutions to breed from.
  4. Apply the crossover function to each pair of solutions.
  5. Apply the mutation function to each child solution.
  6. Replace the population with the child population.
  7. Go to step 2 until the termination criteria have been met.
- Run your code with the UK cities data provided below.
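One possible (by no means the only) answer to the first two questions is order crossover with swap mutation; a minimal sketch, assuming tours are NumPy integer arrays as above:

import numpy as np

rng = np.random.default_rng(0)

def order_crossover(p1, p2):
    # Copy a random slice of parent 1 into the child, then fill the
    # remaining positions with the cities of parent 2 in the order they
    # appear, skipping any city already copied.  The child is therefore
    # always a valid permutation.
    n = len(p1)
    i, j = sorted(rng.choice(n, size=2, replace=False))
    child = np.full(n, -1)
    child[i:j+1] = p1[i:j+1]
    fill = [c for c in p2 if c not in p1[i:j+1]]
    remaining = np.concatenate([np.arange(0, i), np.arange(j+1, n)])
    child[remaining] = fill
    return child

def swap_mutation(tour, p=0.1):
    # With probability p, swap two randomly chosen cities.  Swapping
    # automatically preserves the visit-each-city-once constraint.
    tour = tour.copy()
    if rng.uniform() < p:
        i, j = rng.choice(len(tour), size=2, replace=False)
        tour[i], tour[j] = tour[j], tour[i]
    return tour

The remaining steps (selection, replacement, termination) can be kept simple, e.g., tournament selection and a fixed number of generations.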
Test data
Obtained from ChatGPT with the prompt “Give me the coordinates of the top 20 cities in the UK as a Python list”.
uk_cities_coordinates = [
["London", 51.5074, -0.1278],
["Birmingham", 52.4862, -1.8904],
["Glasgow", 55.8642, -4.2518],
["Liverpool", 53.4084, -2.9916],
["Bristol", 51.4545, -2.5879],
["Manchester", 53.4808, -2.2426],
["Sheffield", 53.3811, -1.4701],
["Leeds", 53.8008, -1.5491],
["Edinburgh", 55.9533, -3.1883],
["Leicester", 52.6369, -1.1398],
["Coventry", 52.4068, -1.5197],
["Kingston upon Hull", 53.7676, -0.3274],
["Cardiff", 51.4816, -3.1791],
["Nottingham", 52.9548, -1.1581],
["Stoke-on-Trent", 53.0027, -2.1794],
["Southampton", 50.9097, -1.4044],
["Newcastle upon Tyne", 54.9783, -1.6174],
["Portsmouth", 50.8198, -1.0878],
["Bradford", 53.7950, -1.7594],
["Belfast", 54.5973, -5.9301],
]

Exercise 3 (extension)
Apply the Simulated Annealing algorithm to either of the problems above (Exercise 1 or Exercise 2). The function that generates new candidate solutions in Simulated Annealing can be similar to, or the same as, your mutation function for the genetic algorithm. A minimal sketch for the TSP version follows.
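This sketch reuses the tour_length and swap_mutation helpers sketched under Exercise 2; the geometric cooling schedule and the initial temperature T0 are illustrative choices that will need tuning.

import numpy as np

rng = np.random.default_rng(0)

def simulated_annealing(coords, n_iters=10_000, T0=10.0, alpha=0.999):
    n = len(coords)
    current = rng.permutation(n)              # random initial tour
    current_cost = tour_length(current, coords)
    best, best_cost = current.copy(), current_cost

    T = T0
    for _ in range(n_iters):
        # Propose a neighbouring tour; here we reuse the GA mutation
        # move, applied with probability 1.
        candidate = swap_mutation(current, p=1.0)
        cost = tour_length(candidate, coords)

        # Metropolis criterion: always accept improvements, and accept
        # worse tours with probability exp(-increase / T).
        if cost < current_cost or rng.uniform() < np.exp((current_cost - cost) / T):
            current, current_cost = candidate, cost
            if cost < best_cost:
                best, best_cost = candidate.copy(), cost

        T *= alpha                            # geometric cooling

    return best, best_cost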