Module 10: Combinatorial Optimization Problems, State Spaces and Local Search


Optimization Problems

What are they?

Example: the Travelling Salesman Problem (TSP)

In-Class Exercise 10.1: For an n-point problem, what is the size of the solution space (i.e., how many possible tours are there)?

Example: the Bin Packing Problem (BPP)

In-Class Exercise 10.2: Consider the following Bin Packing problem: there are three items with sizes 1, 2 and 3 respectively, and a bin size of 6. Enumerate all possible assignments.

Example: Quadratic Programming Problem

Types of optimization problems:


Problem Size and Execution Time

Problem size:

Execution time of an algorithm:

Consider these two algorithms for TSP:

In-Class Exercise 10.3: Consider the following 4 input points: (0,0), (1,0), (1,1) and (0,-2).

  1. Show the steps in executing each algorithm on this input.
  2. What is the complexity (execution time) of Algorithm 1 on an input of size n?
  3. What is the complexity of Algorithm 2 on an input of size n?

Polynomial vs. exponential complexity:

In-Class Exercise 10.4: Which of these are polynomial-time algorithms?

  1. Algorithm 0 runs in time O( (n^2 + 3)^4 ).
  2. Algorithm 1 runs in time O(n log(n)).
  3. Algorithm 2 runs in time O(n^n).
  4. Algorithm 3 runs in time O(n^log(n)).
  5. Algorithm 4 runs in time O( (log n)^3 ).

Preview of things to come:


Combinatorial Optimization Problems

A combinatorial optimization problem is:

Example: TSP

  • Each instance of TSP is a combinatorial optimization problem.

  • Example: the 4-point TSP problem with points (0,1), (1,0), (2,3) and (3,5)
    • Does this have a set of "states" or "candidate solutions"?
      => Yes: S = { all possible tours } = { [0 1 2 3], [0 1 3 2], [0 2 1 3] }

    • Is there a well-defined cost function on the states?
      => Yes: C(s) = length of tour s
      e.g., C([0 1 2 3]) = dist(0,1) + dist(1,2) + dist(2,3) + dist(3,0).

    • Is the goal to find the least-cost state?
      => Yes: find the tour with minimal length.

Example: BPP

  • States: all possible assignments of items to bins.

  • Cost function: C(s) = number of bins used in state s.

  • Goal: find the state that uses the fewest bins (the minimal-cost state).

In-Class Exercise 10.5: Which of these, if any, are combinatorial optimization problems? Identify the state space for each.

  1. Given a connected, weighted graph, find the minimum-spanning tree.
  2. Given a connected graph, discover whether it has a Hamiltonian tour.
    (Recall: a Hamiltonian tour is a tour that passes through each vertex exactly once.)
  3. Find the values of x1 and x2 that minimize the function f(x1, x2) = x1^2 + x2^2 subject to 3x1 + 2x2 = 100.

Size of a combinatorial optimization problem:

  • The input is usually of size O(n) or O(n^2).
    • TSP: list of n points.
    • BPP: n item sizes and one bin size.
    • Graph-based TSP: n vertices and up to O(n^2) edges.
    • MST: n vertices and up to O(n^2) edges.

  • The state-space is usually exponential in size:
    • TSP: all possible tours.
    • BPP: all possible assignments of items to bins.
    • MST: all possible spanning trees.

  • The output is usually of size O(n) or O(n^2).
    • TSP: a tour of size O(n).
    • BPP: an assignment (matrix) of size O(n^2).
    • MST: an adjacency matrix of size O(n^2).


Greedy Algorithms

Key ideas:

  • For many combinatorial optimization problems (but not all!), it is easy to build a candidate solution quickly.

  • Use problem structure to put together a candidate solution step-by-step.

  • At each step: "do the best you can with immediate information"

  • Greedy algorithms are usually O(n) or O(n^2).

Example: TSP

  • Greedy Algorithm:
    1. Initially the set P = {0, ..., n-1 } and the set Q is empty.
    2. Move 0 from P to Q.
    3. Repeat the following until P is empty:
      • Suppose k was the point most recently added to Q.
      • Find the point in P closest to k and move that to Q.
    4. Output points in the order in which they were added to Q.

  • What is "greedy" about this?
    • At each step, we add a new point to the existing tour.
    • The new point is selected based on how close it is to the previous point.
    Greedy => no backtracking.

  • Execution time: O(n^2) (each step requires an O(n) selection).

  • Solution quality: not guaranteed to find optimal solution.
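  • The greedy nearest-neighbor tour above can be sketched in a few lines of Python. This is a minimal illustration, not part of the course materials; the function name and the point representation (a list of (x, y) tuples) are our own choices.

    ```python
    import math

    def greedy_tsp(points):
        """Nearest-neighbor heuristic: start at point 0, then repeatedly
        move the closest remaining point from P into the tour Q."""
        remaining = set(range(1, len(points)))   # the set P (0 already in Q)
        tour = [0]                               # the set Q, in order added
        while remaining:
            k = tour[-1]                         # most recently added point
            nearest = min(remaining,
                          key=lambda p: math.dist(points[k], points[p]))
            remaining.remove(nearest)
            tour.append(nearest)
        return tour

    # The 4 points from Exercise 10.3:
    print(greedy_tsp([(0, 0), (1, 0), (1, 1), (0, -2)]))   # [0, 1, 2, 3]
    ```

    Each of the n steps scans the O(n) remaining points, giving the O(n^2) bound stated above.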

Example: BPP

  • Greedy Algorithm:
    1. Let A = { all items }
    2. Sort A in decreasing order.
    3. At each step until A is empty:
      • Remove next item in sort-order from A.
      • Find first-available existing bin to fit item.
      • If no existing bin can fit the item, create a new bin and place item in new bin.

  • Running time: O(n log(n)) for the sort and O(n^2) worst-case for scanning bins at each step.
    => O(n^2)

  • Solution quality: not guaranteed to find optimal solution.
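  • The algorithm above is the classic "first-fit decreasing" heuristic. A minimal Python sketch (our own function name; bins represented as lists of item sizes):

    ```python
    def first_fit_decreasing(items, bin_size):
        """Sort items largest-first; place each item into the first bin
        with enough remaining space, opening a new bin if none fits."""
        bins = []                                 # each bin: list of sizes
        for item in sorted(items, reverse=True):  # decreasing order
            for b in bins:
                if sum(b) + item <= bin_size:     # first bin that fits
                    b.append(item)
                    break
            else:
                bins.append([item])               # no bin fits: open a new one
        return bins

    # The instance from Exercise 10.2: items 1, 2, 3 with bin size 6.
    print(first_fit_decreasing([1, 2, 3], 6))   # [[3, 2, 1]] -- one bin
    ```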

Example: MST (Minimum Spanning Tree)

  • Greedy Algorithm: (Kruskal)
    1. Sort edges in graph in increasing order of weight.
    2. Process edges in sort-order:
      • If adding the edge causes a cycle, discard it.
      • Otherwise, add the edge to the current tree.

  • Complexity: O(E log(E)) for sorting, and O(E log(V)) for processing the edges with union-find.
    => O(E log(E)) overall.

  • Solution quality: finds optimal solution.
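  • Kruskal's algorithm can be sketched with a simple union-find structure. This is an illustrative implementation (edge format and names are our own), using path compression in find:

    ```python
    def kruskal(n, edges):
        """Kruskal's MST on vertices 0..n-1. edges: list of (weight, u, v).
        Union-find detects whether adding an edge would create a cycle."""
        parent = list(range(n))

        def find(x):                         # find root, compressing the path
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        mst = []
        for w, u, v in sorted(edges):        # O(E log E) sort dominates
            ru, rv = find(u), find(v)
            if ru != rv:                     # joins two components: keep it
                parent[ru] = rv
                mst.append((u, v, w))
        return mst

    # Triangle 0-1-2: the heaviest edge closes a cycle and is discarded.
    print(kruskal(3, [(1, 0, 1), (2, 1, 2), (3, 0, 2)]))
    ```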

About greedy algorithms:

  • For many problems, it's relatively easy to pose a greedy algorithm.

  • For a few problems (e.g., MST), the greedy algorithm produces the optimal solution.

  • For some problems (e.g. BPP), greedy algorithms produce "reasonably good" solutions (worst-case).

  • For some problems (e.g. BPP), greedy algorithms produce "excellent" solutions (in practice).

  • For some problems (e.g., Hamiltonian tour), greedy algorithms are of no help at all.


Walking around the State Space

Consider the following two algorithms for TSP:

  • Algorithm 1:
    
         // Start with input-order tour: 
    1.   s = initial tour 0,1,...,n-1
    
         // Record current best tour. 
    2.   min = cost(s)
    3.   minTour = s
    
         // Repeat a large number of times: 
    4.   for i=1 to very-large-number
             // Create a random tour from scratch. 
    5.       s = generate random tour
             // See if it's better than our current best. 
    6.       if cost(s) < min
    7.           min = cost(s)
    8.           minTour = s
    9.       endif
    10.  endfor
    
         // Return best tour found. 
    11.  return minTour
      

  • Algorithm 2:
    
         // Start with input-order tour: 
    1.   s = initial tour 0,1,...,n-1
    
         // Record current best tour. 
    2.   min = cost(s)
    3.   minTour = s
    
         // Repeat a large number of times: 
    4.   for i=1 to very-large-number
             // Use current tour s to create next tour 
    5.       Pick two random points in s;
    6.       s = tour you get by swapping the two selected points in s
             // See if it's better than our current best. 
    7.       if cost(s) < min
    8.           min = cost(s)
    9.           minTour = s
    10.      endif
    11.  endfor
    
         // Return best tour found. 
    12.  return minTour
      

In-Class Exercise 10.6: Start with a 5-point TSP problem. Use some physical source of randomness (e.g., flipping coins) and emulate the above two algorithms for a few steps.

  • Observe: Each algorithm "walks around" the state space in some random fashion.

  • Neighborhood:
    • Define N(s) = Neighborhood(s) = { all states you can get to from s in any particular iteration of one of the above algorithms }.
    • Example: 4-point problem
      • N (0123) = { all possible tours } in Algorithm 1.
      • N (0123) = { 0213, 0321, 0132 } in Algorithm 2.
    • A neighborhood is polynomial if the size of each neighborhood can be bounded by a polynomial of the input size.

In-Class Exercise 10.7: For an n-point TSP instance, what is the size of a neighborhood in each of the above algorithms?

A different way of writing the code:

  • Consider this generic "walk" in state-space:
    
    Algorithm: stateSpaceWalk (points)
    Input: n points in TSP problem
    1.   s = initial tour 0,1,...,n-1
    2.   min = cost(s)
    3.   minTour = s
    4.   for i=1 to very-large-number
             // Let method "nextState" produce the next state 
    5.       s = nextState(s)
    6.       if cost(s) < min
    7.           min = cost(s)
    8.           minTour = s
    9.       endif
    10.  endfor
    11.  return minTour
      

  • We have seen two different nextState functions:
    1. Algorithm 1 above:
      
      Algorithm: nextState (s)
      Input: current state s
          // Ignore s and generate an arbitrary tour 
      1.  t = Generate a random tour
      2.  return t
          
    2. Algorithm 2 above:
      
      Algorithm: nextState (s)
      Input: current state s
      1.  t = s
      2.  Pick any two random points in t
      3.  Swap them.
      4.  return t
          
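  • The two nextState functions translate directly to Python. A minimal sketch (function names are our own; tours are lists of point indices):

    ```python
    import random

    def next_state_random_tour(s):
        """Algorithm 1: ignore the current tour, generate a fresh random one."""
        t = list(s)
        random.shuffle(t)
        return t

    def next_state_swap(s):
        """Algorithm 2: copy the current tour and swap two random positions."""
        t = list(s)
        i, j = random.sample(range(len(t)), 2)
        t[i], t[j] = t[j], t[i]
        return t

    s = [0, 1, 2, 3, 4]
    print(next_state_swap(s))
    ```

    Note that next_state_swap returns a state that differs from s in exactly two positions, so its neighborhood is polynomial in size, while next_state_random_tour can reach any tour in one step.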

Reachability:

  • A "nextState" function is reachable if, from the start state, you can eventually reach every other state by walking around long enough.

  • Both "nextState" functions above are reachable.

In-Class Exercise 10.8: Can you think of a nextState function that is not "reachable"?


Greedy Local Search

Key ideas:

  • A local search algorithm:
    • An algorithm for a combinatorial optimization problem.
    • Uses a "nextState" function to roam around the state space.
    • Records the "best-solution-seen-so-far" while roaming.

  • A greedy local search algorithm:
    • Has little to do with "greedy algorithms" described earlier.
    • Uses a more intelligent "nextState" function
      => it examines the neighborhood, and selects the best state in the neighborhood.

Example: greedy local search for TSP

  • Idea:
    • Generate a neighborhood N(s) for the current state s
    • Pick the "best" (lowest-cost) state in N(s) as the next-state.
    • Repeatedly pick "best" neighbors until no neighbor is better.

  • Pseudocode:
    
    Algorithm: TSPGreedyLocalSearch (points)
    Input: array of points
    
         // Start with any tour, e.g., in input order 
    1.   s = initial tour 0,1,...,n-1
    
         // Repeatedly find a "neighboring" tour, as long as the neighbor is better. 
    2.   repeat
    3.       min = cost(s)
    4.       minTour = s
             // Use "greedy" approach in finding a new state to walk to. 
    5.       s = greedyNextState(s)
    6.   until cost(s) >= min
    
         // Output best tour seen. 
    7.   return minTour
    
    Output: best tour found by algorithm
      
    
    Algorithm: greedyNextState (s)
    Input: a tour s, an array of integers
    
         // Start with current tour. 
    1.   min = cost (s)
    2.   bestNeighbor = s
    
         // Generate all possible neighbors by swapping two elements in s. 
         // Note: go through all possible i and j systematically - O(n^2) combinations 
    3.   for each i and j 
    4.       s' = swap i-th and j-th points in s
    5.       if cost (s') < min
    6.           min = cost (s')
    7.           bestNeighbor = s'
    8.       endif
    9.   endfor
    
         // Return best one found. 
    10.  return bestNeighbor
    
    Output: a tour (the best neighboring tour, using only swaps)
      
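  • The two routines above combine into a short runnable Python program. This is an illustrative translation (names and the point representation are our own), using 2-swaps as the neighborhood:

    ```python
    import math

    def tour_cost(tour, points):
        """Total length of the closed tour over the given points."""
        return sum(math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
                   for i in range(len(tour)))

    def greedy_next_state(s, points):
        """Examine every 2-swap neighbor of s; return the cheapest one found."""
        best, best_cost = s, tour_cost(s, points)
        for i in range(len(s)):
            for j in range(i + 1, len(s)):
                t = list(s)
                t[i], t[j] = t[j], t[i]
                c = tour_cost(t, points)
                if c < best_cost:
                    best, best_cost = t, c
        return best

    def tsp_greedy_local_search(points):
        """Move to the best 2-swap neighbor until no neighbor improves."""
        s = list(range(len(points)))      # start with the input-order tour
        while True:
            t = greedy_next_state(s, points)
            if tour_cost(t, points) >= tour_cost(s, points):
                return s                  # s is a local minimum under 2-swaps
            s = t
    ```

    For example, on the unit square listed out of order, points = [(0,0), (1,1), (0,1), (1,0)], the input-order tour crosses itself (cost 2 + 2*sqrt(2)), and one round of greedy 2-swaps recovers the perimeter tour of cost 4.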

Greedy local search for BPP:

  • Start with any assignment of items to bins.

  • At each step, find the least-cost "neighboring assignment" and make that the next assignment.

  • Repeat this process until no neighbor is better.

  • Computing a "neighbor":
    • The current state is an assignment of items to bins.
    • Try swaps between all possible pairs of bins, i and j.
    • For a particular pair of bins i and j, pick an item in each randomly and swap them.

Non-uniqueness of greedy-local-search:

  • There is no "best way" to define greedy local search for a problem.

  • It is possible to devise many different heuristics for greedy local search.

  • Example: alternative greedy heuristic for TSP
    • Given the current tour s, consider all possible next tours you can get by a 3-way swap.

  • Example: alternative greedy heuristic for BPP
    • Given current assignment, lay out items in order of bins. Then, swap items and return to bins in first-fit order.

In-Class Exercise 10.9: Download this template and implement the following variation of greedy-local-search:

  • Instead of trying all possible pairs (which takes O(n^2) time), we will use an O(n) neighborhood.
  • Given the current tour, pick a random point i in the current tour.
  • Then, try all possible other points j ( j != i) to get n-1 pairs (i,j).
  • Try a swap of i and j for each of these pairs, and pick the best one.
  • Return the tour resulting from the best such swap as the nextState.


Local Optima and Problem Landscape

Local optima:

  • Recall: greedy-local-search generates one state (tour) after another until no better neighbor can be found
    => does this mean the last one is optimal?

  • Observe the trajectory of states:

    • There is no guarantee that a greedy local search can find the (global) minimum.

  • The last state found by greedy-local-search is a local minimum.
    => it is the "best" in its neighborhood.

  • The global minimum is what we seek: the least-cost solution overall.

  • The particular local minimum found by greedy-local-search depends on the start state:

Problem landscape:

  • Consider TSP using a particular local-search algorithm:
    • Suppose we use a graph where the vertices represent states.
    • An edge is placed between two "neighbors"
      e.g., for a 5-point TSP the neighbors of [0 1 2 3 4] are:

    • The cost of each tour is represented as the "weight" of each vertex.
    • Thus, a local-search algorithm "wanders" around this graph.

  • Picture a 3D surface representing the cost above the graph.
    => this is the problem landscape for a particular problem and local-search algorithm.

  • A large part of the difficulty in solving combinatorial optimization problems is the "weirdness" in landscapes
    => landscapes often have very little structure to exploit.

  • Unlike continuous optimization problems, local shape in the landscape does NOT help point towards the global minimum.

Climbing out of local minima:

  • A local-search algorithm gets "stuck" in a local minimum.

  • One approach: re-run local-search many times with different starting points.

  • Another approach (next): help a local-search algorithm "climb" out of local minima.


Simulated Annealing

Background:

  • What is annealing?
    • Annealing is a metallurgic process for improving the strength of metals.
    • Key idea: cool metal slowly during the forging process.

  • Example: making bar magnets:
    • Wrong way to make a magnet:
      1. Heat metal bar to high temperature in magnetic field.

      2. Cool rapidly (quench):

    • Right way: cool slowly (anneal)

  • Why slow-cooling works:
    • At high heat, magnetic dipoles are agitated and move around:

    • The magnetic field tries to force alignment:

    • If cooled rapidly, alignments tend to be less than optimal (local alignments):

    • With slow-cooling, alignments are closer to optimal (global alignment):

  • Summary: slow-cooling helps because it gives molecules more time to "settle" into a globally optimal configuration.

  • Relation between "energy" and "optimality"
    • The more aligned, the lower the system "energy".
    • If the dipoles are not aligned, some dipoles' fields will conflict with others.
    • If we (loosely) associate these "wasted" conflicting fields with energy
      => better alignment is equivalent to lower energy.
    • Global minimum = lowest-energy state.

  • The Boltzmann Distribution:
    • Consider a gas-molecule system (chamber with gas molecules):

    • The state of the system is the particular snapshot (positions of molecules) at any time.
    • There are high-energy states:

      and low-energy states:

    • Suppose the states s1, s2, ... have energies E(s1), E(s2), ...
    • A particular energy value E occurs with probability

      P[E] = Z e^(-E/kT)

      where Z and k are constants.

  • Low-energy states are more probable at low temperatures:
    • Consider states s1 and s2 with energies E(s2) > E(s1)
    • The ratio of probabilities for these two states is:

      r = P[E(s1)] / P[E(s2)] = e^([E(s2) - E(s1)] / kT) = exp( [E(s2) - E(s1)] / kT )

In-Class Exercise 10.10: Consider the ratio of probabilities above:

  • Question: what happens to r as T increases to infinity?
  • Question: what happens to r as T decreases to zero?
What are the implications?

Key ideas in simulated annealing:

  • Simulated annealing = a modified local-search.

  • Use it to solve a combinatorial optimization problem.

  • Associate "energy" with "cost".
    => Goal: find lowest-energy state.

  • Recall problem with local-search: gets stuck at local minimum.

  • Simulated annealing will allow jumps to higher-cost states.

  • If randomly-selected neighbor has lower-cost, jump to it (like local-search does).

  • If randomly-selected neighbor is of higher-cost
    => flip a coin to decide whether to jump to higher-cost state
    • Suppose current state is s with cost C(s).
    • Suppose randomly-selected neighbor is s' with cost C(s') > C(s).
    • Then, jump to it with probability

      e^(-[C(s') - C(s)] / kT)

  • Decrease coin-flip probability as time goes on:
    => by decreasing temperature T.

  • Probability of jumping to higher-cost state depends on cost-difference:

Implementation:

  • Pseudocode: (for TSP)
    
    Algorithm: TSPSimulatedAnnealing (points)
    Input: array of points
    
         // Start with any tour, e.g., in input order 
    1.   s = initial tour 0,1,...,n-1
    
         // Record initial tour as best so far. 
    2.   min = cost (s)
    3.   minTour = s
    
         // Pick an initial temperature to allow "mobility" 
    4.   T = selectInitialTemperature()
    
         // Iterate "long enough" 
    5.   for i=1 to large-enough-number
               // Get potential next state using a random swap. 
    6.         s' = randomNextState (s)
               // If it's better, then jump to it. 
    7.         if cost(s') < cost(s)
    8.             s = s'
                   // Record best so far: 
    9.             if cost(s') < min
    10.                min = cost(s')
    11.                minTour = s'
    12.            endif
    13.        else if expCoinFlip (s, s')
                   // Jump to s' even if it's worse. 
    14.            s = s'
    15.        endif       // Else stay in current state. 
               // Decrease temperature. 
    16.        T = newTemperature (T)
    17.  endfor
    
    18.  return minTour
    
    Output: best tour found by algorithm
      
    
    Algorithm: randomNextState (s)
    Input: a tour s, an array of integers
    
        // ... Swap a random pair of points ... 
    
    Output: a tour 
      
    
    Algorithm: expCoinFlip (s, s')
    Input: two states s and s'
    
    1.   p = exp ( -(cost(s') - cost(s)) / T)
    2.   u = uniformRandom (0, 1)
    3.   if u < p
    4.       return true
    5.   else
    6.       return false
    
    Output: true (if coinFlip resulted in heads) or false
      
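  • The pseudocode above can be assembled into a short runnable Python version for TSP. This is a sketch under illustrative assumptions: the initial temperature t0 and the multiplicative factor alpha are placeholder values that a real run would need to tune, and the constant k is folded into T.

    ```python
    import math
    import random

    def tsp_simulated_annealing(points, iterations=20000, t0=10.0, alpha=0.999):
        """Simulated annealing for TSP with a random-pair-swap nextState
        and a multiplicative cooling schedule."""
        def cost(tour):
            return sum(math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
                       for i in range(len(tour)))

        def random_next_state(s):            # swap a random pair of points
            t = list(s)
            i, j = random.sample(range(len(t)), 2)
            t[i], t[j] = t[j], t[i]
            return t

        s = list(range(len(points)))         # start with input-order tour
        best, best_cost = list(s), cost(s)   # record best so far
        T = t0
        for _ in range(iterations):
            s2 = random_next_state(s)
            delta = cost(s2) - cost(s)
            if delta < 0:                    # better: always jump
                s = s2
                if cost(s2) < best_cost:
                    best, best_cost = list(s2), cost(s2)
            elif random.random() < math.exp(-delta / T):
                s = s2                       # worse: jump on exp coin-flip
            T *= alpha                       # decrease temperature
        return best, best_cost
    ```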

  • Implementation for other problems, e.g., BPP
    • The only thing that needs to change: define a nextState method for each new problem.
    • Also, some experimentation will be needed for the temperature schedule.

Temperature issues:

  • Initial temperature:
    • Need to pick an initial temperature that will accept large cost increases (initially).
    • One way:
      • Guess what the large cost increase might be.
      • Pick initial T to make the probability 0.95 (close to 1).

  • Decreasing the temperature:
    • We need a temperature schedule.
    • Several standard approaches:
      • Multiplicative decrease: Use T = a * T, where a is a constant like 0.99.
        => T_n = a^n * T_0.
      • Additive decrease: Use T = T - a, where a is a constant like 0.0001.
      • Inverse-log decrease: Use T = a / log(n).
    • In practice: need to experiment with different temperature schedules for a particular problem.
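  • A quick way to compare schedules is to plot or print the temperature after n steps using the closed forms above. The constants here are the illustrative values from the bullets, not recommended settings:

    ```python
    import math

    T0, a_mult, a_add = 100.0, 0.99, 0.0001

    def t_multiplicative(n):                      # T_n = a^n * T_0
        return T0 * a_mult ** n

    def t_additive(n):                            # T_n = T_0 - n*a (floored at 0)
        return max(T0 - n * a_add, 0.0)

    def t_inverse_log(n):                         # T_n = a / log(n), shifted
        return T0 / math.log(n + 2)               # n+2 avoids log(0) and log(1)

    for n in (0, 100, 1000, 10000):
        print(n, t_multiplicative(n), t_additive(n), t_inverse_log(n))
    ```

    The multiplicative schedule cools fastest, while the inverse-log schedule (the one with the convergence guarantee below) cools extremely slowly.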

Analysis:

  • How long do we run simulated annealing?
    • Typically, if the temperature becomes very, very small there's no point in further execution
      => because the probability of escaping a local minimum is minuscule.

  • Unlike previous algorithms, there is no fixed running time.

  • What can we say theoretically?
    • If the inverse-log schedule is used
      => Can prove "probabilistic convergence to global minimum"
      => Loosely, as the number of iterations increases, the probability of finding the global minimum tends to 1.

In practice:

  • Advantages of simulated annealing:
    • Simple to implement.
    • Does not need much insight into problem structure.
    • Can produce reasonable solutions.
    • If greedy does well, so will annealing.

  • Disadvantages:
    • Poor temperature schedule can prevent sufficient exploration of state space.
    • Can require some experimentation before getting it to work well.

  • Precautions:
    • Always re-run with several (wildly) different starting solutions.
    • Always experiment with different temperature schedules.
    • Always pick an initial temperature to ensure high probability of accepting a high-cost jump.
    • If possible, try different neighborhood functions.

  • Warning:
    • Just because it has an appealing origin, simulated annealing is not guaranteed to work
      => when it works, it's because it explores more of the state space than a greedy-local-search.
    • Simply running greedy-local-search on multiple starting points may be just as effective, and should be experimented with.

Variations:

  • Use greedyNextState instead of the nextState function above.
    • Advantage: guaranteed to find local minima.
    • Disadvantage: may be difficult or impossible to climb out of a particular local minimum:
      • Suppose we are stuck at state s, a local minimum.
      • We probabilistically jump to s', a higher-cost state.
      • When in s', we will very likely jump back to s (unless a better state lies on the "other side").
    • Selecting a random next-state is more amenable to exploration.
      => but it may not find local minima easily.

  • Hybrid nextState functions:
    • Instead of considering the entire neighborhood of 2-swaps, examine some fraction of the neighborhood.
    • Switch between different neighborhood functions during iteration.

  • Maintain "tabu" lists:
    • To avoid jumping to states already seen before, maintain a list of "already-visited" states and exclude these from each neighborhood.

In-Class Exercise 10.11: Download this template and implement simulated annealing using the neighborhood function of Exercise 10.9. The main-loop as well as the temperature schedule is already in the template.