\( \newcommand{\blah}{blah-blah-blah} \newcommand{\eqb}[1]{\begin{eqnarray*}#1\end{eqnarray*}} \newcommand{\eqbn}[1]{\begin{eqnarray}#1\end{eqnarray}} \newcommand{\bb}[1]{\mathbf{#1}} \newcommand{\mat}[1]{\begin{bmatrix}#1\end{bmatrix}} \newcommand{\nchoose}[2]{\left(\begin{array}{c} #1 \\ #2 \end{array}\right)} \newcommand{\defn}{\stackrel{\vartriangle}{=}} \newcommand{\rvectwo}[2]{\left(\begin{array}{c} #1 \\ #2 \end{array}\right)} \newcommand{\rvecthree}[3]{\left(\begin{array}{r} #1 \\ #2\\ #3\end{array}\right)} \newcommand{\rvecdots}[3]{\left(\begin{array}{r} #1 \\ #2\\ \vdots\\ #3\end{array}\right)} \newcommand{\vectwo}[2]{\left[\begin{array}{r} #1\\#2\end{array}\right]} \newcommand{\vecthree}[3]{\left[\begin{array}{r} #1 \\ #2\\ #3\end{array}\right]} \newcommand{\vecdots}[3]{\left[\begin{array}{r} #1 \\ #2\\ \vdots\\ #3\end{array}\right]} \newcommand{\eql}{\;\; = \;\;} \newcommand{\dv}[2]{\frac{#1}{#2}} \newcommand{\half}{\frac{1}{2}} \newcommand{\mmod}{\!\!\! \mod} \newcommand{\ops}{\;\; #1 \;\;} \newcommand{\implies}{\Rightarrow\;\;\;\;\;\;\;\;\;\;\;\;} \definecolor{dkblue}{RGB}{0,0,120} \definecolor{dkred}{RGB}{120,0,0} \definecolor{dkgreen}{RGB}{0,120,0} \)


Module 2: Robotics Interlude - Planning


2.1   What do we mean by planning?

 

 

Exercise 1: Download, compile and execute Arm.java.

 

We will now look at three planning problems as motivation for our study of planning:

 

Exercise 2: Download and unpack planning.jar, then execute PlanningGUI. Familiarize yourself with the three planning problems, and experiment with different goal configurations: generate "easy" and "hard" configurations to reach from the initial configuration. How can you tell what's easy or hard?
 

About planning problems in general:

 


2.2   State spaces and neighborhoods

 

Sometimes we use "state" instead of "configuration":

 

Exercise 3: Find a goal state for the arm problem. How does one specify a complete description of this state?
 

Exercise 4: How many possible states are there for each of the three problems?
 

A neighborhood:

 

Exercise 5: What is the size of the neighborhood for each of the three problems?
 

Exercise 6: Consider the starting state in the 8-puzzle demo. Draw this state on paper, then draw all its neighbors, and all neighbors of those neighbors.
 

The underlying graph:

 

What we know about the shortest-path problem in graphs:

 

Unfortunately ...

The approach taken by planning algorithms:

 

Before looking at how planning algorithms work, let's first examine the output such an algorithm produces.
 

Exercise 7: Compile and load MazeHandPlanner into PlanningGUI for the maze problem.

 


2.3   Non-cost methods: BFS and DFS

 

First, we'll explain how BFS and DFS work
     It'll become clear why they are "non-cost"

Note: BFS = Breadth-First-Search, DFS = Depth-First-Search
 

Key ideas:

  • We will consider the (unbuilt) graph of states and action-edges.

  • Instead of generating the whole graph, we will create nodes and edges as we proceed.
         Turns out, we will not need to store edges.

  • Use two data structures:
    • Visited: a collection of nodes (states) that we have "finished processing"
    • Frontier: a collection of nodes that we have partially processed.

  • At each step:
    • Pick a node from Frontier.
    • Examine action-edges from that node.
    • If there are neighbors that are unexplored, add them to the Frontier.
    • Stop when you find the goal, when the Frontier is empty (no solution), or when you run out of memory.

  • Here is an example snapshot:

    • The blue nodes are in Visited.
    • The red nodes are in the Frontier.
    • The black nodes are nodes not yet explored (or even generated).

  • BFS and DFS differ in their selection of which frontier node to process next.
    • BFS: Frontier is a queue.
    • DFS: Frontier is a stack.

  • Pseudocode for BFS:
        Algorithm: BFS (start, goal)
        Input: the start and goal nodes (states)
        1.   Initialize frontier and visited
        2.   frontier.add (start)
        3.   while frontier not empty
        4.       currentState = frontier.removeFirst ()
        5.       if currentState = goal
        6.           path = makeSolution ()
        7.           return path
        8.       endif
        9.       visited.add (currentState)
        10.      for each neighbor s of currentState
        11.          if s not in visited and not in frontier
        12.              frontier.add (s)
        13.          endif
        14.      endfor
        15.  endwhile
        16.  return null               // No solution
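
  • To make the pseudocode concrete, here is a minimal, self-contained Java sketch of BFS over an implicit graph. The State interface and neighbors() method below are hypothetical stand-ins for the corresponding classes in planning.jar; only the queue-based search loop is the point:

        import java.util.*;

        public class BFSSketch {

            // Hypothetical minimal interface: a planning problem need only
            // enumerate a state's neighbors (planning.jar's classes differ).
            interface State {
                List<State> neighbors();
            }

            // Returns the path from start to goal, or null if none exists.
            static List<State> bfs(State start, State goal) {
                Queue<State> frontier = new LinkedList<State>();   // BFS: FIFO queue
                Set<State> seen = new HashSet<State>();            // visited + frontier members
                Map<State,State> parent = new HashMap<State,State>();  // for makeSolution

                frontier.add(start);
                seen.add(start);
                while (!frontier.isEmpty()) {
                    State current = frontier.remove();
                    if (current.equals(goal)) {
                        return makeSolution(current, parent);
                    }
                    for (State s : current.neighbors()) {
                        if (!seen.contains(s)) {          // not in visited and not in frontier
                            seen.add(s);
                            parent.put(s, current);
                            frontier.add(s);
                        }
                    }
                }
                return null;                              // no solution
            }

            // Follow parent pointers back from the goal, then reverse.
            static List<State> makeSolution(State goal, Map<State,State> parent) {
                List<State> path = new ArrayList<State>();
                for (State s = goal; s != null; s = parent.get(s)) {
                    path.add(s);
                }
                Collections.reverse(path);
                return path;
            }
        }

    Replacing the queue with a stack (e.g., java.util.ArrayDeque, used with push and pop) turns this into DFS.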
        
 

Exercise 8: Compile and load BFSPlanner into PlanningGUI for the maze problem. Verify that it finds the solution by clicking on the "next" button once the plan has been generated. Likewise, apply BFS to an instance of the puzzle problem and verify the correctness of the plan generated.
 

Exercise 9: Implement DFS by modifying the BFS code.

  • Compare BFS and DFS using a variety of puzzle-problem instances.
  • Identify both the number of moves (time taken) and the number of steps in the path.
  • Write recursive pseudocode for DFS. Is there an advantage (or disadvantage) to using recursion?
 

Exercise 10: Compare the memory requirements of BFS and DFS. In general, which one will require more memory? Can you analyze (on paper) the memory requirements for each?
 

Exercise 11: Examine the use of the two data structures in BFS and DFS. Identify the operations performed on each of these. How much time is taken for each operation performed on these data structures? Are there data structures which take less time?
 


2.4   Cost-based methods

 

The problem with BFS/DFS:

  • Neither of them uses any knowledge of the problem.
         e.g., in the maze problem, it's easy to calculate the distance to the goal.

  • Both can end up wasting time by searching "away from the goal."
 

Cost-based methods:

  • Recall: one objective of a planning algorithm is to identify the plan with the least number of "moves" (lowest cost).

  • What is "cost"?
    • For most planning problems: cost is the number of moves from the start state.
    • Some planning problems include the realization cost (some actions may take more time).
 

The Cost-Based-Planner (CBP) Algorithm:

  • From among the Frontier nodes, pick the one with the least total cost from the start state.

  • Whenever a neighbor is found to be already in the Frontier, check whether reaching it via the current state reduces its cost.

  • Pseudocode:
        Algorithm: CostBased (start, goal)
        Input: the start and goal nodes (states)
        1.   Initialize frontier and visited
        2.   frontier.add (start)
        3.   while frontier not empty
        4.       currentState = remove from frontier the state with least cost
        5.       if currentState = goal
        6.           path = makeSolution ()
        7.           return path
        8.       endif
        9.       visited.add (currentState)
        10.      for each neighbor s of currentState
        11.          if s not in visited and not in frontier
        12.              frontier.add (s)
        13.          else if s in frontier
        14.              c' = cost to s via currentState
        15.              if c' < current cost of s
        16.                  current cost of s = c'  (and make currentState the parent of s)
        17.              endif
        18.          endif
        19.      endfor
        20.  endwhile
        21.  return null               // No solution
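
  • For reference (Exercise 12 asks you to do this inside CBPlanner.java), here is one way a removeBest-style selection might look. The SearchState class and the list-based Frontier are illustrative assumptions, not the actual layout of the code in planning.jar:

        import java.util.*;

        public class RemoveBestSketch {

            // Hypothetical state record; the real class has more fields.
            static class SearchState {
                double costFromStart;   // cost of best known path from the start
            }

            // Remove and return the Frontier state with least cost-from-start.
            // A linear scan costs O(n) per removal (see Exercise 14).
            static SearchState removeBest(List<SearchState> frontier) {
                int best = 0;
                for (int i = 1; i < frontier.size(); i++) {
                    if (frontier.get(i).costFromStart < frontier.get(best).costFromStart) {
                        best = i;
                    }
                }
                return frontier.remove(best);
            }
        }

    A java.util.PriorityQueue would make each removal O(log n), but note that it has no efficient decrease-key operation; a common workaround is to insert a duplicate entry when a state's cost drops, and skip stale entries as they come off the queue.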
        
 

Exercise 12: Implement the CBP by adding code to the method removeBest() in CBPlanner.java that is included in planning.jar. Most of the code has been written: you only need to extract the best node from the frontier using the costFromStart value in each state (which has already been computed for you).
 

Exercise 13: Compare BFS with Cost-Based for the puzzle problem.
 

Exercise 14: Examine the operations on data structures in CBP. Estimate the time needed for these operations. Suggest alternative data structures.
 

An improvement:

  • Note that Cost-Based-Planner (CBP) does not make any use of the goal state.
         Surely, one should give preference to the neighbors closer to the goal state?

  • The A* algorithm:
         Pick the state whose combined cost-from-start and cost-to-goal is the least.

  • Note:
    • The cost-from-start is exact because we compute it as we go along.
    • The cost-to-goal is not known but must be estimated.
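
  • In code, the only difference between CBP and A* is the quantity used to order the Frontier. A sketch, using a hypothetical SearchState class and the estimatedCostToGoal field used in Exercise 16 below:

        import java.util.Comparator;

        class Orderings {
            static class SearchState {
                double costFromStart;         // g(s): exact, computed during the search
                double estimatedCostToGoal;   // h(s): an estimate
            }

            // CBP orders the Frontier by g(s) alone:
            static final Comparator<SearchState> CBP_ORDER =
                Comparator.comparingDouble(s -> s.costFromStart);

            // A* orders it by f(s) = g(s) + h(s):
            static final Comparator<SearchState> ASTAR_ORDER =
                Comparator.comparingDouble(s -> s.costFromStart + s.estimatedCostToGoal);
        }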
 

Exercise 15: What is a reasonable estimate of the cost-to-goal for the maze and puzzle problems? That is, given a state and the goal, what is an estimate of how many moves it would take to get from the state to the goal? Show by example how the estimate can fail in each case.
 

Exercise 16: In a new file called CBPlannerAStar.java, copy over your code from CBPlanner and implement the A* algorithm. Again, you do not need to perform the estimation. Simply use the estimatedCostToGoal value in each state (which has already been computed for you).
 

Exercise 17: Compare the time-taken (number of moves) and the quality of solution produced by each of A* and CBP for the maze and puzzle problems. Generate at least 5 instances of each problem and write down both measures (number of moves, quality) for each algorithm.
 

Exercise 18: Examine the code that produces the estimatedCostToGoal for the maze and puzzle problems. Can you suggest an alternative for the puzzle problem?
 


2.5   Completeness, optimality and efficiency

 

What these terms mean:

  • Completeness: if there's a solution (path to goal state), then the algorithm will find it.

  • Optimality: the algorithm finds the least-cost path to the goal, if at least one path exists.

  • Efficiency: the algorithm finds optimal paths in the least amount of time (its own running time).
 

Completeness:

  • All the algorithms we have seen are complete, provided they don't run out of memory.

  • BFS is the most vulnerable
         Memory needs can grow exponentially in some problems
         (e.g., a tree that branches at every node)

  • Memory requirements can be reduced by removing Visited altogether
         Does not affect completeness.

  • DFS has the smallest Frontier memory needs, O(depth)
         O(depth) is rarely large

  • CBP and A* eventually search the whole state space
         They are complete.
 

Optimality:

  • BFS and DFS are not guaranteed to be optimal
         They can accidentally find long paths.

    Exercise 19: Create an example of the maze problem and show that DFS can be arbitrarily worse than optimal. What "arbitrarily worse" means: for any number x, you can find a maze problem example where DFS performs x% worse than optimal.

  • CBP is optimal, but not necessarily efficient. To see why it's optimal:
    • Assume that CBP finds a non-optimal path:

    • Assume the blue path (above) is optimal, and the red one (below) is the non-optimal path found by CBP.
    • Consider the states just before reaching the goal node on each path; call the one on the blue (optimal) path s'.
    • The total cost at s', cost(s') < C* (less than optimal).
    • Therefore, since CBP always expands the least-cost Frontier state, it would have processed s' before reaching the goal.
    • The total red cost is sub-optimal (> C*) by assumption.
    • This means the algorithm would have expanded s', and then the rest of the blue path, before the goal could be removed at the red path's higher cost.
           A contradiction

  • A* is optimal, and possibly more efficient than CBP, provided the estimated-cost-to-goal always underestimates the actual cost-to-goal. To see why it's optimal,
    • Assume that A* finds a non-optimal path (red path below):

    • Let g(s) = cost-from-start for any s.
    • Let h(s) = estimated-cost-to-goal from any s.
    • Let t(s) = true optimal cost to goal from any s.
    • Let f(s) = g(s) + h(s), the value used in the algorithm.
    • Consider a state s' on the optimal (blue) path. Because h(s') underestimates, f(s') = g(s') + h(s') ≤ g(s') + t(s') = C*
           f(s') ≤ C*
    • Since we are assuming A* found a non-optimal path, the cost along the red path is greater than C*.
           s' would have been expanded before the goal could be removed from the Frontier at the red path's cost
           A contradiction
 

Efficiency:

  • There is no proof that one algorithm is more efficient than another.

  • Generally, experimentation shows that A* is more efficient than CBP, which is more efficient than BFS/DFS.
 


2.6   Continuous spaces: the arm problem

 

A challenge with discretizing:

  • The arm moves "continuously"
         the motors that drive the joints move in very small steps.

  • We could discretize, but at what granularity?
 

Let's first take a naive approach and fully discretize the space:

  • Two options:
    • One way to do this is to impose a grid on the space.
    • Another way: define discrete "neighbors" for each state.

  • We will use the second approach. For each state and each movable joint:
    • Define eight neighboring positions

    • If the coordinates of a joint are (x, y), then a new position is potentially (x+dx, y+dy).
    • Here, dx is either 0, δ or −δ.
    • Similarly, dy is either 0, δ or −δ. Excluding dx = dy = 0 leaves the eight neighbors.
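
  • In code, the eight neighbors of a joint position might be enumerated as below. The Point class and the value of δ are illustrative assumptions, not the actual code in ArmProblem.java:

        import java.util.*;

        public class NeighborSketch {

            static final double DELTA = 1.0;   // the step size δ (assumed granularity)

            static class Point {
                final double x, y;
                Point(double x, double y) { this.x = x; this.y = y; }
            }

            // All (dx, dy) combinations with dx, dy in {-δ, 0, +δ},
            // excluding (0,0): 3 x 3 - 1 = 8 neighbors.
            static List<Point> neighbors(Point p) {
                List<Point> result = new ArrayList<Point>();
                double[] steps = { -DELTA, 0, DELTA };
                for (double dx : steps) {
                    for (double dy : steps) {
                        if (dx == 0 && dy == 0) continue;
                        result.add(new Point(p.x + dx, p.y + dy));
                    }
                }
                return result;
            }
        }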
 

Exercise 20: Compare BFS, CBP and A* on the arm problem. Initially, use a simple target (a short distance up the y-axis). Gradually make the target harder.
 

Exercise 21: Identify the part of the code that computes the neighbors of a state in ArmProblem.java. Change the 8-neighborhood to a 4-neighborhood (N,S,E,W) and compare. In the CBP code, you can un-comment the "draw" line to see what the screen looks like when the algorithm is in action.
 

Exercise 22: How do we know we have visited a state before? How and where is equality-testing performed in the code? What makes this equality-test different from the test used in the maze and puzzle problems?
 

Next, let's consider discretizing another way:

  • We'll break the arm-movement problem into two levels:
    • A high level that's discrete.
    • A low level that's continuous.

  • Key ideas:
    • Use a discrete approach to move the arm from one major configuration to another:
           Example: from "straightened" to "folded"
    • Use a continuous "inverse-kinematics" approach for finer-grained movement.
 

Inverse kinematics:

  • Let's look at an example with a two-link arm:
    • Suppose we have an initial position:

    • And that we wish to move the gripper horizontally along a line, as shown above.

  • Let's now translate this problem into "angle commands"
         small changes in angle to enable the desired movement.

  • To start, let
    • \(\theta_0 \eql\) the angle of the first link from the horizontal.
    • \(\theta_1 \eql\) the angle of the second link from the first.

  • Now consider a small movement along the desired direction, and ask: what should the new angles be?

    That is, what should \(\theta_0^\prime\) and \(\theta_1^\prime\) be?

  • This way, we "feed" the differences to the motors:
    • The first motor (at the origin) is given the "change" \(\Delta\theta_0 \eql \theta_0^\prime - \theta_0\)
    • The second is given \(\Delta\theta_1 \eql \theta_1^\prime - \theta_1\)

  • Note: some motors are set up to receive the desired target angles \((\theta_0^\prime, \theta_1^\prime)\) instead of the change.

  • Let's first write the \((x,y)\) coordinates of the end effector: $$\eqb{ x & \eql & L_0 \cos(\theta_0) + L_1 \cos(\theta_0 + \theta_1)\\ y & \eql & L_0 \sin(\theta_0) + L_1 \sin(\theta_0 + \theta_1) }$$
     

    Exercise 23: Why is this true?

  • In our example, we want the y value to remain constant
         horizontal movement of the gripper

  • Clearly, there are multiple combinations of \(\theta_0,\theta_1\) that will allow for small horizontal movement.

  • Let's fix the change in the first angle and then calculate what the second change needs to be.

  • Suppose we just pick a (small) value for \(\Delta\theta_0\).

  • Then, $$ \theta_0^\prime \eql \theta_0 + \Delta\theta_0 $$

  • Next, since the y-value is fixed (and known) we'll use the second equation above: $$ \theta_1^\prime \eql \sin^{-1} \left( \frac{y - L_0 \sin\theta_0^\prime}{L_1} \right) \;- \; \theta_0^\prime $$

  • Which will result in the change $$ \Delta\theta_1 \eql \theta_1^\prime - \theta_1 $$

  • The two numbers \(\Delta\theta_0, \Delta\theta_1\) are the controls applied to the arm.
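
  • The calculation above translates almost directly into code. A self-contained sketch (the link lengths, initial angles and Δθ0 are made-up values for illustration):

        public class TwoLinkIKSketch {
            public static void main(String[] args) {
                double L0 = 10, L1 = 10;          // link lengths (assumed)
                double theta0 = Math.PI / 3;      // first link's angle from the horizontal
                double theta1 = -Math.PI / 6;     // second link's angle from the first

                // Current end-effector height (forward kinematics):
                double y = L0 * Math.sin(theta0) + L1 * Math.sin(theta0 + theta1);

                // Pick a small change for the first angle ...
                double dTheta0 = -0.01;
                double newTheta0 = theta0 + dTheta0;

                // ... then solve y = L0 sin(theta0') + L1 sin(theta0' + theta1')
                // for theta1', so that y stays fixed. Note that Math.asin returns
                // the principal solution; a real controller must pick the branch
                // matching the arm's current elbow configuration.
                double newTheta1 =
                    Math.asin((y - L0 * Math.sin(newTheta0)) / L1) - newTheta0;
                double dTheta1 = newTheta1 - theta1;
                System.out.println("dTheta0 = " + dTheta0 + "  dTheta1 = " + dTheta1);

                // Sanity check: y should be (essentially) unchanged.
                double newY = L0 * Math.sin(newTheta0)
                            + L1 * Math.sin(newTheta0 + newTheta1);
                System.out.println("y before = " + y + "  y after = " + newY);
            }
        }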
 

Now let's look at a slightly more complex task that combines high-level discrete states with low-level small changes to angles:

  • Suppose a two-link arm is initially folded so that the end is on the y-axis.

  • The final destination: \(x=30, y=10\).

  • Suppose, further, that high-level planning (such as A*) creates the following high-level sequence of actions:

    where:

    • The arm is first nearly straightened as a precursor to folding "the other way".
    • The second link is brought down to \(y=10\).
    • The end is now moved along \(y=10\) until \(x=30\).
     

    Exercise 24: Within each of the above phases, identify the signs (positive or negative) of \(\Delta\theta_0, \Delta\theta_1\). For example, when straightening, what are the signs of each likely to be?

  • We'll then set up our control loop to follow the phases:
        // Initially, state=0
        while not over
            if state = 0
                if (straightened) 
                    state = 1       // Go to state 1
                else
                    apply Δθ0,Δθ1 to straighten
                endif
            else if state = 1
                if y = 10
                    state = 2       // Go to state 2
                else
                    apply Δθ0,Δθ1 to bring 2nd link down
                endif
            else if state = 2
                if x = 30
                    stop
                else
                    apply Δθ0,Δθ1 to move horizontally
                endif
            endif
        endwhile
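
  • In Java, the same state machine might look like the sketch below. One practical detail the pseudocode glosses over: with continuous quantities, tests like "y = 10" should be tolerance-based comparisons, never exact equality. The helper methods are stubs standing in for whatever TwoLinkController.java actually does:

        public class ControlLoopSketch {

            static final double TOL = 0.1;   // "close enough" tolerance (assumed)

            int state = 0;                   // current phase of the high-level plan

            void controlLoop() {
                while (state >= 0) {
                    if (state == 0) {
                        if (isStraightened()) state = 1;   // go to phase 1
                        else applyStraightenStep();
                    }
                    else if (state == 1) {
                        if (near(getY(), 10)) state = 2;   // go to phase 2
                        else applyLowerStep();
                    }
                    else if (state == 2) {
                        if (near(getX(), 30)) state = -1;  // done: stop
                        else applyHorizontalStep();
                    }
                }
            }

            // Continuous values should be compared with a tolerance.
            static boolean near(double a, double b) { return Math.abs(a - b) < TOL; }

            // Stubs: each "apply" method would compute and apply one small
            // (dTheta0, dTheta1) step, as in the inverse-kinematics example.
            boolean isStraightened()      { throw new UnsupportedOperationException(); }
            double getX()                 { throw new UnsupportedOperationException(); }
            double getY()                 { throw new UnsupportedOperationException(); }
            void applyStraightenStep()    { throw new UnsupportedOperationException(); }
            void applyLowerStep()         { throw new UnsupportedOperationException(); }
            void applyHorizontalStep()    { throw new UnsupportedOperationException(); }
        }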
        
        
 

Exercise 25: Download and unpack twolinkexample.zip.

  • Examine the code in TwoLinkController.java to confirm that the above controls are implemented.
  • Add code for the last part, to compute \(\theta_1^\prime\), or newTheta1 in the code so that the end effector reaches \((30,10)\).
  • To execute, run ArmSimulator, type in the controller name "TwoLinkController", set the number of links (to 2), then click on "tracing", then "reset", then "go".
 

In practice:

  • It is not at all straightforward to combine high-level planning and inverse-kinematics.

  • Inverse-kinematics calculations are highly dependent on the particular arm or robot.

  • Most inverse-kinematics calculations must also take into account other complexities, for example:
    • Forces and torques for smoother movement, for handling loads.
    • Safety: to prevent the arm from crashing into itself.
    • Correcting for drift, when model and reality are out of sync.

  • Both planning and inverse-kinematics continue to be active areas of research.
 


2.7   Other algorithms

 

Greedy:

  • Instead of combining the cost-from-start and estimated-cost-to-goal, we could use just estimated-cost-to-goal
         Called the Greedy planning algorithm.

  • Greedy is neither guaranteed to be complete nor optimal.

  • Greedy can be made complete if we make sure that at least one node from the Frontier is expanded each step.
 

Exercise 26: Create an example of the maze problem in which Greedy performs badly.
 

Reducing memory requirements:

  • There are two commonly-used approaches for reducing memory-requirements:
    • Fix the memory size, and throw out nodes heuristically.
    • Re-run an algorithm several times
           Each time, use what was learned in earlier iterations to prune the search space.

  • Limiting memory size: SMA* (Simplified Memory-Bounded A*)
    • Use A* until memory is full.
    • Drop the node with the highest cost.
    • Record this node's cost in its parent.
           So that we don't re-expand the parent until it becomes necessary.

  • Iterative deepening: IDA*
    • Fix a cost bound B.
    • Apply A* until costs exceed B.
           Let B' = the smallest cost that exceeded B.
    • Set B = B' and re-run.
    • Repeat until goal node found.
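
  • A compact sketch of the IDA* driver loop, reusing a hypothetical State interface like the one in the BFS sketch and assuming unit move costs; cycle avoidance and path reconstruction, which a real implementation needs, are omitted:

        import java.util.List;

        public class IDAStarSketch {

            interface State {
                List<State> neighbors();
                double h();                   // estimated cost to goal
                // equals() assumed to compare configurations
            }

            static final double FOUND = -1;   // sentinel: all real f-values are >= 0

            // Driver: repeatedly deepen the cost bound B.
            static boolean idaStar(State start, State goal) {
                double bound = start.h();
                while (true) {
                    double t = probe(start, goal, 0, bound);
                    if (t == FOUND) return true;                       // goal reached
                    if (t == Double.POSITIVE_INFINITY) return false;   // no solution
                    bound = t;    // B' = smallest f-value that exceeded B
                }
            }

            // Depth-first probe, cutting off when f = g + h exceeds the bound.
            // Returns FOUND, or the smallest f-value that exceeded the bound.
            static double probe(State s, State goal, double g, double bound) {
                double f = g + s.h();
                if (f > bound) return f;
                if (s.equals(goal)) return FOUND;
                double min = Double.POSITIVE_INFINITY;
                for (State next : s.neighbors()) {
                    double t = probe(next, goal, g + 1, bound);
                    if (t == FOUND) return FOUND;
                    if (t < min) min = t;
                }
                return min;
            }
        }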
 

Other ideas in search:

  • There is a whole world of search algorithms, and many specialized books on the subject.

  • Example sub-topics: search for games, real-time search, meta-heuristics.
 

About planning:

  • We have only lightly touched upon the general planning problem.

  • Planning is a vast area with all kinds of research, books and products.

  • There is an entire sub-area related to planning motion.

  • An example of successful planning: Mars Rover mission.



© 2008, Rahul Simha