Note 13: Graphs

This note provides a very basic introduction to graphs. You will explore graphs in substantially more detail in future courses.

Graphs vs. Trees

Where a tree consists of parents and children (and nodes of a tree can be both parent and child), the more general graph consists of pairs of nodes connected by edges. Unlike trees, which have an inherent directional structure (with a root), edges in a graph can connect arbitrary pairs of nodes. Trees are graphs, but graphs are not necessarily trees. Specifically, a tree permits only one valid path between any two nodes.

The below shows nodes A, B, and C connected by edges such that there exists only one valid path between any two nodes. This graph is a tree:

The below adds an additional edge, creating multiple paths between pairs of nodes. This graph is not a tree:

Directed and Undirected Graphs

Edges between nodes in graphs can be directional, in which case, the graph is a directed graph. The example above lacks directions, and therefore is an undirected graph. In the undirected graph above, there is a path from A to B and also from B to A along the same edge:

Contrast this with a directed graph:

The directed graph has a direct path from A to B:

However, this is a directional path and does not go from B to A. There exists a path from B to A, through the B \(\rightarrow\) C edge, and then through the C \(\rightarrow\) A edge:

Graph Data Structures

In theory, any linked-node data structure such as the linked lists and trees we have already worked with can represent a graph. While this is not the most efficient way for many operations, it’s a starting point for us. The most common reason to use this sort of data structure is if it is already in use for some other reason.

In the practice problems, you’ll build a “pointer-based” graph.

To avoid the expense of traversing a pointer-based graph, graphs are commonly represented as adjacency lists, which are “arrays of lists.” The nomenclature “arrays of lists” is a bit confusing in Python, which uses the list data structure where many other languages use something called an “array.” An adjacency list is a simple list of lists.

Lists, Arrays, Vectors, Sets

Different programming languages use different built-in data structures, and handle memory differently. While this is a course in the Python language, it’s important to understand what features are specific to Python, and how concepts apply generally to other programming languages.

For adjacency lists, the “list of lists” is really a “collection of collections.” In Python, we use the list, which is an ordered collection of values, but we don’t need the collection to be ordered. We could use an unordered data structure, such as a set.

Python sets are similar to dicts, except instead instead of unique keys mapped to values, a set simply contains unique elements.

s = {"A", "B", "C"}
s.add("A") # doesn't do anything because "A" is already in the set
s.add("J")
s.remove("B")
print(type(s), s)
<class 'set'> {'J', 'C', 'A'}

For this directed graph:

The adjacency list is:

You’ll implement this in the practice problems

Graph Algorithms

We’ll look at a few algorithms that can be run to answer questions about graphs (there are many more).

The first question we’ll look at is: from a given node, what other nodes can be reached? For instance, in the example above, node D can be reached from node A, but node A cannot be reached from node D. Indeed: no nodes can be reached from node D.

How do we check which nodes can be reached from a starting node? With a search algorithm: starting with our starting node, check each node that it is connected to by an edge, for each of these nodes, check each node that it is connected to be an edge… and so forth.

There are two basic varieties of this search: “depth-first” and “breadth-first.”

Depth-first search (DFS) checks a node from the root, then checks a node connected to the new node, and so forth, searching “deep” before checking any other node connected to the root.

# pseudocode
def depth_first(current_node, reachable):
  for node in current_node.edges(): # nodes reachable from  
    if node not in reachable:
      reachable.add(node)
      reachable = reachable | depth_first(node, reachable)

You actually implemented a variant of depth-first search in the recursion note!

There is also a non-recursive implementation. This implementation forms the basis for several other search algorithms.

#pseudocode
reachable = set()
stack = Stack()
stack.push(root)
while len(stack) > 0:
  node = stack.pop()
  if node not in reachable:
    reachable.add(node)
    for new_node in node.edges():
      stack.push(new_node)

Try implementing both of these in the practice problems.

Once you have implemented the non-recursive DFS, implementing breadth-first-search,(BFS) which searches nodes close to (fewer edges away from) the root, is very simple. DFS uses a stack: new nodes that are added to the stack most recently are removed first. By changing the stack to a queue, nodes that were added most long ago are removed first.

#pseudocode
reachable = set()
queue = Queue()
queue.enqueue(root)
while len(queue) > 0:
  node = queue.dequeue()
  if node not in reachable:
    reachable.add(node)
    for new_node in node.edges():
      queue.enqueue(new_node)

Weighted Edges

Real world problems are often represented by weighted graphs, where each edge has an associated cost. Unweighted graphs can be interpreted as weighted graphs where each edge has equal cost. For directed graphs, edges between two nodes in different directions can have different costs:

We are typically interested in finding the lowest path cost to each node from some starting node. Dijkstra’s Algorithm (called this because it was discovered by Edsger Dijkstra) accomplishes this. It is similar to DFS and BFS, with two differences:

  • A priority queue is used rather than a stack or a queue
    • The priority queue is ordered by the path cost to the node from the starting node
  • We must record path costs, and account for total path costs when adding new nodes to the priority queue
#pseudocode
path_costs = {} # a dict
priority_queue = PriorityQueue()
priority_queue.enqueue((root, 0)) # node, cost tuple
while len(priority_queue) > 0:
  node, cost = priority_queue.dequeue()
  if node not in path_costs:
    path_costs[node] = cost
    for new_node, new_cost in node.edges():
      priority_queue.enqueue(new_node, cost + new_cost)

Practice

Practice Problem 13.1

Practice Problem 13.1

Create Node and DirectedGraph classes to represent a directed graph. Use pointers and not adjacency lists.

Your DirectedGraph class should have a constructor method that takes a string argument in the following format:

The Node class should have an edges method that returns which nodes have a direct edge from the current node.

graph.txt
A -> B
A -> C
B -> C
C -> A

The result should be a representation of the graph using your data structure.

Practice Problem 13.2

Practice Problem 13.2

Modify the DirectedGraph class you created in the previous practice problem for undirected graphs, as UndirectedGraph. The constructor should parse a file with the following format:

graph.txt
A -- B
A -- C
B -- C

Practice Problem 13.3

Practice Problem 13.3

Implement both DirectedGraph and UndirectedGraph using adjacency lists: there should be no Node class necessary for these graph classes.

Include constructors to parse input text files in the format previously used.

Practice Problem 13.4

Practice Problem 13.4

Implement recursive depth-first search for your adjacency-list based DirectedGraph class by creating a dfs method that takes as argument a string representing the starting node, e.g., Graph.dfs("A") and returns a set of all reachable nodes.