Applications of graph theory:
 Fundamental mathematical construct to represent "connectivity".
 Appears in thousands of problems.
 Source of many classic problems: traveling salesman, routing,
spanning trees.
 Many "graph-structured" applications: networks,
transportation systems, electronic circuits, molecules.
Graph theory as a source of computer science theory:
 Many important algorithms.
 Key to understanding algorithm design and analysis.
 Simple to describe, yet perplexing:
 Euler tour: easy problem.
 Hamiltonian tour: hard problem.
The field of graph theory:
 Large area of mathematics:
 Analysis of general graphs.
 Analysis of special types of graphs.
 Many classic problems,
e.g., the four-color theorem.
 Optimization problems based on graphs,
e.g., shortest paths.
 Graph algorithms: an area in computer science.
 Rich source of algorithms, theory, insight.
 Useful algorithms used in many applications
(e.g., in a compiler).
Random graphs
Consider this procedure to generate a random graph:
 Draw n vertices.
 Consider each possible pair of vertices in turn:
 For each such pair, flip a coin.
 If heads, place an edge between these vertices.
Exercise:
Split into groups of 4 or 5. Generate a random graph with 10 nodes
using coin flips.
Exercise:
Suppose we use a coin where Pr[heads] = 0.1.
Will we get more or fewer edges than with a fair coin?
Parametrized random graphs:
 Parameter: 0 < p < 1 (density)
 Use a coin such that Pr[heads] = p to generate the
graph.
Exercise:
Split into groups of 4 or 5. Generate a random graph using p = 0.1.
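The coin-flip procedure above can be sketched in Python (the function name and the seed parameter are just for illustration):

```python
import random

def random_graph(n, p, seed=None):
    """Generate an Erdos-Renyi G(n, p) graph: for each pair of
    vertices, flip a p-biased coin and add the edge on heads."""
    rng = random.Random(seed)
    edges = set()
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:      # "heads" with probability p
                edges.add((u, v))
    return edges

# A fair coin (p = 0.5) on 10 vertices: 45 possible pairs,
# so we expect about 22-23 edges on average.
g = random_graph(10, 0.5, seed=1)
print(len(g))
```

With p = 0.1 the same code answers the exercise above: fewer coin flips come up heads, so fewer edges appear.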
Connectivity:
 Clearly, if p is small, we might have multiple components
(disconnected graph).
 Examples (using graph tool).
 Major result (Erdős and Rényi):
 If p < 1/n, the graph is almost always disconnected.
 If p >= log(n)/n, the graph is almost always connected.
 If 1/n < p < log(n)/n, the graph is dominated by
a giant component.
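The Erdős-Rényi thresholds can be checked empirically: generate graphs at several values of p and test connectivity with a BFS (a sketch; the sizes, trial counts, and helper names are arbitrary choices):

```python
import math
import random
from collections import deque

def random_adj(n, p, rng):
    """Adjacency lists for a G(n, p) random graph."""
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj

def is_connected(adj):
    """BFS from vertex 0; connected iff every vertex is reached."""
    seen = {0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(adj)

rng = random.Random(0)
n, trials = 200, 50
for p in (0.5 / n, 2 * math.log(n) / n):   # below and above the thresholds
    frac = sum(is_connected(random_adj(n, p, rng)) for _ in range(trials)) / trials
    print(f"p = {p:.4f}: connected in {frac:.0%} of trials")
```

Below 1/n virtually no trial yields a connected graph; above log(n)/n virtually every trial does.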
Other properties:
 Average path length is small: about log(n).
 Cluster coefficient is very small: approximately p, so close to 0
for sparse graphs.
Exercise:
Can you create a graph where the average shortest path is
quite long?
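One way to explore this exercise: compute the average shortest-path length directly with BFS. A long path graph makes it large, about n/3, versus about log(n) for a random graph (a sketch; the helper name is made up):

```python
from collections import deque

def avg_shortest_path(adj):
    """Mean BFS distance over all vertex pairs (graph assumed connected)."""
    n = len(adj)
    total, pairs = 0, 0
    for s in range(n):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(d for v, d in dist.items() if v > s)  # count each pair once
        pairs += n - 1 - s
    return total / pairs

# A path 0-1-2-...-19: average distance is (n+1)/3, much more than log(n).
path = [[v for v in (u - 1, u + 1) if 0 <= v < 20] for u in range(20)]
print(avg_shortest_path(path))   # 7.0
```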
The small world phenomenon
Small world phenomenon:
 Stanley Milgram's experiment (1967).
 Path length:
For any two randomly chosen people, there is typically a chain of
about six acquaintances connecting them
=> short path lengths.
 Cluster coefficient:
your friends are friends amongst themselves
=> high cluster coefficient.
Observation: the standard random graph does not model "small
world".
The Watts-Strogatz random model:
 Two parameters:
 p: the probability of rewiring an edge.
 k: the number of "ring" neighbors.
 Start with a ring.
 Each node is connected to k successive neighbors.
 For each edge in turn:
 Flip a p-biased coin.
 If heads, rewire the edge: keep one endpoint and reconnect
the other endpoint to a uniformly chosen random node.
Examples: (demo)
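The procedure can be sketched as follows (a simplified variant: rewired edges avoid self-loops and avoid duplicating any original ring edge, so the edge count stays exactly n*k):

```python
import random

def watts_strogatz(n, k, p, seed=None):
    """Ring of n vertices, each joined to its k successive neighbors;
    each edge is then rewired with probability p by keeping endpoint u
    and moving the other endpoint to a random node."""
    rng = random.Random(seed)
    ring = set()
    for u in range(n):
        for j in range(1, k + 1):                 # k successive neighbors
            ring.add(tuple(sorted((u, (u + j) % n))))
    result = set()
    for (u, v) in sorted(ring):
        if rng.random() < p:                      # heads on the p-biased coin
            while True:
                w = rng.randrange(n)
                cand = tuple(sorted((u, w)))
                if w != u and cand not in ring and cand not in result:
                    break
            result.add(cand)
        else:
            result.add((u, v))
    return result

g = watts_strogatz(20, 2, 0.1, seed=0)
print(len(g))   # always n*k = 40 edges
```

With p = 0 the ring is untouched (high clustering, long paths); even a small p creates a few long-range "shortcuts" that sharply reduce the average path length.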
Properties:
Examples of small world graphs:

                 L_actual   L_random   C_actual   C_random
  Film actors      3.65       2.99      0.79       0.00027
  Power grid      18.7       12.4       0.080      0.005
  C. elegans       2.65       2.25      0.28       0.05

(From http://www.santafe.edu/sfi/publications/Bulletins/bulletinFall99/workInProgress/smallWorld.html)


Applications:
 (Biology) Modeling of disease epidemics.
 (Epidemiology) Efficacy of needle-exchange programs (AIDS prevention):
 Used needle = random link.
 Removal of discarded needles: reduces chances of disease
percolation (epidemic).
 (Economics) Modeling of fads, momentum investing.
 (Business) Organizational hierarchy (Toyota example).
 (Commercial) meetup.com, friendster.com, tribe.com, linkedin.com
Power-law graphs
Exercise:
Which function is "smaller" for large n and fixed k > 0: f(n) = e^{-kn}
or g(n) = n^{-k}?
Consider the degree distribution of a graph:
 Let f_{i} = number of vertices with i edges.
 One can show that for random graphs, f_{i} is
proportional to e^{-a*i} (for some constant a > 0).
 A power-law graph is a graph in which
f_{i} is proportional to i^{-k} (for some constant k > 0).
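The contrast shows up when you tabulate f_{i} for a random graph: the counts peak near the mean degree and fall off sharply, with no high-degree tail (a sketch; the graph size and density are arbitrary):

```python
import random
from collections import Counter

def er_degrees(n, p, rng):
    """Degree of each vertex in a G(n, p) random graph."""
    deg = [0] * n
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                deg[u] += 1
                deg[v] += 1
    return deg

rng = random.Random(0)
hist = Counter(er_degrees(1000, 0.01, rng))   # mean degree around 10
# f_i falls off roughly exponentially away from the mean:
for i in sorted(hist):
    print(f"degree {i:3d}: {hist[i]:4d} vertices")
```

In a power-law graph of the same size, a few vertices would have degrees far above this range.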
Examples:
 A random (Erdős-Rényi) example (demo).
 E. coli gene regulation graph (demo).
 Internet, among many other real data sets.
Implications:
 Random graphs have very few high-degree nodes.
 Power-law graphs have some high-degree nodes.
 Robustness implications:
 A random removal of nodes can disconnect a random network.
 Power-law graphs are more resistant to random removals.
 Power-law graphs are extremely sensitive to targeted removals.
=> Consider what happens when you remove the high-degree nodes.
 Biological implications:
 High-degree nodes: genes involved in regulating many others (bookkeeping).
 Robustness necessary to support mutation/evolution.
Network motifs
A graph or network motif is a small localized substructure in a graph.
Notation: TF = Transcription Factor
Example: Feed-Forward (FF) loop motif:
 A feed-forward loop has three elements: a TF X regulates a
TF Y, and both X and Y regulate a target gene Z.
 Two kinds of FF loops: coherent and incoherent.
 Data from E. coli:
 FF loops occur more frequently than in random graphs.
 85% of FF loops are coherent.
Example: Single-Input Module (SIM) motif:
 One TF regulates many genes.
 Data from E. coli:
 24 such motifs.
 Large SIMs like those found in E. coli are highly unlikely in random graphs.
Motif function:
Coordination:
 Coordination is needed to achieve a larger, higher-level function.
 In a computer, there is an explicit module that controls the
sequence of events:
 But in biological systems? Is there a central controller?
Network dynamics
Static properties of graphs:
 Examples: connectivity, degree distribution, cluster coefficient.
 Limitations:
 Static properties don't explain behavior.
 Static properties may not help with small networks.
Boolean networks: a model of network dynamics
 Start with a directed graph.
Example:
 Each vertex is in one of two states:
 "On" (gene is turned on)
 "Off" (gene is turned off)
 Use 1 for "on" and 0 for "off".
 Example: suppose only node 5 is "on":
 Example: nodes 0, 1, 2, 5 are "on":
 The state of the network is itself a number:
 In the first state above: state = 000001
=> binary value 1
 In the second state above: state = 111001
=> binary value 57
 The evolution of a boolean network:
 At each step: apply "rules" to current state to
get next state.
 Repeat.
 What are the "rules"?
 A rule specifies how inputs to a node affect the next state.
 Thus, for vertex v, suppose S_{v} is the
current state.
 Suppose v has k upstream neighbors with current states
S_{1},...,S_{k}.
 The next state for v is some function
S'_{v} = F (S_{v}, S_{1},...,S_{k}).
 Two types of commonly-used rules:
 Unweighted threshold:
   S'_{v} = 1,     if (S_{1} + ... + S_{k}) - t > 0.
   S'_{v} = 0,     if (S_{1} + ... + S_{k}) - t < 0.
   S'_{v} = S_{v}, if (S_{1} + ... + S_{k}) - t == 0.
 Weighted threshold: associate a weight W_{i} with each incoming edge:
   S'_{v} = 1,     if (W_{1} S_{1} + ... + W_{k} S_{k}) - t > 0.
   S'_{v} = 0,     if (W_{1} S_{1} + ... + W_{k} S_{k}) - t < 0.
   S'_{v} = S_{v}, if (W_{1} S_{1} + ... + W_{k} S_{k}) - t == 0.
 Weighted thresholds model gene down-regulation:
 Use a positive edge weight (W_{i} = 1) for up-regulation.
 Use a negative edge weight (W_{i} = -1) for down-regulation.
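Both rules can be written as a single update function, since the unweighted rule is just the weighted rule with every weight equal to +1 (a sketch; the 3-vertex cycle below is a hypothetical example, not the one from the lecture):

```python
def next_state(state, in_edges, t=0):
    """One synchronous step of a weighted-threshold boolean network.
    state: list of 0/1 values, one per vertex.
    in_edges: in_edges[v] is a list of (u, w) pairs -- an edge from u
    to v with weight w (+1 up-regulation, -1 down-regulation)."""
    new = []
    for v, s_v in enumerate(state):
        total = sum(w * state[u] for (u, w) in in_edges[v]) - t
        if total > 0:
            new.append(1)
        elif total < 0:
            new.append(0)
        else:
            new.append(s_v)        # tie: keep the current state
    return new

# Hypothetical example: a 3-cycle 0 -> 1 -> 2 -> 0, all weights +1.
in_edges = [[(2, 1)], [(0, 1)], [(1, 1)]]
print(next_state([1, 0, 0], in_edges))   # -> [1, 1, 0]
```

Repeated calls trace the network's evolution; here the single "on" value propagates around the cycle until all vertices are on.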
Exercise:
Consider the example from earlier:
 How many possible states are there for this graph?
 Assume t = 0 and use the unweighted rules.
 Start with the state in which only vertex 5 is "on".
What is the next state? And the state after that? And after that?
 Do the same for the case t = 1.
The state graph:
 Recall: each state of the network is a number.
 If there are n vertices, there are 2^{n}
possible states.
 The states are numbered States = {0, 1, ..., 2^{n} - 1}.
 Build a graph, the state graph,
with vertices {0, 1, ..., 2^{n} - 1}.
 For each state S in States:
 Apply rules to compute next state S' of S.
 Place an edge from S to S' in the state graph.
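The construction can be sketched directly, encoding each state as an integer whose bit v holds the state of vertex v (the update rule and a hypothetical 3-cycle example are included so the sketch is self-contained):

```python
def next_state(state, in_edges, t=0):
    """Weighted-threshold update rule, as defined above."""
    new = []
    for v, s_v in enumerate(state):
        total = sum(w * state[u] for (u, w) in in_edges[v]) - t
        new.append(1 if total > 0 else 0 if total < 0 else s_v)
    return new

def state_graph(n, in_edges, t=0):
    """One outgoing edge S -> S' per state: succ[S] is the next state of S."""
    succ = {}
    for s in range(2 ** n):
        bits = [(s >> v) & 1 for v in range(n)]       # decode state number
        nxt = next_state(bits, in_edges, t)
        succ[s] = sum(b << v for v, b in enumerate(nxt))  # re-encode
    return succ

# Hypothetical 3-vertex cycle 0 -> 1 -> 2 -> 0, all weights +1:
in_edges = [[(2, 1)], [(0, 1)], [(1, 1)]]
print(state_graph(3, in_edges))
```

Since each state has exactly one successor, the result is a function from states to states, conveniently stored as a dictionary.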
Exercise:
Consider this example:
 How many possible states are there for this graph?
 Assume t = 0 and use the unweighted rules.
 Compute the state graph. What do you notice?
Attractors and basins:
 In applying the rules, you can get "stuck" in a state:
=> the next state is the same.
 Such states are called attractors.
 Since each state has exactly one outgoing edge in the state
graph, repeatedly applying the rules traces a unique path from any start state.
 For an attractor state S, let B(S) be the
set of states that "lead" to it.
=> called the Basin of S.
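Given the successor map, attractors (fixed points, as defined above) and their basins can be found by following each trajectory (a sketch over a hypothetical successor map; states on a limit cycle that is not a fixed point receive no basin here):

```python
def attractors_and_basins(succ):
    """succ[s] is the unique next state of s. Attractors are the fixed
    points; the basin of attractor a is every state whose trajectory
    eventually reaches a (including a itself)."""
    attractors = {s for s in succ if succ[s] == s}
    basins = {a: set() for a in attractors}
    for s in succ:
        seen = set()
        cur = s
        while cur not in attractors and cur not in seen:
            seen.add(cur)              # cycle detection: stop if we loop
            cur = succ[cur]
        if cur in attractors:
            basins[cur].add(s)
    return attractors, basins

# Hypothetical successor map: 0 and 3 are attractors;
# 4 and 5 form a 2-cycle, so they belong to no fixed-point basin.
succ = {0: 0, 1: 0, 2: 3, 3: 3, 4: 5, 5: 4}
print(attractors_and_basins(succ))
```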
Exercise:
What are the attractors and basins for the above 3vertex example?
An application to the yeast cell-cycle:
(Source: F. Li et al., "The yeast cell-cycle network is robustly designed".)
 A simplified model of key proteins in the yeast cell-cycle:
 11 vertices.
 Weighted-threshold model: t = 0.
 Green arrows: up-regulation (weight = 1).
 Red arrows: down-regulation (weight = -1).
 State graph has 2^{11} = 2048 vertices.
 Part of the state graph showing a major attractor:
 The results show that the most likely path is through a
sequence of states corresponding to the cell cycle: G1, S, G2, M and
back to G1.
 Significance:
 The network dynamics explain the cell cycle.
 The simple interactions of "dumb" elements in a
network can create higher-level complexity.
Complexity
Exercise:
Consider, in general, what happens for different values of the
threshold t:
 What if t is too high?
 What if t is too low?