Protocols: Understanding Network Software
Goals
This module is meant to give a quick high-level overview of how
networks work. That is, how protocols are set up to enable
communication of data between machines connected in a network.
The goal is for you to understand the core concepts in the
three core network protocols: (1) datalink layer protocol,
(2) network layer protocol, (3) transport layer protocol.
Before we start, let's identify a few major themes and
also clarify what we are not going to cover:
- Network programming. This is something we are
not going to cover here. It refers to how you write code
that uses an existing network infrastructure.
The most commonly used programming abstraction is a
socket, for example:
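import java.io.PrintWriter;
import java.net.Socket;
// Open a socket to a remote server, which initiates a
// "connection" to that server.
Socket socket = new Socket ("www.yahoo.com", 80);
// Send stuff. The second argument turns on auto-flush, so that
// println() pushes the text out instead of leaving it in a buffer.
PrintWriter pw = new PrintWriter (socket.getOutputStream(), true);
pw.println ("Hello there");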
Instead, our goal is to understand how the string
"Hello there" (a bunch of bits, really)
gets from the application program (above) to the other machine.
- What's a "protocol" anyway, and how is it different
from an "algorithm"?
In general, a protocol is an agreed-upon set of rules to facilitate
communication or to coordinate actions amongst otherwise independent
entities. If that sounds vague, it is. Think of a protocol as
implementing an "algorithm" at each network entity so that
the algorithms work in concert to achieve something.
So, is a protocol an algorithm? In the broadest sense, because
a protocol is about computation, it is
fair to say a protocol is a sort of distributed algorithm.
- Major theme #1: standardization.
If there weren't worldwide agreement on standardizing interfaces,
we would have the same tower-of-Babel problem we now have
with, say, power adapters where every manufacturer has their own
cables and power specs. Luckily, there is one dominant standard
for wired local networks (Ethernet) and one for the internet (TCP/IP).
- Major theme #2: stuff gets lost. A lot of agonizing
will go into worrying about data that gets corrupted or lost.
It may be hard to believe, but it does happen. People step
on cables, electronic equipment does malfunction, and wireless
links can experience "electrical storms".
- Major theme #3: minimize the need to know. Not for
secrecy, but for simplicity. There are two aspects to reducing
the need to know.
- The first has to do with software engineering: if one
piece of the code interacts with another, it shouldn't have
to know the internals of the other, and should only interact
via a well-defined interface. As we will see, the three
main "pieces" are going to be arranged in a simple top-to-bottom
layered fashion.
- The second has to do with complexity. We don't want
a router to have to "know" the whole network, or even a
substantial part of it. Ideally, routers only know their
immediate neighbors. As we'll learn, actual implementations
are less ideal.
- Major theme #4: the Connection. One could design
a network service (UDP, for instance) in which an application
merely dumps bits at one end of a network and hopes that those
bits arrive at the other end. However, modern networks
go a step further in supporting a useful programming abstraction
called a connection (the Socket above).
Once a connection is established between two programs,
one program can "pour" bits into the connection and expect
those bits to be properly delivered into the receiving program.
- Major theme #5: the Packet.
When you send a (large) file across the internet, the file is
actually broken down into smaller pieces (packets) that then
wind their way independently to the destination where they
are reassembled into the file. The packet is also
the unit of communication between the layers of software
that implement network protocols. As we'll see, a packet
at a lower level of software contains a higher-level packet inside it.
- Major theme #6: efficiency.
A network needs to be efficient in several ways:
- It should spread the load of carrying data. If only a few
cables carried everything, there could be waiting delays and
a single point of failure.
- Some kinds of data (e.g., video) are more time-critical
than others. A network should be smart about prioritizing.
- As much as possible of the routine maintenance should be
done in parallel rather than by a few machines.
- Security. These days, one expects security to be
an important theme, but we are not going to discuss security.
As it turns out, network security is a topic unto itself that
is still evolving.
Warning:
- We are going to greatly simplify matters by abstracting
away detail. As a result, do not expect everything we say to
be "the ground truth" about the internet. However, at the end
of this module (and your assignment), you will be in a good
position to delve into practical details.
- One important pedagogically motivated simplification: we'll use
Java (whereas most network protocol code is written in C).
- Also, rather than tell you what each protocol does in detail,
we'll focus on what each protocol is responsible for. This way,
you can exercise creativity in implementing the functionality yourself.
Life after this module: If you want the next level of
detail (or skill), we'll have some suggested-next-steps
recommendations at the end.
Meet the players: the key components that make up a network
Consider two PCs that communicate across the internet
and the various components involved.
- Player 1: The PC. A PC has network software that
usually runs as a driver inside the Operating System (OS).
A PC also has a network interface card (NIC) that handles
the physical transmission of bits into the portion of network
that the PC is connected to.
- Player 2: The Ethernet. Sometimes also called
a local area network (LAN). This network is usually owned by a single
person or organization (a home or office network). The hardware is typically
an Ethernet switch.
- Player 3: The edge router. This is typically a
router owned by an organization that connects to the wider internet
through an internet service provider.
- Player 4: Internet routers. These are the
workhorses of the internet. They are connected in what we
often think of as a "network" of links (edges). They figure out
how to get stuff from one end (an edge router in New York, say)
to the other (an edge router in Moscow).
Packets: the basic ingredient
As mentioned earlier, large files are broken into smaller units
called packets:
- By way of analogy, it's as if a book that is to be mailed
has its chapters torn out and mailed separately, only to be
glued back into a book at the destination.
- An important goal is to make this packet decomposition
seamless to the applications, which shouldn't know it even happened.
- Why is it done this way? In the early days, machines did not
have much memory and so even a modest-sized file would easily
overwhelm memory. Thus, files were broken into smaller units
and sent one by one. Clearly, there's some inefficiency here
since each packet has a packet header (some bookkeeping info)
that we should count as overhead.
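To make the overhead concrete, here is a minimal sketch of splitting a payload into packets and gluing it back together. Everything in it is an assumption made for illustration: the class name Packetizer, the 4-byte header (holding nothing but a sequence number), and the idea that packets arrive in order. Real headers carry much more.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

class Packetizer {
    // Hypothetical header: just a 4-byte sequence number.
    static final int HEADER_BYTES = 4;

    // Split data into packets that carry at most maxPayload data bytes each.
    static List<byte[]> split(byte[] data, int maxPayload) {
        List<byte[]> packets = new ArrayList<>();
        int seq = 0;
        for (int offset = 0; offset < data.length; offset += maxPayload) {
            int len = Math.min(maxPayload, data.length - offset);
            ByteBuffer buf = ByteBuffer.allocate(HEADER_BYTES + len);
            buf.putInt(seq++);           // header
            buf.put(data, offset, len);  // payload
            packets.add(buf.array());
        }
        return packets;
    }

    // Glue the payloads back together, assuming the packets are in order.
    static byte[] reassemble(List<byte[]> packets) {
        int total = 0;
        for (byte[] p : packets) total += p.length - HEADER_BYTES;
        ByteBuffer out = ByteBuffer.allocate(total);
        for (byte[] p : packets) out.put(p, HEADER_BYTES, p.length - HEADER_BYTES);
        return out.array();
    }

    // Fraction of all transmitted bytes that is header rather than data.
    static double overhead(List<byte[]> packets) {
        int total = 0;
        for (byte[] p : packets) total += p.length;
        if (total == 0) return 0.0;
        return (double) (packets.size() * HEADER_BYTES) / total;
    }
}
```

Note the trade-off: smaller packets need less memory at each node, but a larger share of what travels over the wire is header.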
The layered approach to network protocol software
While large software packages these days are organized as
a complex hierarchy of objects, network software is organized
as a simple chain of objects: in layers.
The first set of networking protocols (called TCP/IP, which still dominates today)
had four layers. Soon after, the International Organization for Standardization (ISO)
defined a 7-layer model (the OSI model) that extended the 4-layer TCP/IP:
- Note: We will not describe all layers here.
- Layers 2-7 are generally all in software (although
hardware is often used to optimize aspects of these layers).
- Layer 1 is designed to be the interface to actual hardware.
For convenience of discussion, we often include the hardware in this layer.
It does have some software (to handle some types of
garbled transmissions, for example).
- Layers 1-4 are more or less equivalent to the 4-layer
internet protocol that dominates today's networking. These contain
what most people consider the core ideas in networking.
- Layers 5-7 are often directly implemented by applications.
- Layers 5-7 are still useful as abstractions because they
help classify and organize the myriad of protocols in use today.
- Some protocols defined above Layer 4 have now created
their own special niche, such as HTTP or SOAP.
Note: in what follows, we will greatly simplify the functionality
of the layers to convey core concepts.
In particular, we will condense the layers so that
the "application layer" refers to everything above
the transport layer.
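The idea that a lower-level packet contains a higher-level packet inside it can be sketched as nested objects. The class names match the packet types used in the interfaces in this module, but the fields chosen here (sequence number, node IDs, neighbor ID) are assumptions for illustration only.

```java
// Hypothetical packet classes; the fields are illustrative assumptions.
class TransportPacket {
    final int seqNum;
    final String data;
    TransportPacket(int seqNum, String data) {
        this.seqNum = seqNum;
        this.data = data;
    }
}

class NetworkPacket {
    final int source;               // end-to-end node IDs
    final int dest;
    final TransportPacket payload;  // the higher-level packet rides inside
    NetworkPacket(int source, int dest, TransportPacket payload) {
        this.source = source;
        this.dest = dest;
        this.payload = payload;
    }
}

class DatalinkPacket {
    final int neighbor;             // the node at the other end of one link
    final NetworkPacket payload;
    DatalinkPacket(int neighbor, NetworkPacket payload) {
        this.neighbor = neighbor;
        this.payload = payload;
    }
}

class Encapsulation {
    // Going down the stack at the sender: each layer wraps the packet from above.
    static DatalinkPacket wrap(String data, int source, int dest, int firstHop) {
        TransportPacket t = new TransportPacket(0, data);
        NetworkPacket n = new NetworkPacket(source, dest, t);
        return new DatalinkPacket(firstHop, n);
    }

    // Going up the stack at the receiver: each layer strips its own wrapper.
    static String unwrap(DatalinkPacket d) {
        return d.payload.payload.data;
    }
}
```

Each layer reads only its own fields and treats whatever is inside as opaque, which is the "minimize the need to know" theme in action.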
The transport layer
Main responsibilities of the transport layer:
- Provide the connection abstraction to application
programs.
- Handle lost packets and packets that come out of order.
- Other responsibilities: slow down the rate of packet
transmission if the network is too congested.
What might a transport layer API look like? Consider this simple
spec:
public interface TransportLayer {
    // An application says: "I want to create a connection to destination dest"
    // The transport layer returns a connection ID that the application can use to
    // refer to this connection in further interactions.
    public int openConnection (int dest);

    // Connection set up may take time, so an application calls this
    // repeatedly to find out whether the connection is set up.
    public boolean isReady (int connID);

    // Send a packet of data over the connection.
    public void sendPacket (TransportPacket packet);

    // This method will be called by the layer below to tell the
    // transport layer, "Hey, a packet has arrived for you".
    public void receivePacket (TransportPacket packet);

    // Close a particular connection when done.
    public void closeConnection (int connID);
}
An application might use the transport layer as follows (similar
to how sockets are used):
int connID = transport.openConnection (destNodeID);
while (! transport.isReady (connID)) {
    // ... sleep a little ...
}
TransportPacket packet = new TransportPacket (" ... some data ...");
transport.sendPacket (packet);
What should the transport layer do in openConnection()?
- It should probably send a special packet to the destination's
transport layer (its peer on the other side) saying "I need to
set up a connection with you".
- The destination transport layer sends back a reply along the lines of
"Sure, let's both use ID=173" for this particular connection.
- Upon receiving this acknowledgement, the source transport
layer then indicates that the connection is ready.
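The three steps above can be sketched as a tiny state machine. This is only one possible design: the class name HandshakeEndpoint and its methods are hypothetical, and the two endpoints are driven by direct method calls where a real transport layer would exchange control packets through the network layer.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// One transport-layer endpoint (hypothetical sketch, not a real API).
class HandshakeEndpoint {
    private final Map<Integer, Boolean> ready = new HashMap<>();    // local handle -> ready?
    private final Map<Integer, Integer> sharedID = new HashMap<>(); // local handle -> agreed ID
    private final Set<Integer> accepted = new HashSet<>();          // IDs this side handed out
    private int nextHandle = 0;
    private int nextSharedID = 173;  // arbitrary starting value

    // Source, step 1: note the pending connection. The caller would now
    // send a "please set up a connection" packet to the destination.
    int openConnection() {
        int handle = nextHandle++;
        ready.put(handle, false);
        return handle;
    }

    // Destination, step 2: a request arrived. Pick an ID and return it,
    // standing in for the "let's both use this ID" reply packet.
    int onRequest() {
        int id = nextSharedID++;
        accepted.add(id);
        return id;
    }

    // Source, step 3: the reply arrived, so the connection is usable.
    void onAccept(int handle, int id) {
        sharedID.put(handle, id);
        ready.put(handle, true);
    }

    boolean isReady(int handle) {
        return ready.getOrDefault(handle, false);
    }
}
```

Because the reply may take a while (or get lost), isReady() stays false until onAccept() runs, which is why the application has to poll.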
Port numbers:
- What happens when an application opens two different
connections to the same destination machine? For example,
one is a webservice, the other is for FTP?
- An additional "ID" (called a port number) is used
to distinguish between multiple connections between the
same source-destination pair.
- Thus, the openConnection() call should really look
like:
public interface TransportLayer {
    public int openConnection (int dest, int portNum);
    // ...
}
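One way a transport layer might keep such connections apart is a table keyed by the (destination, port) pair. The sketch below is an assumption about the bookkeeping, not a description of any real implementation; the class name ConnectionTable is made up, and it ignores the source-side port that real protocols also use.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical bookkeeping for openConnection(dest, portNum).
class ConnectionTable {
    private final Map<String, Integer> connIDs = new HashMap<>();
    private int nextID = 0;

    // Returns a connection ID that is distinct for each (dest, portNum) pair.
    // Asking again for the same pair returns the same ID.
    int openConnection(int dest, int portNum) {
        String key = dest + ":" + portNum;
        Integer id = connIDs.get(key);
        if (id == null) {
            id = nextID++;
            connIDs.put(key, id);
        }
        return id;
    }

    // Used when a packet arrives: which connection does it belong to?
    // Returns -1 if no such connection has been opened.
    int lookup(int dest, int portNum) {
        return connIDs.getOrDefault(dest + ":" + portNum, -1);
    }
}
```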
What is the sendPacket() method supposed to do?
- This method should divide a large piece of data into smaller
packets if needed and send them down to the network layer.
- If a file is broken into multiple packets, they need to be numbered.
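Numbering pays off at the receiving end, where packets may show up out of order. Here is a minimal sketch of a receiver that holds back early arrivals until the gap before them is filled. The class name ReorderBuffer is hypothetical, and the sketch assumes sequence numbers start at 0 and increase by one per packet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical receiver-side buffer that delivers chunks in sequence order.
class ReorderBuffer {
    private final TreeMap<Integer, String> pending = new TreeMap<>();
    private int nextExpected = 0;

    // Accept one numbered chunk; return whatever can now be handed to the
    // application in order (possibly nothing, possibly several chunks).
    List<String> receive(int seqNum, String chunk) {
        List<String> deliverable = new ArrayList<>();
        if (seqNum < nextExpected) {
            return deliverable;  // duplicate of something already delivered
        }
        pending.put(seqNum, chunk);
        while (pending.containsKey(nextExpected)) {
            deliverable.add(pending.remove(nextExpected));
            nextExpected++;
        }
        return deliverable;
    }
}
```

For example, if chunk 1 arrives before chunk 0, nothing is delivered until chunk 0 shows up, at which point both are released together.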
A transport layer's view of the network:
- The transport layer only sees the layer above (application)
and the layer below (network).
- It has no understanding of the network itself, no
understanding of how the network is connected (topology).
The network layer
Main responsibilities:
- At the end host (a PC), the network layer does not really do
much other than take in a transport packet, make a network packet
out of it, and send that network packet down to the datalink layer.
- However, the network layers on nodes inside the network do
most of the core work we associate with complex networks: routing.
- The network layer is the only layer that "sees the network".
- To understand a network layer's functions, we'll examine
the three most important functions separately.
- Job #1: route packets using a routing table:
- Every node's network layer has a routing table that
determines the "rule of the moment" for routing packets.
- A routing table is merely a data structure for
storing some local routing information.
- A packet's journey is determined by routing tables at the
nodes visited by the packet.
- Job #2: maintain stats about link usage.
- Every node monitors its outgoing links and maintains
some statistics about usage and performance.
- These numbers are then used in constructing the next
set of routing tables.
- Job #3: compute routes periodically.
- As conditions change, some links get more congested than
others. Yet other links are always slow.
- Accordingly, nodes engage in a distributed computation
and update their routing tables.
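To see how routing tables alone determine a packet's journey (Job #1), consider this minimal sketch. It assumes the simplest possible table, a map from destination node ID to the neighbor to forward to; the class name RoutingTable and the trace() helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical routing table: destination node ID -> neighbor to forward to.
class RoutingTable {
    private final Map<Integer, Integer> nextHop = new HashMap<>();

    void setRoute(int dest, int neighbor) {
        nextHop.put(dest, neighbor);
    }

    // Returns the neighbor a packet for dest should go to, or -1 if unknown.
    int lookup(int dest) {
        return nextHop.getOrDefault(dest, -1);
    }

    // Follow the tables hop by hop and return the nodes visited.
    // tables maps each node ID to that node's own routing table.
    static List<Integer> trace(Map<Integer, RoutingTable> tables, int source, int dest) {
        List<Integer> path = new ArrayList<>();
        int current = source;
        path.add(current);
        // The hop limit guards against routing loops.
        while (current != dest && path.size() <= tables.size()) {
            RoutingTable table = tables.get(current);
            int next = (table == null) ? -1 : table.lookup(dest);
            if (next == -1) break;  // no route: the packet is dropped here
            current = next;
            path.add(current);
        }
        return path;
    }
}
```

No single node knows the whole path; each one only answers "which neighbor next?", and the path emerges from the sequence of answers.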
A network layer might look like this:
public interface NetworkLayer {
    // This method is called by the transport layer above when
    // that layer has a packet to send. The packet is assumed to
    // contain information about the destination. This method is
    // only called at the end hosts.
    public void sendPacket (NetworkPacket packet);

    // The network layer receives a packet from below (datalink layer).
    // This packet could be destined for this node (an end host) or
    // needs to be forwarded along according to the dictates of the routing table.
    public void receivePacket (NetworkPacket packet);

    // This will be called by the independent process that's doing
    // the link measurement to update the stats for the link to
    // a particular neighbor. Here is where a routing computation
    // is initiated.
    public void updateLinkStatus (int neighbor, LinkStatus status);
}
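The module deliberately leaves open how the distributed route computation works. One classic choice, used here purely as an assumption, is a distance-vector scheme: each node tells its neighbors how far it thinks every destination is, and a node adopts a route through a neighbor whenever that turns out cheaper. In this sketch the class name DistanceVectorNode is hypothetical, a plain double cost stands in for LinkStatus, and costs that get worse over time are not handled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical distance-vector node (simplified: only handles improvements).
class DistanceVectorNode {
    final Map<Integer, Double> linkCost = new HashMap<>();  // neighbor -> measured cost
    final Map<Integer, Double> distance = new HashMap<>();  // dest -> best known cost
    final Map<Integer, Integer> nextHop = new HashMap<>();  // dest -> neighbor to use

    // Record a fresh measurement for the link to a neighbor.
    void updateLinkStatus(int neighbor, double cost) {
        linkCost.put(neighbor, cost);
    }

    // A neighbor advertises its own distances. Adopt any route through that
    // neighbor that beats what we have. Returns true if our table changed,
    // in which case we would advertise our new distances in turn.
    boolean onAdvertisement(int neighbor, Map<Integer, Double> neighborDistances) {
        Double costToNeighbor = linkCost.get(neighbor);
        if (costToNeighbor == null) return false;  // not a known neighbor
        boolean changed = false;
        for (Map.Entry<Integer, Double> e : neighborDistances.entrySet()) {
            double candidate = costToNeighbor + e.getValue();
            Double current = distance.get(e.getKey());
            if (current == null || candidate < current) {
                distance.put(e.getKey(), candidate);
                nextHop.put(e.getKey(), neighbor);
                changed = true;
            }
        }
        return changed;
    }
}
```

Notice that the node never sees the topology: it only combines its own link measurements with what its immediate neighbors report.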
The datalink layer
Main responsibilities:
- A datalink layer's view is limited to only a single link.
- Each datalink layer is responsible for reliable packet
transmission across a link.
=> If a packet is lost or corrupted, it needs to be re-transmitted.
- In the old days (and for some satellites, these days), one
worried about overwhelming the node on the other side of the link.
Thus, a node limited the number of packets sent without getting
an acknowledgement from the other side.
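The last two responsibilities, re-transmission and limiting the number of unacknowledged packets, are often combined in a "window". What follows is a rough sketch under assumptions of our own: the class name WindowedSender is hypothetical, packets are represented only by their sequence numbers, and timers are left to the caller.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sender that allows at most windowSize packets in flight.
class WindowedSender {
    private final int windowSize;
    private final Deque<Integer> unacked = new ArrayDeque<>();  // in flight, oldest first
    private int nextSeq = 0;

    WindowedSender(int windowSize) {
        this.windowSize = windowSize;
    }

    // Try to send the next packet. Returns its sequence number, or -1 if
    // the window is full and the sender must wait for an acknowledgement.
    int trySend() {
        if (unacked.size() >= windowSize) return -1;
        unacked.addLast(nextSeq);
        return nextSeq++;
    }

    // The neighbor acknowledged a packet, which frees a slot in the window.
    void onAck(int seq) {
        unacked.remove(Integer.valueOf(seq));
    }

    // A timer expired: everything still unacknowledged must be sent again.
    List<Integer> onTimeout() {
        return new ArrayList<>(unacked);
    }
}
```

With a window size of 1 this degenerates into the simplest scheme of all: send one packet, wait for its acknowledgement, then send the next.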
Possible structure of a datalink layer:
public interface DatalinkLayer {
    // The network layer calls this to send a packet across. It's
    // assumed that the packet contains the destination (neighbor) ID.
    public void sendPacket (DatalinkPacket packet);

    // This method is called by the physical layer when a packet has come in.
    // The datalink layer sends it up to the network layer.
    public void receivePacket (DatalinkPacket packet);
}
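How would receivePacket() know that a packet was corrupted in the first place? A common approach, assumed here for illustration, is for the sender to attach a checksum that the receiver recomputes. The sketch uses the CRC-32 implementation in java.util.zip; the class name FrameCheck is hypothetical.

```java
import java.util.zip.CRC32;

// Hypothetical helper for detecting corrupted payloads with CRC-32.
class FrameCheck {
    // Sender side: compute the value to transmit alongside the payload.
    static long checksum(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return crc.getValue();
    }

    // Receiver side: recompute and compare with the value that was sent.
    // A mismatch means the packet should be discarded and re-transmitted.
    static boolean isIntact(byte[] payload, long sentChecksum) {
        return checksum(payload) == sentChecksum;
    }
}
```

A checksum only detects damage; recovering from it is still the job of re-transmission.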
Where to go from here
We have only provided a quick, high-level view of how network protocol
software is organized and what the key ideas are.
Important network-related concepts we haven't covered:
- Physical layer ideas. How the physical devices work,
coding theory, modulation.
- Higher-level protocols. For example: HTTP, Bluetooth.
- Network services. For example, DNS.
- Special types of networks. Wireless networks,
satellite communications, ad-hoc networks.
- Distributed computing: remote-procedure calls,
distributed data management, network caching.
- Network security. Secure communication,
attacks on networks, authentication.
Next steps:
- Understand the protocols. The best way is to write
your own protocols. Note that we haven't said anything about
how their functionality is implemented.
- Learn more about how the 7 layers work (conceptually). There are several
"next level" details to learn about current protocols, once
you've had a crack at designing your own. Two good books to try
are:
- Computer Networks, by A. S. Tanenbaum, Prentice-Hall, 2002.
- Computer Networking: A Top-Down Approach, by
J. F. Kurose and K. W. Ross, Addison-Wesley, 2009.
- Examine code. Read implementations of the
layers. There are also some books that walk through TCP/IP code,
all of which is written in C.