Protocols: Understanding Network Software

Goals

This module is meant to give a quick, high-level overview of how networks work, that is, how protocols are set up to enable communication of data between machines connected in a network.

The goal is for you to understand the core concepts behind the three main network protocols: (1) the datalink layer protocol, (2) the network layer protocol, and (3) the transport layer protocol.

Before we start, let's identify a few major themes and also clarify what we are not going to cover:

Network programming. This is something we are not going to cover here. It refers to how you write code that uses an existing network infrastructure. The most commonly used programming abstraction is a socket, for example:
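```
      import java.io.PrintWriter;
      import java.net.Socket;

      // Open a socket to a remote server, which initiates a
      // "connection" to that server.
      Socket socket = new Socket ("www.yahoo.com", 80);
      // Send stuff. The second argument turns on auto-flush, so
      // println() pushes the bytes out instead of leaving them buffered.
      PrintWriter pw = new PrintWriter (socket.getOutputStream(), true);
      pw.println ("Hello there");
```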
Instead, our goal is to understand how the string "Hello there" (a bunch of bits, really) gets from the application program (above) to the other machine.
What's a "protocol" anyway, and how is it different from an "algorithm"? In general, a protocol is an agreed-upon set of rules to facilitate communication or to coordinate actions among otherwise independent entities. If that sounds vague, it is. Think of a protocol as implementing an "algorithm" at each network entity so that the algorithms work in concert to achieve something. So, is a protocol an algorithm? In the broadest sense, because a protocol is about computation, it is fair to say a protocol is a sort of distributed algorithm.
Major theme #1: standardization. If there weren't worldwide agreement on standardizing interfaces, we would have the same tower-of-Babel problem we now have with, say, power adapters, where every manufacturer has its own cables and power specs. Luckily, there is a single dominant standard for wired local networks (Ethernet) and one for the internet.
Major theme #2: stuff gets lost. A lot of agonizing will go into worrying about data that gets corrupted or lost. It may be hard to believe, but it does happen. People step on cables, electronic equipment does malfunction, and wireless links can experience "electrical storms".
Major theme #3: minimize the need to know. Not for secrecy, but for simplicity. There are two aspects to reducing the need to know.
- The first has to do with software engineering: if one piece of the code interacts with another, it shouldn't have to know the internals of the other, and should only interact via a well-defined interface. As we will see, the three main "pieces" are going to be arranged in a simple top-to-bottom layered fashion.
- The second has to do with complexity. We don't want a router to have to "know" the whole network, or even a substantial part of it. Ideally, routers only know their immediate neighbors. As we'll learn, actual implementations are less ideal.
Major theme #4: the Connection. One could design networks (the UDP protocol, for instance) in which an application merely dumps bits at one end of a network expecting those bits to arrive at the other end. However, modern networks go a step further in supporting a useful programming abstraction called a connection (the Socket above). Once a connection is established between two programs, one program can "pour" bits into the connection and expect those bits to be properly delivered into the receiving program.
Major theme #5: the Packet. When you send a (large) file across the internet, the file is actually broken down into smaller pieces (packets) that then wind their way independently to the destination where they are reassembled into the file. The packet is also the unit of communication between the layers of software that implement network protocols. As we'll see, a packet at a lower level of software contains a higher-level packet inside it.
Major theme #6: efficiency. A network needs to be efficient in several ways:
- It should spread the load of carrying data. If only a few cables carried everything, there could be waiting delays and a single point of failure.
- Some kinds of data (e.g., video) are more time-critical than others. A network should be smart about prioritizing.
- As much as possible of the routine maintenance should be done in parallel rather than by a few machines.
Security. These days, one expects security to be an important theme, but we are not going to discuss security. As it turns out, network security is a topic unto itself that is still evolving.

Warning:

We are going to greatly simplify matters by abstracting away detail. As a result, do not expect everything we say to be "the ground truth" about the internet. However, at the end of this module (and your assignment), you will be in a good position to delve into practical details.
One important pedagogically motivated simplification: we'll use Java (whereas most network protocol code is written in C).
Also, rather than tell you what each protocol does in detail, we'll focus on what each protocol is responsible for. This way, you can exercise creativity in implementing the functionality yourself.

Life after this module: If you want the next level of detail (or skill), we'll have some suggested-next-steps recommendations at the end.

Meet the players: the key components that make up a network

Consider two PCs that communicate across the internet and the various components involved.

Player 1: The PC. A PC has network software that usually runs as a driver inside the Operating System (OS). A PC also has a network interface card (NIC) that handles the physical transmission of bits into the portion of network that the PC is connected to.
Player 2: The Ethernet. Sometimes also called a local area network (LAN). This network is usually owned by a single person or organization (a home network or an office network). The hardware is typically an Ethernet switch.
Player 3: The edge router. This is typically a router owned by an organization that connects to the wider internet through an internet service provider.
Player 4: Internet routers. These are the workhorses of the internet. They are connected in what we often think of as a "network" of links (edges). They figure out how to get stuff from one end (an edge router in New York, say) to the other (an edge router in Moscow).

Packets: the basic ingredient

As mentioned earlier, large files are broken into smaller units called packets:

By way of analogy, it's as if a book that is to be mailed has its chapters torn out and mailed separately, only to be glued back into a book at the destination.
An important goal is to make this packet decomposition seamless to the applications, which shouldn't even know it happened.
Why is it done this way? In the early days, machines did not have much memory and so even a modest-sized file would easily overwhelm memory. Thus, files were broken into smaller units and sent one by one. Clearly, there's some inefficiency here since each packet has a packet header (some bookkeeping info) that we should count as overhead.
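
To make the header overhead concrete, here is a small sketch that counts packets and header bytes for a given file size. The file, payload, and header sizes used in main() are illustrative assumptions, not values taken from this module.

```java
// Sketch: how many packets a file needs and how much of the traffic
// is header overhead. All sizes are assumed, for illustration only.
class PacketOverheadSketch {

    // Number of packets needed when each carries at most payloadSize bytes.
    static long packetCount (long fileSize, int payloadSize) {
        return (fileSize + payloadSize - 1) / payloadSize;   // Ceiling division.
    }

    // Total header bytes added across all packets.
    static long headerBytes (long fileSize, int payloadSize, int headerSize) {
        return packetCount (fileSize, payloadSize) * headerSize;
    }

    // Fraction of all transmitted bytes that is header rather than data.
    static double overheadFraction (long fileSize, int payloadSize, int headerSize) {
        long overhead = headerBytes (fileSize, payloadSize, headerSize);
        return (double) overhead / (fileSize + overhead);
    }

    public static void main (String[] args) {
        long fileSize = 1_000_000;   // A file of one million bytes (assumed).
        int payloadSize = 1460;      // Data bytes per packet (assumed).
        int headerSize = 40;         // Header bytes per packet (assumed).
        System.out.println ("Packets: " + packetCount (fileSize, payloadSize));
        System.out.println ("Header bytes: " + headerBytes (fileSize, payloadSize, headerSize));
        System.out.println ("Overhead fraction: " + overheadFraction (fileSize, payloadSize, headerSize));
    }
}
```

Smaller packets mean more headers for the same file, so the overhead fraction grows as the payload size shrinks.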

The layered approach to network protocol software

While large software packages these days are organized as a complex hierarchy of objects, network software is organized as a simple chain of objects: in layers.

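The first set of networking protocols (called TCP/IP, which still dominates today) had four layers. Soon after, the International Organization for Standardization (ISO) defined a 7-layer model (the OSI model) that extended the 4-layer TCP/IP. From bottom to top, the seven layers are: (1) physical, (2) datalink, (3) network, (4) transport, (5) session, (6) presentation, (7) application.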

Note: We will not describe all layers here.
Layers 2-7 are generally all in software (although hardware is often used to optimize aspects of these layers).
Layer 1 is designed to be the interface to actual hardware. For convenience of discussion, we often include the hardware in this layer. It does have some software (to handle some types of garbled transmissions, for example).
Layers 1-4 are more or less equivalent to the 4-layer internet protocol that dominates today's networking. These contain what most people consider the core ideas in networking.
Layers 5-7 are often directly implemented by applications.
Layers 5-7 are still useful as abstractions because they help classify and organize the myriad of protocols in use today.
Some protocols defined above Layer 4 have now created their own special niche, such as HTTP or SOAP.

Note: in what follows, we will greatly simplify the functionality of the layers to convey core concepts.

In particular, we will condense the layers to four (application, transport, network, and datalink, sitting on top of the physical hardware), where the "application layer" will refer to everything higher than the transport layer.
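
One way to picture the layering is as nested packets: each layer wraps what the layer above hands down, and unwraps on the way back up. The sketch below illustrates that idea only. The class names and fields are hypothetical, since this module does not define what the packet classes contain.

```java
// Sketch of packet nesting across the condensed layers. The class and
// field names are hypothetical; the module does not define them.
class TransportPacketSketch {
    final int connID;      // Which connection the data belongs to.
    final String data;     // The application's data.

    TransportPacketSketch (int connID, String data) {
        this.connID = connID;
        this.data = data;
    }
}

class NetworkPacketSketch {
    final int source;                      // Originating end host.
    final int dest;                        // Final destination end host.
    final TransportPacketSketch payload;   // The transport packet rides inside.

    NetworkPacketSketch (int source, int dest, TransportPacketSketch payload) {
        this.source = source;
        this.dest = dest;
        this.payload = payload;
    }
}

class DatalinkPacketSketch {
    final int neighbor;                    // Node at the far end of this one link.
    final NetworkPacketSketch payload;     // The network packet rides inside.

    DatalinkPacketSketch (int neighbor, NetworkPacketSketch payload) {
        this.neighbor = neighbor;
        this.payload = payload;
    }
}

class EncapsulationDemo {

    // At the sender, each layer wraps what the layer above handed down.
    static DatalinkPacketSketch wrap (String data, int connID, int source, int dest, int nextHop) {
        TransportPacketSketch t = new TransportPacketSketch (connID, data);
        NetworkPacketSketch n = new NetworkPacketSketch (source, dest, t);
        return new DatalinkPacketSketch (nextHop, n);
    }

    // At the receiver, each layer strips its own wrapper on the way up.
    static String unwrap (DatalinkPacketSketch frame) {
        return frame.payload.payload.data;
    }

    public static void main (String[] args) {
        DatalinkPacketSketch frame = wrap ("Hello there", 173, 1, 9, 4);
        System.out.println ("Next hop on this link: " + frame.neighbor);
        System.out.println ("Final destination: " + frame.payload.dest);
        System.out.println ("Application data: " + unwrap (frame));
    }
}
```

Notice that the datalink wrapper names only the next neighbor, while the network wrapper names the final destination. Each layer carries just the information it needs.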

The transport layer

Main responsibilities of the transport layer:

Provide the connection abstraction to application programs.
Handle lost packets and packets that come out of order.
Other responsibilities: slow down the rate of packet transmission if the network is too congested.

What might a transport layer API look like? Consider this simple spec:

```
public interface TransportLayer {

    // An application says: "I want to create a connection to destination dest"
    // The transport layer returns a connection ID that the application can use to
    // refer to this connection in further interactions.
    public int openConnection (int dest);

    // Connection set up may take time, so an application calls this
    // repeatedly to find out whether the connection is set up.
    public boolean isReady (int connID);

    // Send a packet of data over the connection.
    public void sendPacket (TransportPacket packet);

    // This method will be called by the layer below to tell the
    // transport layer, "Hey, a packet has arrived for you".
    public void receivePacket (TransportPacket packet);

    // Close a particular connection when done.
    public void closeConnection (int connID);
}
```

An application might use the transport layer as follows (similar to how sockets are used):

```
  int connID = transport.openConnection (destNodeID);
  while (! transport.isReady (connID)) {
     // ... sleep a little ...
  }
  TransportPacket packet = new TransportPacket (" ... some data ...");
  transport.sendPacket (packet);
```

What should the transport layer do in openConnection()?

It should probably send a special packet to the destination's transport layer (its peer on the other side) saying "I need to set up a connection with you".
The destination transport layer sends back a "Sure, let's both use ID=173" for this particular connection.
Upon receiving this acknowledgement, the source transport layer then indicates that the connection is ready.
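
The three steps above can be sketched as follows. This is one possible arrangement, not a prescribed design. The "network" is simulated with a static table of nodes and a per-node inbox, and the names HandshakeTransport, deliverPending(), and getAgreedID() are made up for the illustration. The value returned by openConnection() is a local handle, and the ID chosen by the destination is recorded against it once the reply arrives.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Sketch of connection setup between two peer transport layers.
// Message delivery is simulated; in a real stack these messages
// would travel down through the network and datalink layers.
class HandshakeTransport {

    // Stand-in for the network: every node, looked up by node ID.
    private static final Map<Integer, HandshakeTransport> nodes = new HashMap<>();

    private final int nodeID;
    private final Deque<Runnable> inbox = new ArrayDeque<>();          // Messages not yet delivered.
    private final Map<Integer, Integer> agreedIDs = new HashMap<>();   // Local handle -> agreed ID.
    private int nextHandle = 1;
    private int nextAgreedID = 173;                                    // Arbitrary starting value.

    HandshakeTransport (int nodeID) {
        this.nodeID = nodeID;
        nodes.put (nodeID, this);
    }

    // Step 1: the source asks its peer to set up a connection.
    public int openConnection (int dest) {
        int handle = nextHandle++;
        HandshakeTransport peer = nodes.get (dest);
        peer.inbox.add (() -> peer.onRequest (nodeID, handle));
        return handle;
    }

    // Step 2: the destination picks an ID and sends it back.
    private void onRequest (int source, int handle) {
        int agreedID = nextAgreedID++;
        HandshakeTransport peer = nodes.get (source);
        peer.inbox.add (() -> peer.onAccept (handle, agreedID));
    }

    // Step 3: the source records the ID; the connection is now ready.
    private void onAccept (int handle, int agreedID) {
        agreedIDs.put (handle, agreedID);
    }

    public boolean isReady (int connID) {
        return agreedIDs.containsKey (connID);
    }

    public int getAgreedID (int connID) {
        return agreedIDs.get (connID);
    }

    // Simulates the network handing this node everything addressed to it.
    public void deliverPending () {
        while (! inbox.isEmpty()) {
            inbox.poll().run();
        }
    }

    public static void main (String[] args) {
        HandshakeTransport a = new HandshakeTransport (1);
        HandshakeTransport b = new HandshakeTransport (2);
        int connID = a.openConnection (2);
        System.out.println ("Ready before delivery? " + a.isReady (connID));
        b.deliverPending ();   // b receives the request and replies.
        a.deliverPending ();   // a receives the acceptance.
        System.out.println ("Ready after delivery? " + a.isReady (connID));
    }
}
```

The sketch ignores lost setup messages. A fuller version would need the source to retry if no acceptance arrives.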

Port numbers:

What happens when an application opens two different connections to the same destination machine, for example, one to a web service and the other for FTP?
An additional "ID" (called a port number) is used to distinguish between multiple connections between the same source-destination pair.

Thus, the openConnection() call should really look like:

```
public interface TransportLayer {

    public int openConnection (int dest, int portNum);

    // ...
}
```
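
As a rough sketch of how port numbers keep connections apart, a transport layer could keep a table keyed by the (destination, port) pair. The ConnectionTable class, its lookup() method, and the choice to reuse an existing entry for a repeated pair are all assumptions made for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: telling connections apart by (destination, port number).
// The class and its methods are hypothetical, for illustration only.
class ConnectionTable {

    private final Map<String, Integer> connIDs = new HashMap<>();   // "dest:port" -> connection ID.
    private int nextConnID = 1;

    // Returns the connection ID for this (dest, portNum) pair,
    // creating a new entry the first time the pair is seen.
    public int openConnection (int dest, int portNum) {
        String key = dest + ":" + portNum;
        Integer existing = connIDs.get (key);
        if (existing != null) {
            return existing;
        }
        int connID = nextConnID++;
        connIDs.put (key, connID);
        return connID;
    }

    // Finds the connection an arriving packet belongs to, or -1 if none.
    public int lookup (int dest, int portNum) {
        Integer connID = connIDs.get (dest + ":" + portNum);
        return (connID == null) ? -1 : connID;
    }

    public static void main (String[] args) {
        ConnectionTable table = new ConnectionTable ();
        int web = table.openConnection (7, 80);   // Port 80: web traffic.
        int ftp = table.openConnection (7, 21);   // Port 21: FTP.
        System.out.println ("Web connection ID: " + web);
        System.out.println ("FTP connection ID: " + ftp);
    }
}
```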

What is the sendPacket() method supposed to do?

This method should divide a large piece of data into smaller packets if needed and send them down to the network layer.
If a file is broken into multiple packets, they need to be numbered.
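
Here is a minimal sketch of the dividing and numbering just described, along with the matching reassembly at the receiver. The Segment class and its fields (a sequence number and a total count) are assumptions for the example. Your own design may number packets differently.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: one numbered piece of a larger message (hypothetical class).
class Segment {
    final int seqNum;     // Position of this piece, starting at 0.
    final int total;      // How many pieces the message was cut into.
    final String data;    // The piece itself.

    Segment (int seqNum, int total, String data) {
        this.seqNum = seqNum;
        this.total = total;
        this.data = data;
    }
}

class Segmenter {

    // Cut the data into pieces of at most maxSize characters, numbered in order.
    static List<Segment> split (String data, int maxSize) {
        int total = (data.length() + maxSize - 1) / maxSize;   // Ceiling division.
        List<Segment> pieces = new ArrayList<>();
        for (int i = 0; i < total; i++) {
            int start = i * maxSize;
            int end = Math.min (start + maxSize, data.length());
            pieces.add (new Segment (i, total, data.substring (start, end)));
        }
        return pieces;
    }

    // Put the pieces back in order by sequence number, whatever order they
    // arrived in. Assumes at least one piece has been received.
    static String reassemble (List<Segment> received) {
        String[] slots = new String[received.get (0).total];
        for (Segment s : received) {
            slots[s.seqNum] = s.data;
        }
        StringBuilder result = new StringBuilder ();
        for (int i = 0; i < slots.length; i++) {
            if (slots[i] == null) {
                throw new IllegalStateException ("Piece " + i + " has not arrived");
            }
            result.append (slots[i]);
        }
        return result.toString ();
    }

    public static void main (String[] args) {
        List<Segment> pieces = split ("Hello there", 4);
        java.util.Collections.reverse (pieces);   // Pretend they arrived out of order.
        System.out.println (reassemble (pieces));
    }
}
```

The numbering is what lets the receiver cope with out-of-order arrival, and a missing number tells it which piece to ask for again.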

A transport layer's view of the network:

The transport layer only sees the layer above (application) and the layer below (network).
It has no understanding of the network itself, no understanding of how the network is connected (topology).

The network layer

Main responsibilities:

At the end host (a PC), the network layer does not really do much other than take in a transport packet, make a network packet out of it, and send that network packet down to the datalink layer.
However, the network layers on nodes inside the network do most of the core work we associate with complex networks: routing.
The network layer is the only layer that "sees the network".
To understand a network layer's functions, we'll examine the three most important functions separately.
Job #1: route packets using a routing table:
- Every node's network layer has a routing table that determines the "rule of the moment" for routing packets.
- A routing table is merely a data structure for storing some local routing information.
- A packet's journey is determined by the routing tables at the nodes the packet visits.
Job #2: maintain stats about link usage.
- Every node monitors its outgoing links and maintains some statistics about usage and performance.
- These numbers are then used in constructing the next set of routing tables.
Job #3: compute routes periodically
- As conditions change, some links get more congested than others. Yet other links are always slow.
- Accordingly, nodes engage in a distributed computation and update their routing tables.
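
To illustrate Job #1, the sketch below follows a packet hop by hop, consulting only the routing table of whichever node currently holds it. The table layout (destination mapped to next-hop neighbor), the node numbers, and the hop limit are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: per-node routing tables and the path they produce.
class RoutingTableSketch {

    // tables.get(node).get(dest) is the neighbor that node forwards to
    // for packets headed to dest.
    final Map<Integer, Map<Integer, Integer>> tables = new HashMap<>();

    void addRoute (int node, int dest, int nextHop) {
        tables.computeIfAbsent (node, k -> new HashMap<>()).put (dest, nextHop);
    }

    // Follow the packet from source to dest, one table lookup per node.
    List<Integer> trace (int source, int dest, int maxHops) {
        List<Integer> path = new ArrayList<>();
        path.add (source);
        int current = source;
        while (current != dest) {
            if (path.size () > maxHops) {
                throw new IllegalStateException ("Hop limit exceeded (routing loop?)");
            }
            Map<Integer, Integer> table = tables.get (current);
            Integer next = (table == null) ? null : table.get (dest);
            if (next == null) {
                throw new IllegalStateException ("No route to " + dest + " at node " + current);
            }
            current = next;
            path.add (current);
        }
        return path;
    }

    public static void main (String[] args) {
        RoutingTableSketch net = new RoutingTableSketch ();
        net.addRoute (1, 4, 2);   // Node 1 sends traffic for 4 to neighbor 2.
        net.addRoute (2, 4, 3);   // Node 2 sends it on to neighbor 3.
        net.addRoute (3, 4, 4);   // Node 3 is directly connected to 4.
        System.out.println ("Path: " + net.trace (1, 4, 16));
    }
}
```

No single node knows the whole path. Each one only knows the next step, which is the "minimize the need to know" theme at work.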

A network layer might look like this:

```
public interface NetworkLayer {

    // This method is called by the transport layer above when
    // that layer has a packet to send. The packet is assumed to
    // contain information about the destination. This method is
    // only called at the end hosts.
    public void sendPacket (NetworkPacket packet);

    // The network layer receives a packet from below (datalink layer).
    // This packet could be destined for this node (an end host), or it
    // may need to be forwarded according to the dictates of the routing table.
    public void receivePacket (NetworkPacket packet);

    // This will be called by the independent process that's doing
    // the link measurement to update the stats for the link to
    // a particular neighbor. Here is where a routing computation
    // is initiated.
    public void updateLinkStatus (int neighbor, LinkStatus status);
}
```
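
The module does not say how the routing computation should work, so the following is only one possibility: a distance-vector style update, in which each node combines its own link costs with the costs its neighbors advertise. The DistanceVectorNode class, the receiveAdvertisement() method, and the use of a plain double as the link cost are all assumptions for the sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: one node's routing computation in a distance-vector style.
// This is an assumed technique, not the module's prescribed method.
class DistanceVectorNode {

    final Map<Integer, Double> linkCost = new HashMap<>();                   // Neighbor -> cost of direct link.
    final Map<Integer, Map<Integer, Double>> advertised = new HashMap<>();   // Neighbor -> (dest -> its cost).
    final Map<Integer, Integer> nextHop = new HashMap<>();                   // The routing table: dest -> neighbor.
    final Map<Integer, Double> bestCost = new HashMap<>();                   // Dest -> cheapest known cost.

    // Called when a link measurement changes; triggers a recomputation.
    void updateLinkStatus (int neighbor, double cost) {
        linkCost.put (neighbor, cost);
        recompute ();
    }

    // Called when a neighbor shares its current costs to each destination.
    void receiveAdvertisement (int neighbor, Map<Integer, Double> costs) {
        advertised.put (neighbor, new HashMap<> (costs));
        recompute ();
    }

    // For every destination, pick the neighbor that minimizes
    // (cost of link to neighbor) + (neighbor's advertised cost to dest).
    private void recompute () {
        nextHop.clear ();
        bestCost.clear ();
        for (Map.Entry<Integer, Double> link : linkCost.entrySet ()) {
            int neighbor = link.getKey ();
            consider (neighbor, neighbor, link.getValue ());   // The neighbor itself.
            Map<Integer, Double> costs = advertised.get (neighbor);
            if (costs == null) {
                continue;
            }
            for (Map.Entry<Integer, Double> entry : costs.entrySet ()) {
                consider (entry.getKey (), neighbor, link.getValue () + entry.getValue ());
            }
        }
    }

    // Keep the route via this neighbor only if it beats the best so far.
    private void consider (int dest, int via, double cost) {
        Double current = bestCost.get (dest);
        if (current == null || cost < current) {
            bestCost.put (dest, cost);
            nextHop.put (dest, via);
        }
    }

    public static void main (String[] args) {
        DistanceVectorNode node = new DistanceVectorNode ();
        node.updateLinkStatus (2, 1.0);
        node.updateLinkStatus (3, 5.0);
        Map<Integer, Double> fromTwo = new HashMap<>();
        fromTwo.put (4, 10.0);
        Map<Integer, Double> fromThree = new HashMap<>();
        fromThree.put (4, 2.0);
        node.receiveAdvertisement (2, fromTwo);
        node.receiveAdvertisement (3, fromThree);
        System.out.println ("Next hop to 4: " + node.nextHop.get (4));
        node.updateLinkStatus (3, 20.0);   // The link to 3 becomes congested.
        System.out.println ("Next hop to 4 now: " + node.nextHop.get (4));
    }
}
```

Each node needs only its own measurements and what its immediate neighbors tell it, which fits the goal of routers knowing as little of the network as possible.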

The datalink layer

Main responsibilities:

A datalink layer's view is limited to only a single link.
Each datalink layer is responsible for reliable packet transmission across a link.
=> If a packet is lost or corrupted, it needs to be re-transmitted.
In the old days (and on some satellite links even today), one worried about overwhelming the node on the other side of the link. Thus, a node limited the number of packets sent without getting an acknowledgement from the other side.
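
The simplest version of both ideas, retransmission and limiting unacknowledged packets, is to allow only one unacknowledged packet at a time and resend it until it is acknowledged. The sketch below simulates that over a link that loses every third transmission. The LossyLink and StopAndWaitSender classes and the fixed loss pattern are made up for the example, and acknowledgements themselves are assumed never to be lost.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simulated link that loses every third transmission (an assumed,
// deterministic pattern so the behavior is easy to follow).
class LossyLink {
    private int attempts = 0;
    final List<String> delivered = new ArrayList<>();

    // Returns true if the packet got across and was acknowledged.
    boolean transmit (String packet) {
        attempts++;
        if (attempts % 3 == 0) {
            return false;          // Lost: no acknowledgement comes back.
        }
        delivered.add (packet);
        return true;
    }
}

class StopAndWaitSender {

    // Sends packets one at a time, resending each until it is acknowledged.
    // Returns the total number of transmissions used.
    static int sendAll (List<String> packets, LossyLink link, int maxTries) {
        int transmissions = 0;
        for (String packet : packets) {
            boolean acked = false;
            for (int attempt = 0; attempt < maxTries && ! acked; attempt++) {
                transmissions++;
                acked = link.transmit (packet);
            }
            if (! acked) {
                throw new IllegalStateException ("Giving up: the link appears to be down");
            }
        }
        return transmissions;
    }

    public static void main (String[] args) {
        LossyLink link = new LossyLink ();
        int used = sendAll (Arrays.asList ("A", "B", "C", "D"), link, 5);
        System.out.println ("Transmissions used: " + used);
        System.out.println ("Delivered: " + link.delivered);
    }
}
```

If acknowledgements could also be lost, the receiver would see duplicates, so a fuller design would number the packets and discard repeats.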

Possible structure of a datalink layer:

```
public interface DatalinkLayer {

    // The network layer calls this to send a packet across. It's
    // assumed that the packet contains the destination (neighbor) ID.
    public void sendPacket (DatalinkPacket packet);

    // This method is called by the physical layer when a packet has come in.
    // The datalink layer sends it up to the network layer.
    public void receivePacket (DatalinkPacket packet);
}
```
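
Before receivePacket() passes anything up, the datalink layer needs some way to tell that a packet was corrupted in transit. One common approach is for the sender to attach a checksum that the receiver recomputes. The sketch below uses Java's built-in CRC-32 class. The choice of CRC-32 and the ChecksumSketch class are assumptions, since this module does not specify how corruption is detected.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Sketch: detecting corruption with a checksum (CRC-32 is an assumed choice).
class ChecksumSketch {

    // Computed by the sender and attached to the packet.
    static long checksum (byte[] data) {
        CRC32 crc = new CRC32 ();
        crc.update (data);
        return crc.getValue ();
    }

    // Computed again by the receiver and compared with the attached value.
    static boolean isIntact (byte[] data, long attachedChecksum) {
        return checksum (data) == attachedChecksum;
    }

    public static void main (String[] args) {
        byte[] payload = "Hello there".getBytes (StandardCharsets.UTF_8);
        long attached = checksum (payload);
        System.out.println ("Intact on arrival? " + isIntact (payload, attached));

        payload[0] ^= 0x01;   // Simulate one bit flipped on the wire.
        System.out.println ("Intact after bit flip? " + isIntact (payload, attached));
    }
}
```

A packet that fails the check is simply discarded, and the retransmission mechanism treats it the same as a lost packet.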

Where to go from here

We have only provided a quick, high-level view of how network protocol software is organized and what the key ideas are.

Important network-related concepts we haven't covered:

Physical layer ideas. How these devices work, coding theory, modulation.
Higher-level protocols. For example: HTTP, Bluetooth.
Network services. For example, DNS.
Special types of networks. Wireless networks, satellite communications, ad-hoc networks.
Distributed computing: remote-procedure calls, distributed data management, network caching.
Network security. Secure communication, attacks on networks, authentication.

Next steps:

Understand the protocols. The best way is to write your own protocols. Note that we haven't said anything about how their functionality is implemented.
Learn more about how the 7 layers work (conceptually). There are several "next level" details to learn about current protocols, once you've had a crack at designing your own. Two good books to try are:
- Computer Networks, by A. S. Tanenbaum, Prentice-Hall, 2002.
- Computer Networking: A Top-Down Approach, by J. F. Kurose and K. W. Ross, Addison-Wesley, 2009.
Examine code. Read implementations of the layers. There are also some books that walk through TCP/IP code, all of which is written in C.