Network Stack

From OSDev.wiki

Revision as of 04:29, 2 April 2016

This page is a stub.
You can help the wiki by accurately adding more contents to it.

This article is about writing a TCP/IP stack, i.e. a subsystem which uses a link layer (e.g. an Ethernet card) to process packets of protocols such as IP, ARP, TCP and UDP.

Implementing your own TCP/IP stack takes some time but, as with osdev'ing in general, it is best to focus on one step at a time and celebrate each one, rather than dwelling on the whole finished thing.

A tool of choice to help you will be Wireshark, a free network sniffer and analyzer. It is a great tool for understanding how the various networking protocols are encoded, as it explains in great detail what each byte of a packet corresponds to. Note that on Windows, Wireshark may not capture loopback traffic (i.e. traffic from localhost to localhost), and so may miss network traffic between an emulator and the host machine. In that case you can use RawCap to capture the traffic into a file and examine it with Wireshark.

Scanning the PCI devices

The first thing to do is to scan the PCI devices installed on the machine so you can detect an Ethernet card by looking at a specific vendor ID and device ID. See the PCI page for more details.
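As a rough illustration of the ID check, the sketch below builds the value written to the PCI CONFIG_ADDRESS port (0xCF8) and tests whether register 0 of a device identifies an Intel E1000. The function names are hypothetical, and the device ID 0x100E (the 82540EM commonly exposed by QEMU and VirtualBox) is an assumption to check against your emulator. The port I/O itself is left out.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed IDs: Intel's vendor ID and the 82540EM (E1000) device ID.
   Other E1000 variants use different device IDs. */
#define PCI_VENDOR_INTEL   0x8086u
#define PCI_DEVICE_82540EM 0x100Eu

/* Build the value written to CONFIG_ADDRESS (I/O port 0xCF8) to select
   a 32-bit register in a device's PCI configuration space. */
uint32_t pci_config_address(uint8_t bus, uint8_t device,
                            uint8_t function, uint8_t offset)
{
    return 0x80000000u                          /* enable bit */
         | ((uint32_t)bus << 16)
         | ((uint32_t)(device & 0x1F) << 11)
         | ((uint32_t)(function & 0x07) << 8)
         | (offset & 0xFCu);                    /* dword-aligned offset */
}

/* Register 0 holds the vendor ID in its low 16 bits and the device ID
   in its high 16 bits. */
bool pci_is_e1000(uint32_t id_register)
{
    uint16_t vendor = (uint16_t)(id_register & 0xFFFFu);
    uint16_t device = (uint16_t)(id_register >> 16);
    return vendor == PCI_VENDOR_INTEL && device == PCI_DEVICE_82540EM;
}
```

A scan would loop over bus, device and function numbers, write each address to port 0xCF8, read the register back from port 0xCFC and pass it to the check.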

Writing a driver for your NIC

Once you have located the Ethernet card(s), you will need to implement a driver to be able to send and receive data. If you are using an emulator, a good card to write a driver for is the Intel E1000, as it is available in a variety of emulators such as VirtualBox and is covered thoroughly on osdev.org (see Intel Ethernet i217).

The first thing to get out of the Ethernet card is the machine's MAC address. This 6-byte address is needed to exchange data on the local network.

The easiest test you can do is to send an ARP broadcast on the network. You can use Wireshark both to capture an example of a valid ARP request and to verify that yours has been received by the target host. As for receiving data, your network card will pass up broadcast frames and frames addressed to your MAC address; with promiscuous mode enabled, it should also capture the other frames it sees on the local network, even if they are not addressed to your machine.

Networking protocols

Once you can send and receive data through your NIC and have your machine's MAC address, you will have to implement (at least partially) several networking protocols that are layered on top of each other:

  • Ethernet: this is the basic protocol that sends data to another machine on your local network, addressed by its MAC address. This is the building block for all the rest.
    • ARP (Address Resolution Protocol): translates an IPv4 address into a MAC address
    • IP (Internet Protocol): this sits on top of Ethernet and is required to send data on the Internet given an IP address. The most common version is IPv4, which uses 32-bit IP addresses, but IPv6 (which uses 128-bit IP addresses) is gaining traction. Note that IP makes a "best effort" to deliver a packet, but guarantees neither that it will reach its destination nor that packets will be received in the order they were sent
      • ICMP (Internet Control Message Protocol): carried directly in IP packets and used by tools such as ping or traceroute
      • UDP (User Datagram Protocol): a connectionless transport protocol that adds the notion of source and destination ports to IP. Application services can subscribe to one or more port(s) to be notified when a UDP message is sent to that port
        • DHCP (Dynamic Host Configuration Protocol): requests the machine's network configuration, such as its IP address, the IP address of the local router, the DNS server, etc.
        • DNS (Domain Name System): gets the IP address for a given domain name
      • TCP (Transmission Control Protocol): like UDP, it adds the notion of source and destination ports. TCP is however more complex, as it creates its own session mechanism and makes sure that the application using it receives the data in order, resending packets if need be.
        • HTTP (HyperText Transfer Protocol): if you don't know what HTTP is, you're on the wrong site.
        • Telnet: a protocol to remotely access a machine using a command line shell.

The network stack

Networking protocols are organized as a stack where each layer calls the next layer. A packet sent across the network will be composed of several headers, one for each layer involved.

Consider the example of a DHCP request. This is one of the protocols you might want to implement early on as it allows your machine to find its IP address, get the local router IP address, the DNS IP address - the basic information to be able to properly communicate across the network.

One way to implement this is as follows:

  • The Operating System decides to send a DHCP request, so calls the DHCP layer
    • The DHCP layer asks the UDP layer to create a packet whose target is IP address 255.255.255.255 (broadcast to the whole local network), port 67 (the DHCP server port; the client sends from port 68), and whose payload size is 300 bytes (the length may vary)
      • The UDP layer asks the IP layer to create a packet of type UDP to IP address 255.255.255.255, of size 308 bytes
        • The IP layer asks the Ethernet layer to create a packet of type IPv4 of length 328 bytes whose target is IP address 255.255.255.255
          • The Ethernet layer creates a packet of size 342 bytes, writes the Ethernet header in the first 14 bytes, including the source address (the machine's MAC address) and the destination MAC address FF:FF:FF:FF:FF:FF (translated from the IP address 255.255.255.255), and sends it back to the IP layer
        • The IP layer writes the IP header in the 20 bytes after the Ethernet header and sends it to the UDP layer
      • The UDP layer writes its header in the 8 bytes after the IP header and sends it to the DHCP layer
    • The DHCP layer writes its request in the 300 bytes left and sends it back to the UDP layer
      • The UDP layer completes its header by writing its checksum (which encompasses the DHCP message) and sends it to the IP layer
        • The IP layer completes its header by writing the header checksum and sends the packet to the Ethernet layer
          • The Ethernet layer sends the packet to the Ethernet card, which sends the message across the network
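The sizes in this walkthrough follow from the header lengths. A minimal sketch of the arithmetic, assuming an IPv4 header without options and using hypothetical names (frame_layout, udp_frame_layout):

```c
#include <stdint.h>
#include <stddef.h>

enum {
    ETH_HEADER_LEN  = 14,  /* destination MAC, source MAC, EtherType */
    IPV4_HEADER_LEN = 20,  /* assumes no IP options */
    UDP_HEADER_LEN  = 8
};

/* Byte offsets of each layer's data inside one shared frame buffer. */
struct frame_layout {
    size_t ip_offset;
    size_t udp_offset;
    size_t payload_offset;
    size_t total_len;
};

/* Compute where each header starts for a UDP payload of the given size. */
struct frame_layout udp_frame_layout(size_t payload_len)
{
    struct frame_layout l;
    l.ip_offset      = ETH_HEADER_LEN;
    l.udp_offset     = l.ip_offset + IPV4_HEADER_LEN;
    l.payload_offset = l.udp_offset + UDP_HEADER_LEN;
    l.total_len      = l.payload_offset + payload_len;
    return l;
}
```

For the 300-byte DHCP payload this gives offsets 14, 34 and 42 and a total of 342 bytes. Knowing the offsets up front is what lets every layer write into a single buffer instead of copying the packet at each boundary.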

The DHCP response will have the same format as the request, and should be processed as follows:

  • The Ethernet card driver will verify that the target MAC is the current machine's (or the broadcast address), and if so sends the packet to the Ethernet layer
  • The Ethernet layer will look at the Ethernet header, check the EtherType field (which should be 0x0800 for IPv4) and will send the packet (stripped of its Ethernet header) to the IP layer
  • The IP layer will check the IP header, verify the checksum and, because its protocol field is 17 (UDP), will forward the packet (without its IP header) to the UDP layer
  • The UDP layer will check the UDP header, verify the checksum and, based on the destination port, will send the payload to the right service, in this example the DHCP layer (once again stripping the UDP header)
  • The DHCP layer will read the DHCP message, verify that it is a reply (e.g. an offer or acknowledgement from the DHCP server, which is often the router) and will retrieve its IP address, the router's IP address and other networking configuration information.
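The receive path can be sketched as a function that walks the headers and decides which layer should get the frame. This is only an outline under simplifying assumptions: checksums are not verified, only ARP and UDP are recognised, and the names (rx_target, classify_frame) are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

enum rx_target { RX_DROP, RX_ARP, RX_DHCP_CLIENT, RX_OTHER_UDP };

/* Read a 16-bit big-endian (network order) value. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Decide which layer an incoming Ethernet frame belongs to. */
enum rx_target classify_frame(const uint8_t *frame, size_t len)
{
    if (len < 14)
        return RX_DROP;                         /* too short for Ethernet */

    uint16_t ethertype = read_be16(frame + 12);
    if (ethertype == 0x0806)
        return RX_ARP;
    if (ethertype != 0x0800)
        return RX_DROP;                         /* not IPv4 */

    const uint8_t *ip = frame + 14;
    size_t ip_avail = len - 14;
    if (ip_avail < 20 || (ip[0] >> 4) != 4)
        return RX_DROP;                         /* not a valid IPv4 header */

    size_t ihl = (size_t)(ip[0] & 0x0F) * 4;    /* header length in bytes */
    if (ihl < 20 || ip_avail < ihl + 8)
        return RX_DROP;
    if (ip[9] != 17)
        return RX_DROP;                         /* this sketch handles UDP only */

    const uint8_t *udp = ip + ihl;
    uint16_t dst_port = read_be16(udp + 2);
    return dst_port == 68 ? RX_DHCP_CLIENT : RX_OTHER_UDP;
}
```

A fuller version would verify the IP and UDP checksums before dispatching and would look services up in a port table rather than hard-coding port 68.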

Note that networking protocols are by nature asynchronous, i.e. you send a request on the network and then need to wait for its response. In particular, you have no way of predicting when a response will arrive, if at all. And because an incoming packet is typically handled by an interrupt handler, it could interrupt your code at any time.

Little and big Endian

By convention, multi-byte numbers in Internet protocol headers are encoded in big endian (the most significant byte goes first), also called network byte order. This is something to always keep in mind when developing for Intel and AMD processors, as x86 processors store numbers in little endian. As a result, you will often have to convert numbers. Here are two functions that swap the byte order of 16-bit and 32-bit integers:

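   #include <stdint.h>

   /* Swap the two bytes of a 16-bit value. */
   uint16_t switch_endian16(uint16_t nb) {
       return (uint16_t)((nb >> 8) | (nb << 8));
   }

   /* Reverse the four bytes of a 32-bit value. */
   uint32_t switch_endian32(uint32_t nb) {
       return ((nb >> 24) & 0xff)       |
              ((nb << 8)  & 0xff0000)   |
              ((nb >> 8)  & 0xff00)     |
              ((nb << 24) & 0xff000000);
   }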
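An alternative worth considering is to read and write header fields one byte at a time, which gives the right result regardless of the host's endianness and also avoids unaligned accesses. The helper names below are hypothetical:

```c
#include <stdint.h>

/* Store a 16-bit value in network byte order (most significant byte first). */
void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)v;
}

/* Store a 32-bit value in network byte order. */
void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Load a 16-bit value stored in network byte order. */
uint16_t get_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Load a 32-bit value stored in network byte order. */
uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

For example, put_be16(frame + 12, 0x0800) writes the IPv4 EtherType into an Ethernet header as the bytes 08 00.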

Checksums

Several networking protocols use a checksum to verify that the message was not accidentally altered in transit. The checksum is a 16-bit number computed as follows:

  • Set the checksum field itself to zero and split the message to checksum into 16-bit chunks (if the length is odd, pad it with a zero byte)
  • Add those chunks
  • If the sum does not fit in a 16-bit number (i.e. is greater than 0xFFFF), strip the top 16 bits and add them to the low 16 bits
  • Repeat the last step until you have a 16-bit sum
  • Invert every bit of that sum (one's complement) to obtain the checksum

The IP checksum covers only the IP header. The UDP checksum is more complex, as it covers the UDP header, the UDP payload (i.e. anything after the UDP header) and a pseudo-header made of the source and destination IP addresses, the protocol number (17 for UDP) and the UDP length (header plus payload).
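A minimal sketch of this computation (the function name inet_checksum is hypothetical). It reads the data as big-endian 16-bit words, so the result is in host order and must be written into the header most significant byte first. The 32-bit accumulator is sufficient for any single packet up to 64 KiB.

```c
#include <stdint.h>
#include <stddef.h>

/* One's complement sum of 16-bit words, folded and inverted. */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];

    if (i < len)                          /* odd length: pad with a zero byte */
        sum += (uint32_t)data[i] << 8;

    while (sum >> 16)                     /* fold the carries back in */
        sum = (sum & 0xFFFFu) + (sum >> 16);

    return (uint16_t)~sum;
}
```

To verify a received header, run the same function over it with the checksum field left in place: a result of 0 means the header is intact. For UDP, the same function can be run over a buffer holding the pseudo-header followed by the UDP header and payload.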

ARP

ARP will be one of the first protocols you need to implement. Without it, you will not be able to communicate on your local network, let alone on the Internet. Fortunately it is a simple protocol that only requires you to handle two kinds of message:

  • Requests: your OS will need to send a request to convert an IP address into a MAC address, something which is required even to communicate with your local router
  • Replies: the responses to requests. Your OS will need to handle replies sent to it (to update its ARP table), but also answer the requests sent its way. In particular, the local router will typically send ARP requests to your machine on a regular basis. If you fail to respond, the router may consider your machine to be down and stop forwarding traffic to it
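As an illustration, an ARP request for IPv4 over Ethernet is a fixed 28-byte message that follows the Ethernet header (EtherType 0x0806). The sketch below fills one in; the function name is hypothetical, and in a kernel without a C library memcpy and memset would have to be your own.

```c
#include <stdint.h>
#include <string.h>

#define ARP_PACKET_LEN 28

/* Fill 'out' with an ARP request asking who owns target_ip. */
void build_arp_request(uint8_t out[ARP_PACKET_LEN],
                       const uint8_t sender_mac[6],
                       const uint8_t sender_ip[4],
                       const uint8_t target_ip[4])
{
    out[0] = 0x00; out[1] = 0x01;    /* hardware type: Ethernet */
    out[2] = 0x08; out[3] = 0x00;    /* protocol type: IPv4 */
    out[4] = 6;                      /* hardware address length */
    out[5] = 4;                      /* protocol address length */
    out[6] = 0x00; out[7] = 0x01;    /* operation: 1 = request, 2 = reply */
    memcpy(out + 8,  sender_mac, 6);
    memcpy(out + 14, sender_ip, 4);
    memset(out + 18, 0, 6);          /* target MAC: unknown, left as zero */
    memcpy(out + 24, target_ip, 4);
}
```

The request is sent in an Ethernet frame addressed to FF:FF:FF:FF:FF:FF. A reply has the same layout with operation 2, the sender fields holding the answer and the target fields holding the original requester.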

TCP

TCP is the most complex protocol as it creates a virtual connection between the client and the server. Each TCP packet contains multiple flags that help manage that connection.

A TCP connection is established with the following 3-way handshake:

  • The client sends a SYN packet to the server
  • The server responds with a SYN, ACK packet
  • The client sends an ACK packet
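Seen from the client, the handshake is a SYN sent, a packet with both SYN and ACK set received, and a final ACK sent. The sketch below tracks only the sequence numbers involved; the struct and function names are hypothetical, and retransmission, timeouts and the other TCP states are left out.

```c
#include <stdint.h>
#include <stdbool.h>

/* Flag bits as they appear in the TCP header's flags byte. */
#define TCP_FLAG_FIN 0x01
#define TCP_FLAG_SYN 0x02
#define TCP_FLAG_RST 0x04
#define TCP_FLAG_ACK 0x10

struct tcp_client {
    uint32_t snd_nxt;      /* next sequence number we will send */
    uint32_t rcv_nxt;      /* next sequence number expected from the peer */
    bool     established;
};

/* Step 1: our SYN went out. A SYN consumes one sequence number. */
void tcp_client_sent_syn(struct tcp_client *c, uint32_t initial_seq)
{
    c->snd_nxt = initial_seq + 1;
    c->rcv_nxt = 0;
    c->established = false;
}

/* Step 2: check the server's answer. On success the caller should send
   step 3, an ACK with seq = c->snd_nxt and ack = c->rcv_nxt. */
bool tcp_client_on_synack(struct tcp_client *c, uint8_t flags,
                          uint32_t seq, uint32_t ack)
{
    const uint8_t want = TCP_FLAG_SYN | TCP_FLAG_ACK;

    if ((flags & want) != want || (flags & TCP_FLAG_RST))
        return false;              /* not a SYN, ACK */
    if (ack != c->snd_nxt)
        return false;              /* does not acknowledge our SYN */

    c->rcv_nxt = seq + 1;          /* the server's SYN also consumes one */
    c->established = true;
    return true;
}
```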

Each packet carries a sequence number. This allows the receiver to keep track of the data being sent and to detect packets that arrive out of order or not at all. Likewise, the sender expects the receiver to send ACK messages on a regular basis to acknowledge what has been received. It is up to the TCP layer to reorder the packets and/or send them again if need be.
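One detail that is easy to miss is that sequence numbers are 32 bits wide and wrap around, so they cannot be compared with a plain less-than. A common approach, sketched here with hypothetical helper names, is to compare the difference instead. It relies on two's complement conversion to int32_t, which holds on the usual compilers.

```c
#include <stdint.h>
#include <stdbool.h>

/* True if sequence number a comes before b, allowing for wraparound. */
bool seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* True if seq falls inside the receive window starting at rcv_nxt. */
bool seq_in_window(uint32_t seq, uint32_t rcv_nxt, uint32_t window)
{
    return (uint32_t)(seq - rcv_nxt) < window;
}
```

With these, 0xFFFFFFF0 is correctly treated as coming before 0x00000010, even though it is numerically larger.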

What to focus on

The shape of the stack will depend on design decisions. These may include:

  • whether a packet is passed between processing layers in one buffer or is copied to a new buffer when crossing a layer boundary;
  • whether inbound and outbound frames are exchanged with the link layer by a dedicated thread, handled entirely in an interrupt handler, or handled in a loop in a single-threaded environment;
  • whether frames (e.g. Ethernet frames) are processed immediately or queued;
  • whether you want TCP support, just UDP, or maybe only IP; TCP is the most complex part of the stack: in the lwIP implementation, half of the code is specific to TCP.

As an example, a stack might:

  • have the NIC's API provide three functions: set up the NIC, poll for a frame and send a frame;
  • exchange inbound and outbound frames with the NIC in one thread;
  • demultiplex inbound frames from a reception queue in another thread.
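A rough sketch of such a design: a three-function NIC interface and a fixed-size reception queue between the thread that talks to the NIC and the thread that demultiplexes. All names and sizes are hypothetical, and the queue has no locking or memory barriers, which a real kernel would need whenever the producer and consumer can run concurrently.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>
#include <string.h>

#define FRAME_MAX 1518   /* assumed maximum Ethernet frame size */
#define RXQ_SLOTS 8      /* power of two, so index wraparound stays consistent */

/* The three operations a NIC driver would expose to the stack. */
struct nic_ops {
    int (*setup)(void);
    int (*poll)(uint8_t *buf, size_t cap);        /* frame length, 0 if none */
    int (*send)(const uint8_t *buf, size_t len);
};

struct rx_queue {
    uint8_t  data[RXQ_SLOTS][FRAME_MAX];
    uint16_t len[RXQ_SLOTS];
    unsigned head;       /* count of frames written so far */
    unsigned tail;       /* count of frames read so far */
};

/* Called by the NIC thread after poll() returned a frame. */
bool rxq_push(struct rx_queue *q, const uint8_t *frame, size_t len)
{
    if (len == 0 || len > FRAME_MAX || q->head - q->tail == RXQ_SLOTS)
        return false;                             /* invalid or full: drop */
    unsigned slot = q->head % RXQ_SLOTS;
    memcpy(q->data[slot], frame, len);
    q->len[slot] = (uint16_t)len;
    q->head++;
    return true;
}

/* Called by the demultiplexing thread. Returns 0 if nothing was taken. */
size_t rxq_pop(struct rx_queue *q, uint8_t *out, size_t cap)
{
    if (q->head == q->tail)
        return 0;                                 /* queue empty */
    unsigned slot = q->tail % RXQ_SLOTS;
    size_t len = q->len[slot];
    if (len > cap)
        return 0;                                 /* buffer too small: frame stays queued */
    memcpy(out, q->data[slot], len);
    q->tail++;
    return len;
}
```

Dropping frames when the queue is full is acceptable at this level, since IP is best effort and TCP will retransmit.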

General considerations

  • When writing a stack over Ethernet, you will want to provide support for ARP, including functions to resolve addresses.
  • For the sake of modularity, the station's IP address is better stored in a nic_info struct than in a global variable.
  • You may want to use Wireshark or another packet sniffer to inspect the communication, and netcat to dump debugging data sent from your OS once you have UDP or TCP support. Also, arping is useful when debugging ARP code. You could code a trigger which, for example, reboots your system upon receipt of an ARP who-has for a chosen IP.
  • You may use a dedicated Ethernet card on one computer, connected with a crossover cable to another computer (which runs your operating system), and use static IP addresses. Other options include testing under Bochs or QEMU after implementing drivers for the network devices they provide.
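One possible shape for the nic_info struct mentioned above, with two small helpers that the receive path and the ARP code could use. The fields and helper names are assumptions for illustration only.

```c
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

/* Per-interface state, kept together instead of in global variables. */
struct nic_info {
    uint8_t mac[6];
    uint8_t ip[4];        /* IPv4 address, in network byte order */
    uint8_t gateway[4];   /* local router, e.g. as learned from DHCP */
};

/* Should a frame with this destination MAC be processed by this NIC? */
bool nic_accepts_mac(const struct nic_info *nic, const uint8_t dst[6])
{
    static const uint8_t broadcast[6] = {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF};
    return memcmp(dst, nic->mac, 6) == 0 || memcmp(dst, broadcast, 6) == 0;
}

/* Is this the IP address an ARP who-has should be answered for? */
bool nic_owns_ip(const struct nic_info *nic, const uint8_t ip[4])
{
    return memcmp(ip, nic->ip, 4) == 0;
}
```

Passing a pointer to this struct into each layer also makes it straightforward to support more than one network card later.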

See Also

External Links

A number of TCP/IP stacks come with documentation of their implementation, which makes for a good read.