SSL/TLS

From OSDev.wiki
This page is a work in progress.
This page may thus be incomplete. Its content may be changed in the near future.

SSL/TLS is a protocol used to secure connections in various standard networking protocols (HTTP, FTP, etc.). Even though people still talk about SSL, that protocol has since been replaced by TLS (versions 1.0, 1.1 or 1.2). SSL should not be used anymore as it is not considered secure.

To set up an HTTPS connection, SSL/TLS is inserted between TCP and HTTP. In other words, the HTTP requests sent by the Web browser and the responses (e.g. HTML) returned by the server are encrypted using SSL/TLS.

WARNING: implementing your own TLS layer is no guarantee of security. It is in fact recommended never to write your own implementation even of well-known, secure cryptographic algorithms, as many attacks have exploited flaws in implementations (timing side channels, for example) rather than in the algorithms themselves. Writing your own TLS layer is however useful if you want to understand how SSL/TLS works and/or if you want to access Web sites that are only available through HTTPS.

A few tools can assist you when developing your own TLS layer. First of all, Wireshark is a free tool that captures network traffic and explains in detail how the different packets are composed, down to the meaning of each byte (except for the encrypted parts). Python can also be an invaluable tool to prototype and verify your cryptographic algorithms (you might want to write a prototype of a TLS connection in Python first). Python natively supports very large integers (e.g. 1024-bit ones), and libraries cover what it cannot do out of the box, such as PyCrypto (or its maintained fork PyCryptodome), which contains most of the required cryptographic primitives, or Scapy SSL, which allows you to forge SSL packets. These tools can greatly help in testing how TLS works.

Cryptography

An SSL/TLS connection actually uses a whole set of cryptographic algorithms called a cipher suite. On top of that, SSL/TLS does not support one but many cipher suites: a connection might use a completely different cipher suite depending on what the client and server support. Fully supporting TLS would require implementing a whole series of cipher suites. Fortunately, implementing only a few popular cipher suites is enough in most cases. You can use the SSL Labs SSL Server Test to check which cipher suites various Web servers support.

Cryptography recap

Here are the main types of cryptographic algorithms:

               Public/private key      Secret key                     No key
Encryption     Asymmetric encryption   Symmetric encryption           (none)
Verification   Signing                 Message Authentication Code    Cryptographic hash
  • Asymmetric encryption (e.g. RSA): one party generates a private/public key pair and makes the public key readily available. Anybody can encrypt data using that public key, but only the owner of the private key can decrypt it
  • Symmetric encryption (e.g. AES): both parties need to use a shared secret key to encrypt and decrypt data
  • Signing (e.g. RSA): one party generates a private/public key pair and makes the public key readily available. Only the owner of the private key can sign data, but anybody with the public key can verify that the signature matches the data
  • Message Authentication Code aka MAC (e.g. HMAC): generates an authentication tag using a secret key, so only the parties knowing that key can produce or verify it
  • Cryptographic hash (e.g. SHA1, SHA256): generates a fixed-size digest of some data, such that it is very hard to find other data that would produce the same digest
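To make the difference between the last two rows concrete, here is a small sketch using Python's standard hashlib and hmac modules (the data and keys are made up for illustration). Anybody can recompute the hash, whereas the MAC changes with the secret key:

```python
import hashlib
import hmac

data = b"GET / HTTP/1.1"

# Cryptographic hash: no key involved, anybody can compute the same digest
digest = hashlib.sha256(data).digest()

# MAC: the tag depends on the secret key, so only key holders can produce or check it
tag = hmac.new(b"shared secret key", data, hashlib.sha256).digest()
other_tag = hmac.new(b"another key", data, hashlib.sha256).digest()

# Verifying a MAC means recomputing it and comparing in constant time
is_valid = hmac.compare_digest(tag, hmac.new(b"shared secret key", data, hashlib.sha256).digest())
```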

How TLS uses those

We will study how things work with TLS version 1.2 using the TLS_DHE_RSA_WITH_AES_128_CBC_SHA cipher suite. Its name indicates the algorithms used for the key exchange (DHE, with RSA signatures for authentication), for the actual encryption/decryption (AES 128-bit in CBC mode) and for message authentication (HMAC with SHA1). This cipher suite thus requires implementing the following:

  • The Diffie-Hellman Ephemeral (DHE) key exchange protocol. This protocol relies on modular exponentiation over very large numbers, although this can be sidestepped if security is not your primary goal
  • Encryption and decryption using AES 128-bit in CBC mode
  • The SHA1 and SHA256 cryptographic hashing algorithms
  • HMAC, a Message Authentication Code (MAC). A MAC is similar to a cryptographic hash function except that it requires a secret key
  • Optional: if you want to verify the server certificate, you will need to implement the RSA algorithm, which also relies on modular exponentiation, as well as SHA1/SHA256/SHA384 (depending on the certificate chain)

Source code for AES, SHA1, SHA256 and HMAC is easy to find on the Internet.

This cipher suite is not the strongest available, but it is still relatively popular and shows the key mechanisms of a secure TLS interaction. Another useful cipher suite to implement is TLS_RSA_WITH_AES_128_CBC_SHA; the only difference is that the key exchange uses RSA instead of Diffie-Hellman. People interested in implementing a stronger suite can look at TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256, which requires implementing the Elliptic Curve version of Diffie-Hellman as well as Galois/Counter Mode (GCM) instead of the easier-to-implement CBC mode.

Aside from the cipher suite, TLS defines its own PRF (Pseudo-Random Function) which is used to generate pseudo-random data. Here is an implementation example in Python:

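from Crypto.Hash import HMAC, SHA256

def HMAC_hash(secret, val):
    h = HMAC.new(secret, digestmod=SHA256)
    h.update(val)
    return h.digest()

def P_hash(secret, seed, size):
    # A(0) = seed, A(i) = HMAC(secret, A(i-1))
    # Output: HMAC(secret, A(1) + seed) | HMAC(secret, A(2) + seed) | ...
    A = seed
    result = b''
    while size > 0:
        A = HMAC_hash(secret, A)
        result += HMAC_hash(secret, A + seed)
        size -= 32  # each iteration yields one 32-byte SHA256 output

    return result

def PRF(secret, label, seed, size):
    # secret, label and seed are byte strings, e.g. label = b"master secret"
    return P_hash(secret, label + seed, size)[0:size]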

The PRF here will generate [size] bytes of pseudo-random data based on the secret, the label and the seed.

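Note the use of SHA256 even though the cipher suite specifies SHA1: in TLS 1.2 the PRF is based on SHA256 for all the cipher suites defined in RFC 5246, while cipher suites that specify SHA384 (such as TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) use SHA384 instead.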

Handshake

Any communication in TLS starts with a 5-byte TLS Record header:

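typedef struct __attribute__((packed)) {
	uint8_t content_type;  // 0x14 Change Cipher Spec, 0x15 Alert, 0x16 Handshake, 0x17 Application Data
	uint16_t version;      // 0x0303 for TLS 1.2; big-endian on the wire (use ntohs/htons)
	uint16_t length;       // length of the data following this header, big-endian as well
} TLSRecord;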
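The record header can also be parsed portably, without relying on a packed struct and manual byte swapping. The following Python sketch (the function names are illustrative, not from any library) decodes the header and splits a TCP byte stream into complete records, keeping any incomplete trailing record for later:

```python
import struct

CONTENT_TYPES = {
    0x14: "change_cipher_spec",
    0x15: "alert",
    0x16: "handshake",
    0x17: "application_data",
}

def parse_record_header(header):
    """Decode the 5-byte record header into (content_type, version, length)."""
    if len(header) != 5:
        raise ValueError("a TLS record header is exactly 5 bytes")
    # '>' = big-endian (network byte order), B = uint8, H = uint16
    return struct.unpack(">BHH", header)

def split_records(data):
    """Split a byte stream into complete records; also return the leftover bytes."""
    records = []
    pos = 0
    while pos + 5 <= len(data):
        content_type, version, length = parse_record_header(data[pos:pos + 5])
        if pos + 5 + length > len(data):
            break  # record not fully received yet: wait for more TCP data
        records.append((content_type, version, data[pos + 5:pos + 5 + length]))
        pos += 5 + length
    return records, data[pos:]
```

Keep in mind that a server may pack several handshake messages (for example Certificate, Server Key Exchange and Server Hello Done) into a single record, so each record payload may itself need to be split on the handshake headers.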

This header may be followed by another TLS header, such as the 4-byte TLS Handshake header (a 1-byte message type followed by a 3-byte length). As with a TCP connection, a TLS connection starts with a handshake between the client and the server:

  • The client sends a Client Hello message, including 32 bytes of random data and the list of its supported cipher suites. In our example we only send one supported cipher suite (code 0x0033, i.e. TLS_DHE_RSA_WITH_AES_128_CBC_SHA)
  • The server responds with a Server Hello message, telling the client which cipher suite is going to be used as well as its own 32 bytes of random data
  • The server sends its certificates. These are used by the client to verify that it is actually talking to the site it thinks it is talking to, as opposed to a malicious site
  • The server sends a Server Key Exchange message, initiating the key exchange and signing it with its private key
  • The server sends a Server Hello Done message, indicating it is waiting for the client
  • The client sends a Client Key Exchange message, containing its part of the key exchange transaction
  • The client sends a Change Cipher Spec message
  • The client sends an Encrypted Handshake Message
  • The server sends a Change Cipher Spec message
  • The server sends an Encrypted Handshake Message
  • The client and the server can communicate by exchanging encrypted Application Data messages

The Change Cipher Spec message tells the other party that every record it sends from now on will be protected with the newly negotiated cipher suite and keys.

The Encrypted Handshake Messages (called Finished messages in the TLS specification) are the first ones to be sent encrypted. They contain a hash of all the previous handshake messages and ensure these were not tampered with.

Any subsequent communication is of type Application Data and encrypted.
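As an illustration, the Client Hello described above could be built as follows. This is a minimal Python sketch: `build_client_hello` is a hypothetical helper, only the optional server_name (SNI) extension is included, and real servers may additionally expect extensions such as signature_algorithms, or may not support cipher suite 0x0033 at all:

```python
import os
import struct

def build_client_hello(cipher_suites=(0x0033,), server_name=None, client_random=None):
    """Return (record, handshake_message, client_random) for a TLS 1.2 Client Hello."""
    if client_random is None:
        client_random = os.urandom(32)          # 32 bytes of random data
    body = b"\x03\x03"                          # client_version: TLS 1.2
    body += client_random
    body += b"\x00"                             # empty session_id
    suites = b"".join(struct.pack(">H", s) for s in cipher_suites)
    body += struct.pack(">H", len(suites)) + suites
    body += b"\x01\x00"                         # one compression method: null
    if server_name is not None:
        # server_name extension (type 0): list of (name_type 0 = host_name, name)
        name = server_name.encode("ascii")
        entry = b"\x00" + struct.pack(">H", len(name)) + name
        sni = struct.pack(">H", len(entry)) + entry
        extensions = struct.pack(">HH", 0x0000, len(sni)) + sni
        body += struct.pack(">H", len(extensions)) + extensions
    # Handshake header: msg_type 1 (Client Hello) followed by a 3-byte length
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    # Record header: content type 0x16 (handshake), version 0x0303, 2-byte length
    record = struct.pack(">BHH", 0x16, 0x0303, len(handshake)) + handshake
    return record, handshake, client_random
```

The returned handshake bytes (without the record header) are what you later feed into the hash of the Encrypted Handshake Message, and client_random is needed for the key derivation.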

Key Exchange

TLS encryption is performed using symmetric encryption. The client and server thus need to agree on a secret key. This is done in the key exchange protocol.

In our example, TLS uses the DHE/RSA algorithms: the Diffie-Hellman Ephemeral protocol is used to come up with the secret key, and the server uses RSA to sign the numbers it sends to the client (with the key from its SSL certificate), so that a third party cannot inject a malicious number. The upside of DHE is that it uses a temporary key that is discarded afterwards. Key exchange protocols such as static DH or RSA instead rely on the key from the SSL certificate. As a result, a leak of the server's private key (for example through Heartbleed) means that previously recorded SSL/TLS sessions can be decrypted. Ephemeral key exchange protocols such as DHE or ECDHE offer so-called forward secrecy: recorded sessions stay safe even if the server's private key is later compromised.

Diffie-Hellman Ephemeral works as follows:

  • The server comes up with a secret number y, a generator g and a modulus p (p typically being a 1024-bit integer, although at least 2048 bits is recommended today) and sends (p, g, pubKey = $g^{y} \bmod p$) to the client in its "Server Key Exchange" message. It also sends a signature of the Diffie-Hellman parameters (see the SSL Certificate section)
  • The client comes up with a secret number x and sends pubKey = $g^{x} \bmod p$ to the server in its "Client Key Exchange" message
  • The client and server derive a common key premaster_secret = $(g^{x})^{y} \bmod p = (g^{y})^{x} \bmod p = g^{xy} \bmod p$. If p is large enough, it is extremely hard for anyone knowing only $g^{x}$ and $g^{y}$ (which were transmitted in clear) to find that key.

Because computing $g^{xy} \bmod p$ with 1024-bit integers requires big-integer arithmetic, which most programming languages do not provide natively, one way to avoid it, if security is not a concern, is to use x = 1. The client then sends $g^{1} \bmod p = g$ as its public value, and premaster_secret is just $g^{y} \bmod p$, a value sent directly by the server. Security is of course completely lost in that case.
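The exchange above can be simulated in a few lines of Python, whose integers have arbitrary precision. This is a sketch with toy parameters chosen for illustration (p = 2^127 - 1 is prime, but it is not a standard TLS group and is far too small to be secure); `dh_keypair` and `int_to_bytes` are hypothetical helpers. The conversion to bytes strips leading zero bytes, which is how RFC 5246 encodes the DH premaster secret before it is fed to the PRF:

```python
import secrets

# Toy parameters for illustration only: NOT a standard TLS group, NOT secure
P = 2**127 - 1
G = 3

def dh_keypair(g, p):
    """Pick a secret exponent in [2, p - 2] and return (secret, g^secret mod p)."""
    secret = secrets.randbelow(p - 3) + 2
    return secret, pow(g, secret, p)

def int_to_bytes(n):
    """Big-endian encoding with leading zero bytes stripped."""
    return n.to_bytes((n.bit_length() + 7) // 8, "big")

# Server side: p, g and g^y mod p go into the Server Key Exchange message
y, g_y = dh_keypair(G, P)
# Client side: g^x mod p goes into the Client Key Exchange message
x, g_x = dh_keypair(G, P)

# Each side combines the other's public value with its own secret
premaster_client = int_to_bytes(pow(g_y, x, P))
premaster_server = int_to_bytes(pow(g_x, y, P))

# The insecure x = 1 shortcut: the client sends g itself and the
# premaster secret is simply the server's public value g^y mod p
insecure_premaster = int_to_bytes(pow(g_y, 1, P))
```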

premaster_secret is however only a first step. Both client and server use the PRF function to derive a 48-byte master secret from it. The PRF function is then used once again to generate a 104-byte series of data which represents all the secret keys used in the conversation (the length may differ depending on the cipher suite used):

# g_y, g and p are provided in the Server Key Exchange message
# The client determines x
premaster_secret = pow(g_y, x, p)
# The PRF works on bytes: encode big-endian with leading zero bytes stripped (RFC 5246, section 8.1.2)
premaster_secret = premaster_secret.to_bytes((premaster_secret.bit_length() + 7) // 8, 'big')

# client_random and server_random are the 32-byte random data from the Client Hello and Server Hello messages
master_secret = PRF(premaster_secret, b"master secret", client_random + server_random, 48)
keys = PRF(master_secret, b"key expansion", server_random + client_random, 104)

# The MAC keys are 20 bytes because we are using HMAC+SHA1
client_write_MAC_key = keys[0:20]
server_write_MAC_key = keys[20:40]
# The client and server keys are 16 bytes because we are using AES 128-bit (128 bits = 16 bytes)
client_write_key = keys[40:56]
server_write_key = keys[56:72]
# The IVs are 16 bytes because AES encrypts blocks of 16 bytes
# (TLS 1.2 actually sends a fresh IV with every CBC record, so these two are not strictly needed)
client_write_IV = keys[72:88]
server_write_IV = keys[88:104]

Note how different secret keys are used for the client and for the server, as well as for encryption and to compute the MAC.
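The slicing above generalizes to other cipher suites: the key block is always laid out as two MAC keys, two encryption keys and two IVs. A small helper (hypothetical, for illustration) could look like this. Passing iv_len=0 gives the 72-byte layout, since TLS 1.2 only derives these IVs for AEAD ciphers and sends an explicit IV with every CBC record:

```python
def split_key_block(key_block, mac_len=20, key_len=16, iv_len=16):
    """Slice the key expansion output into the six TLS 1.2 connection keys."""
    layout = [
        ("client_write_MAC_key", mac_len),
        ("server_write_MAC_key", mac_len),
        ("client_write_key", key_len),
        ("server_write_key", key_len),
        ("client_write_IV", iv_len),
        ("server_write_IV", iv_len),
    ]
    needed = sum(size for _, size in layout)  # 2 * (mac_len + key_len + iv_len)
    if len(key_block) < needed:
        raise ValueError("key block too short: need %d bytes" % needed)
    keys = {}
    pos = 0
    for name, size in layout:
        keys[name] = key_block[pos:pos + size]
        pos += size
    return keys
```

For TLS_DHE_RSA_WITH_AES_128_CBC_SHA the defaults apply (2 × (20 + 16 + 16) = 104 bytes); a suite using AES 256-bit with HMAC+SHA256 would pass mac_len=32 and key_len=32.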

Another Key Exchange: Elliptic Curve Diffie-Hellman

Although Diffie-Hellman is a very powerful algorithm, it requires very large numbers to be considered secure (1024-bit at minimum). A variant, Elliptic Curve Diffie-Hellman (ECDH), offers comparable security with much smaller numbers (e.g. 256-bit ones). Numerous TLS cipher suites now rely on the ECDHE_RSA key exchange instead of DHE_RSA, for example TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA.

ECDH works as follows: consider a public point Q = (x, y) on a curve $y^2 = x^3 + ax + b \pmod{p}$. Both parties come up with secret numbers $d_1$ and $d_2$, and send each other $d_1 \cdot Q$ and $d_2 \cdot Q$ ($d_1 \cdot Q$ means adding Q to itself $d_1$ times). Each party then multiplies the point it received by its own secret, so both obtain the shared secret key $d_1 d_2 \cdot Q$.

TLS can use the ECDHE key exchange to come up with an ephemeral shared secret key the following way:

  • The server indicates in the Server Key Exchange message which curve is going to be used (secp256r1 is a very common one). This determines the parameters a, b, p and the base point G to use (see [1] for the domain parameters of each curve)
  • The server comes up with a random 256-bit number (or whatever size the curve requires) server_secret and sends pubKey = G*server_secret in the Server Key Exchange message. pubKey is sent as a 65-byte block composed of 0x04 | X | Y, where X and Y are the two 32-byte coordinates of pubKey
  • The client comes up with a random 256-bit number client_secret and sends pubKey = G*client_secret in the Client Key Exchange message. pubKey is sent in the same format as the server's
  • Both parties will derive premaster_secret by computing server_pubKey * client_secret = client_pubKey * server_secret = G * client_secret * server_secret and taking the x coordinate of this result
  • Once premaster_secret is determined, the rest of the computation works the same regardless of the key exchange protocol used
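The 65-byte public key format and the curve parameters at the start of the ECDHE Server Key Exchange message can be handled with a few lines of Python. This sketch assumes a named curve (curve type 3) with 32-byte coordinates, as for secp256r1 (named curve 23); the function names are made up for illustration:

```python
def encode_point(x, y, size=32):
    """Uncompressed point encoding: 0x04 | X | Y."""
    return b"\x04" + x.to_bytes(size, "big") + y.to_bytes(size, "big")

def decode_point(data, size=32):
    """Inverse of encode_point; returns (x, y)."""
    if len(data) != 1 + 2 * size or data[0] != 0x04:
        raise ValueError("not an uncompressed point")
    return (int.from_bytes(data[1:1 + size], "big"),
            int.from_bytes(data[1 + size:], "big"))

def parse_server_ecdh_params(body):
    """Parse the start of an ECDHE Server Key Exchange body.

    Layout: curve_type (1 byte, 3 = named_curve) | named_curve (2 bytes)
    | point length (1 byte) | point. Whatever follows is the signature part.
    """
    if body[0] != 3:
        raise ValueError("only named curves are supported in this sketch")
    named_curve = int.from_bytes(body[1:3], "big")   # 23 = secp256r1
    point_len = body[3]
    point = decode_point(body[4:4 + point_len])
    return named_curve, point, body[4 + point_len:]
```

The Client Key Exchange body uses the same 1-byte length prefix, i.e. bytes([65]) + encode_point(x, y) for the client's public key.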

Wikipedia offers more details on how to compute elliptic curve point multiplication. Note that, because all the arithmetic is done on integers modulo p, every division in the point addition formulas must be replaced by a multiplication by the modular multiplicative inverse.
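To see the point arithmetic in action, here is a toy sketch in plain Python (3.8+ for pow(v, -1, p)). It uses the tiny textbook curve y^2 = x^3 + 2x + 3 (mod 97) with base point (3, 6), whose multiples repeat every 5 steps; these parameters are for illustration only and offer no security. The point at infinity is represented by None:

```python
# Toy curve y^2 = x^3 + A*x + B (mod P): far too small to be secure
A, B, P = 2, 3, 97
G = (3, 6)  # base point; None stands for the point at infinity

def ec_add(p1, p2, a=A, p=P):
    """Add two points using the chord-and-tangent rule, modulo p."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % p == 0:
        return None  # p2 == -p1, the sum is the point at infinity
    if p1 == p2:
        # Tangent slope (3x^2 + a) / 2y, with the division replaced by a modular inverse
        lam = (3 * x1 * x1 + a) * pow(2 * y1 % p, -1, p) % p
    else:
        # Chord slope (y2 - y1) / (x2 - x1)
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)

def ec_mul(k, point, a=A, p=P):
    """Compute k * point by double-and-add instead of k - 1 additions."""
    result = None
    addend = point
    while k > 0:
        if k & 1:
            result = ec_add(result, addend, a, p)
        addend = ec_add(addend, addend, a, p)
        k >>= 1
    return result
```

For example, ec_mul(2, G) gives (80, 10), and with secrets d1 = 3 and d2 = 4 both ec_mul(3, ec_mul(4, G)) and ec_mul(4, ec_mul(3, G)) give the same shared point.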

If you want to test Elliptic Curves in Python, TinyEC is a very useful package (written in pure Python, so its source code is easy to study):

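import secrets
import tinyec.ec as ec
import tinyec.registry as reg

# Get the domain parameters for the named curve specified in the Server Key Exchange message
curve = reg.get_curve("secp256r1")

# Come up with a random 256-bit (32 bytes) client_secret in [1, n - 1], n being the order of the base point
client_secret = secrets.randbelow(curve.field.n - 1) + 1
# curve.g is the base point on the elliptic curve, defined by the domain parameters
# We multiply it with client_secret to obtain the public key
client_pubKey = curve.g * client_secret
# Retrieved from the Server Key Exchange message (e.g. built with ec.Point(curve, x, y))
server_pubKey = ...

# premaster_secret is the x coordinate, always encoded on 32 bytes (leading zeros are kept)
premaster_secret = (server_pubKey * client_secret).x.to_bytes(32, "big")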

SSL Certificate (optional)

In order to prevent a Man-In-The-Middle attack (MITM), the server will sign the Diffie-Hellman parameters it sent to the client. Because the client may have never contacted the server before (and thus cannot securely obtain its public key), the client and server rely on a trusted third party known as a Certificate Authority (CA).

In order to verify the signature using the RSA algorithm, the client needs to do the following:

  • Retrieve the Certificate message sent by the server, which contains one or more certificates (look at such a packet in Wireshark)
  • Verify that the first certificate's RDN sequence (signedCertificate/subject:rdnSequence/rdnSequence) contains the Web site the client is trying to contact (modern certificates list the host names in the subjectAltName extension instead)
  • Get the RSA e and n values from the first certificate's public key (signedCertificate/subjectPublicKeyInfo/subjectPublicKey). Those parameters are encoded using ASN.1 (as a sanity check, e is very often 65537, or 0x10001)
  • Compute the hash of the client random data, followed by the server random data, followed by the whole DH parameters as sent by the server. In TLS 1.2, the hash algorithm to use is given by the two bytes (SignatureAndHashAlgorithm) that precede the signature in the Server Key Exchange message
  • Compute $signature^{e} \bmod n$ and convert it to a big-endian byte string as long as n. The result is a PKCS #1 v1.5 block (0x00 0x01 0xFF ... 0xFF 0x00 followed by an ASN.1 DigestInfo) whose last bytes are the hash: the last 20 bytes for SHA1, 32 bytes for SHA256
  • Both hashes should be the same
  • Because this certificate is probably issued by an intermediate CA, the client then needs to verify that certificate too
  • Compute the hash of the whole signedCertificate section (using the signature algorithm stated in the certificate, e.g. sha256WithRSAEncryption) and check it against the certificate's signature using the public key of the next certificate in the chain
  • Follow the certificate chain up to the end. The last certificate should belong to a root CA (any TLS implementation should contain a list of the root CAs and their public keys) and is self-signed
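The core of the signature check above can be sketched in Python as follows, assuming e, n and the signature have already been extracted from the ASN.1 and TLS structures. Rather than parsing the decrypted block, this sketch rebuilds the expected PKCS #1 v1.5 encoding and compares it as a whole, which avoids a class of signature forgery attacks caused by lenient parsing. `rsa_pkcs1_v15_verify` and `server_key_exchange_signed_data` are hypothetical names; the DigestInfo prefixes are the ones listed in PKCS #1 (RFC 8017):

```python
import hashlib
import hmac

# DER-encoded DigestInfo prefixes (PKCS #1, RFC 8017 section 9.2)
DIGEST_INFO_PREFIX = {
    "sha1": bytes.fromhex("3021300906052b0e03021a05000414"),
    "sha256": bytes.fromhex("3031300d060960864801650304020105000420"),
}

def server_key_exchange_signed_data(client_random, server_random, dh_params):
    """Data covered by the Server Key Exchange signature."""
    return client_random + server_random + dh_params

def rsa_pkcs1_v15_verify(n, e, signature, message, hash_name="sha256"):
    """Return True if signature is a valid PKCS #1 v1.5 signature of message."""
    k = (n.bit_length() + 7) // 8            # modulus length in bytes
    if len(signature) != k:
        return False
    s = int.from_bytes(signature, "big")
    if s >= n:
        return False
    # Apply the public key: signature^e mod n, as a k-byte big-endian string
    em = pow(s, e, n).to_bytes(k, "big")
    t = DIGEST_INFO_PREFIX[hash_name] + hashlib.new(hash_name, message).digest()
    ps_len = k - len(t) - 3
    if ps_len < 8:
        return False
    # Rebuild 0x00 0x01 FF..FF 0x00 DigestInfo and compare the whole block
    expected = b"\x00\x01" + b"\xff" * ps_len + b"\x00" + t
    return hmac.compare_digest(em, expected)
```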

Encrypted Handshake Message

The TLS handshake is concluded with the two parties sending a hash of the complete handshake exchange, in order to ensure that a middleman did not try to conduct a downgrade attack.

Although your TLS client does not strictly have to verify the Encrypted Handshake Message sent by the server, it must send a valid Encrypted Handshake Message of its own, otherwise the server will abort the TLS session.

Here is what the client needs to do to create its Encrypted Handshake Message:

  • Compute a SHA256 hash of a concatenation of all the handshake communications (or SHA384 if the PRF is based on SHA384). This means the Client Hello, Server Hello, Certificate, Server Key Exchange, Server Hello Done and Client Key Exchange messages. Note that you should concatenate only the handshake part of each TLS message (i.e. strip the first 5 bytes belonging to the TLS Record header)
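  • Compute PRF(master_secret, "client finished", hash, 12), which generates a 12-byte value (verify_data)
  • Prepend the 4-byte Handshake header of a Finished message (type 0x14, 3-byte length 12): 0x14 0x00 0x00 0x0C
  • Encrypt 0x14 0x00 0x00 0x0C | [12-byte verify_data] with sequence number 0 and content type 0x16 (see the Encrypting / Decrypting data section). The result is 64 bytes long: a 16-byte IV followed by 48 bytes of ciphertext (16-byte message + 20-byte MAC + 12 bytes of padding)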
  • Send this ciphertext wrapped in a TLS Record

The server will use a similar algorithm, with two notable differences:

  • It hashes the same handshake messages as the client, plus the decrypted Encrypted Handshake Message sent by the client (i.e. the 16-byte message starting with 0x14 0x00 0x00 0x0C)
  • It will call PRF(master_secret, "server finished", hash, 12)
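The client side of these steps could be sketched as follows, using only Python's standard library. `prf` reimplements the SHA256-based PRF from the Cryptography section, and `build_finished` is a hypothetical helper returning the 16-byte plaintext that is then encrypted with sequence number 0 and content type 0x16:

```python
import hashlib
import hmac

def prf(secret, label, seed, size):
    """TLS 1.2 PRF based on HMAC-SHA256 (P_SHA256 applied to label + seed)."""
    seed = label + seed
    result = b""
    a = seed                                                    # A(0)
    while len(result) < size:
        a = hmac.new(secret, a, hashlib.sha256).digest()        # A(i)
        result += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return result[:size]

def build_finished(master_secret, handshake_messages, label=b"client finished"):
    """Return the 16-byte plaintext of the Encrypted Handshake Message.

    handshake_messages: the handshake messages exchanged so far, in order,
    each without its 5-byte record header (Change Cipher Spec is not one of them).
    """
    transcript_hash = hashlib.sha256(b"".join(handshake_messages)).digest()
    verify_data = prf(master_secret, label, transcript_hash, 12)
    # Handshake header: type 0x14 (Finished), 3-byte length 12
    return b"\x14\x00\x00\x0c" + verify_data
```

On the server side, the same function would be called with label=b"server finished" and the client's decrypted Finished message appended to the list.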

Encrypting / Decrypting data

All encrypted data in this example uses AES 128-bit in CBC mode. AES encrypts 128-bit (16-byte) blocks of data using a 128, 192 or 256-bit secret key. CBC mode specifies how to chain AES block encryptions to encrypt a plaintext longer than one block; the plaintext still has to be padded to a multiple of 16 bytes.

The following steps need to be implemented:

  • Create an intermediary plaintext which concatenates:
    • The 8-byte sequence number. Each side counts the records it has sent since its Change Cipher Spec message, so this is 0 for the Encrypted Handshake Message, 1 for the first application data record, 2 for the next one, etc.
    • The 1-byte content type (0x16 for a handshake message, 0x17 for an application data message)
    • The TLS version (0x0303)
    • The 2-byte plaintext length
    • The original plaintext
  • Compute the MAC of that intermediary plaintext using HMAC+SHA1 and the client/server_write_MAC_key
  • The final plaintext is the concatenation [original plaintext] + [20-byte MAC] + [CBC padding]. Because AES-CBC only encrypts data whose size is a multiple of 16, the CBC padding consists of 1 to 16 bytes that bring the size up to a multiple of 16 (16 full bytes if it is already a multiple of 16). Each padding byte holds the length of the padding minus 1. So for a 16-byte plaintext, the final plaintext is [16-byte plaintext] | [20-byte MAC] | 0x0B0B0B0B0B0B0B0B0B0B0B0B
  • Come up with a random 16-byte IV. RFC 5246 requires it to be unpredictable: reusing client/server_write_IV for every record goes unnoticed by the other side but is insecure
  • Encrypt the final plaintext using the client/server_write_key and this IV
  • The data sent in the TLS record is the concatenation IV + ciphertext
from Crypto.Hash import HMAC, SHA1
from Crypto.Cipher import AES
import hmac

def to_n_bytes(number, size):
    # Big-endian encoding of number on exactly size bytes
    return number.to_bytes(size, 'big')

def encrypt(plaintext, iv, key_AES, key_MAC, seq_num, content_type):
    h = HMAC.new(key_MAC, digestmod=SHA1)
    # MAC input: seq_num (8) | content_type (1) | version 0x0303 (2) | length (2) | plaintext
    plaintext_to_mac = to_n_bytes(seq_num, 8) + to_n_bytes(content_type, 1) + b'\x03\x03' + to_n_bytes(len(plaintext), 2) + plaintext
    h.update(plaintext_to_mac)
    mac_computed = h.digest()

    cipher = AES.new(key_AES, AES.MODE_CBC, iv)
    plaintext += mac_computed
    # Between 1 and 16 bytes: 16 when the data is already a multiple of 16
    padding_length = 16 - (len(plaintext) % 16)
    # Each padding byte holds padding_length - 1
    padding = bytes([padding_length - 1]) * padding_length
    ciphertext = cipher.encrypt(plaintext + padding)

    # The IV is sent in clear in front of the ciphertext
    return iv + ciphertext

def decrypt(message, key_AES, key_MAC, seq_num, content_type, debug=False):
    # The first 16 bytes of the record payload are the IV
    iv = message[0:16]
    cipher = AES.new(key_AES, AES.MODE_CBC, iv)
    decoded = cipher.decrypt(message[16:])

    # The last byte is the padding length minus 1; the 20-byte MAC precedes the padding
    padding = decoded[-1] + 1
    plaintext = decoded[0:-padding-20]
    mac_decrypted = decoded[-padding-20:-padding]

    h = HMAC.new(key_MAC, digestmod=SHA1)
    plaintext_to_mac = to_n_bytes(seq_num, 8) + to_n_bytes(content_type, 1) + b'\x03\x03' + to_n_bytes(len(plaintext), 2) + plaintext
    h.update(plaintext_to_mac)
    mac_computed = h.digest()

    if debug:
        print('Decrypted: [' + decoded.hex() + ']')
        print('Plaintext: [' + plaintext.hex() + ']')
        print('MAC (decrypted): ' + mac_decrypted.hex())
        print('MAC (computed):  ' + mac_computed.hex())
        print('')

    # Reject the record if the MAC does not match (constant-time comparison)
    if not hmac.compare_digest(mac_decrypted, mac_computed):
        raise ValueError('bad record MAC')

    return plaintext
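The padding rule is easy to get wrong by one. Isolated from AES, it can be sketched and checked with a few lines of plain Python; these helper names are illustrative. Unlike the decrypt function above, the unpadding helper also checks that every padding byte has the expected value:

```python
def tls_cbc_pad(data, block_size=16):
    """Append TLS CBC padding: n bytes, each with value n - 1 (1 <= n <= block_size)."""
    pad_len = block_size - (len(data) % block_size)
    return data + bytes([pad_len - 1]) * pad_len

def tls_cbc_unpad(data, block_size=16):
    """Remove TLS CBC padding, raising ValueError if it is malformed."""
    if not data or len(data) % block_size != 0:
        raise ValueError("length is not a multiple of the block size")
    pad_len = data[-1] + 1
    if pad_len > len(data) or data[-pad_len:] != bytes([pad_len - 1]) * pad_len:
        raise ValueError("bad padding")
    return data[:-pad_len]
```

With a 16-byte message and a 20-byte MAC (36 bytes in total), tls_cbc_pad appends twelve 0x0B bytes, matching the example given in the steps above.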