Say you’ve got a client and a server. The server is running SSL, so it has a certificate stored in a .pem file, which looks a bit like this:


And when the client connects, it receives the certificate and can validate it, right?

Recently I found myself in the odd position of needing to write code to view the contents of the certificate on both sides of the connection and pull data out of it. It hadn’t occurred to me until then that I knew absolutely nothing about that block of data. I guessed it was base64 encoded, but beyond that, it was a total mystery.

It turns out that a PEM file is a base64 encoded DER document, DER being a transfer syntax for data structures described by ASN.1. That discovery left me no closer to understanding what was actually in my certificate, so let’s boil all that down and say that this is just a format for storing this stuff on disk that comes from a time when storing data was a lot harder than it is now.
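To make the PEM/DER relationship concrete, here is a minimal sketch using only the Python standard library. The helper names `pem_to_der` and `der_to_pem` are my own inventions for illustration, and the five DER bytes are a toy value (an ASN.1 SEQUENCE holding the integer 5), not a real certificate.

```python
import base64


def pem_to_der(pem_text):
    """Drop the -----BEGIN/END----- lines and base64-decode what is left."""
    body = []
    for line in pem_text.splitlines():
        line = line.strip()
        if line and not line.startswith('-----'):
            body.append(line)
    return base64.b64decode(''.join(body))


def der_to_pem(der_bytes, label='CERTIFICATE'):
    """Base64-encode DER bytes and wrap them in PEM header and footer lines."""
    b64 = base64.b64encode(der_bytes).decode('ascii')
    # PEM bodies are conventionally wrapped at 64 characters per line
    lines = [b64[i:i + 64] for i in range(0, len(b64), 64)]
    header = '-----BEGIN %s-----' % label
    footer = '-----END %s-----' % label
    return '\n'.join([header] + lines + [footer, ''])


# Toy DER value: SEQUENCE { INTEGER 5 }
toy_der = b'\x30\x03\x02\x01\x05'
round_tripped = pem_to_der(der_to_pem(toy_der))
```

The round trip returns the original bytes unchanged, which is really all PEM is: the same DER data, made safe to paste into a text file.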

What’s actually in there is your certificate - it’ll be an X.509 certificate, probably, and the Wikipedia page will cover you for the most part on what that is. Broadly, they come with a few dates and an issuer and an RSA public key that’s tied to a private key somewhere else on the server box.

I was in the position where the data I had in a C# client did not match the data I was pulling out of the file on the Python server end. Specifically, I was trying to compare the public keys: C# presents you with an array of bytes when you call X509Certificate.GetPublicKey, but I couldn’t get that array to line up with what I found in that pile of base64-encoded data.
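When comparing byte arrays across two languages, it helps to print them in the same format on both sides. The sketch below is one way to do that in Python; `dotnet_hex` and `first_mismatch` are hypothetical helper names, and the hyphenated uppercase format is chosen to match what C#'s `BitConverter.ToString` prints.

```python
def dotnet_hex(data):
    """Format bytes as uppercase hex pairs joined by hyphens, e.g. '30-82-01-0A'."""
    return '-'.join('%02X' % b for b in bytearray(data))


def first_mismatch(a, b):
    """Return the index of the first differing byte, or None if a and b are equal."""
    a, b = bytearray(a), bytearray(b)
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # No difference in the shared prefix: either equal, or one is longer
    return None if len(a) == len(b) else min(len(a), len(b))
```

With both sides dumped as text, a diff tool will show quickly whether the arrays differ throughout or only by some extra bytes at the start.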

After much prodding, I got to the bottom of it. I used the Python M2Crypto library and binascii to convert the base64 data. Here’s the process you go through to get from the PEM file to the public key as it is received on the client side.

import os

import M2Crypto
import binascii

# Load the certificate in M2Crypto ('~' is not expanded automatically)
path = os.path.expanduser('~/certificate.pem')
cert = M2Crypto.X509.load_cert(path)

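# Get the public key from it, in PEM format
pem = cert.get_pubkey().get_rsa().as_pem()

# On Python 3 this may come back as bytes, so decode before splitting
if not isinstance(pem, str):
    pem = pem.decode('ascii')

# Pull the header and footer lines off, keeping only the base64 body
key = ''.join(line for line in pem.splitlines() if not line.startswith('-----'))

# Convert from base64 to an array of bytes
der = bytearray(binascii.a2b_base64(key))

# Finally, cut the first 24 bytes off. This must be some kind of header
public_key = der[24:]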

If you can tell me what those 24 bytes do, I’d love to know, and I’d love to know why they don’t appear on the other side. My guess is that they are part of the formatting rather than the key itself. I feel like it must be possible to avoid PEM entirely here, as it seems like a bit of a long way around, but I haven’t come across a way to do that. If you can help tie up some of these loose ends, please drop me a GitHub pull request or issue. Thanks!
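One likely explanation, offered as a sketch rather than a verified answer: the 24 bytes look like DER rather than PEM. A "BEGIN PUBLIC KEY" block normally holds an X.509 SubjectPublicKeyInfo structure, which wraps the raw key in an outer SEQUENCE, an algorithm identifier, and a BIT STRING header. For a 2048-bit RSA key those wrappers add up to 4 + 15 + 5 = 24 bytes, and the C# side appears to return only what is inside the BIT STRING. The Python 3 code below walks the DER headers instead of hard-coding 24, so it should also cope with key sizes where the header is shorter. The function names are hypothetical, and it assumes a well-formed structure with definite-length encoding.

```python
def read_tlv_header(data, offset):
    """Return (tag, length, content_offset) for the DER element at offset."""
    tag = data[offset]
    first = data[offset + 1]
    if first < 0x80:
        # Short form: the length fits in this single byte
        return tag, first, offset + 2
    # Long form: the low 7 bits say how many bytes hold the length
    n = first & 0x7F
    length = int.from_bytes(data[offset + 2:offset + 2 + n], 'big')
    return tag, length, offset + 2 + n


def strip_spki_header(spki):
    """Return the BIT STRING payload of a DER-encoded SubjectPublicKeyInfo."""
    data = bytes(spki)
    tag, _, pos = read_tlv_header(data, 0)
    if tag != 0x30:
        raise ValueError('expected outer SEQUENCE')
    tag, alg_len, alg_start = read_tlv_header(data, pos)
    if tag != 0x30:
        raise ValueError('expected AlgorithmIdentifier SEQUENCE')
    tag, bit_len, bit_start = read_tlv_header(data, alg_start + alg_len)
    if tag != 0x03:
        raise ValueError('expected BIT STRING')
    # The first content byte counts unused bits, so skip it
    return data[bit_start + 1:bit_start + bit_len]
```

Calling `strip_spki_header` on the decoded base64 bytes from the script above should give the same result as slicing off the first 24 bytes, at least for a 2048-bit RSA key.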