The Sufec protocol

Sufec: Simple User Friendly Encrypted Chat.

Rationale

Secure messengers have been popping up like crazy in recent years. Matrix. Signal. Threema. Wire. Session. Briar. Tox. Jami. I can't even list all the ones I investigated in my search before deciding to make another. Sufec is necessary because all of these have fallen short in one or more ways:

Requiring a phone. A messenger shouldn't require any specific type of device, but especially not a phone.

Requiring a central server. A messenger should have no single point of failure or control.

Being too complex, resulting in a lack of robust implementations.

Having privacy/security flaws, such as a lack of forward secrecy and deniability.

Et cetera.

Use cases

One thing Sufec tries to do that most secure messengers don't is replace email. You can message a Sufec user without having to "send a contact request" or "invite them to a DM"; "conversations" or "rooms" are not even first-class things.

Besides the general impetus to minimize the number of applications necessary for digital life, a pressing reason to want to replace email is that it's ubiquitously used for sensitive things such as password reset codes and communication from banks and governments. That our society's standard is to send such things unencrypted is a travesty.

Sufec has no provisions for unencrypted messaging; we think that belongs in a different protocol.

What Sufec is not

Ideal for all use cases
A drop-in replacement for what you currently use
A magic spell that perfectly guarantees your notion of privacy
Feature-complete
Stable
Reviewed by experts in security and cryptography

Security model

Sufec should provide:

Authentication: when you receive a message, you have proof that it was sent by the party it claims to be from.

Forward secrecy: if an attacker compromises your long-term private key, they shouldn't be able to read any messages sent before they did that.

Self-healing: if an attacker compromises ephemeral keys, they shouldn't be able to read any messages sent after they did that.

Deniability (forgeability): if an attacker compromises message plaintext, they shouldn't be able to prove anything about it to a third party, not even that a message was sent.

Metadata protection: Sufec should minimize the amount of metadata that an attacker can gather by recording network traffic, even if the attacker controls the homeservers.

Resilience: it should be difficult for an attacker to prevent communication by attacking or controlling homeservers.

Protocol

Sufec is formally a federated protocol, similar to email. A user has a *homeserver* through which they receive messages, and a long-term public/private key pair, whose public part is their ID. There are no human-readable usernames as there are in email; your Sufec address is `<id>@<server>`, not `<name>@<server>`. IDs are written in base64url with no padding.
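
As an illustration of the textual address form, here is a minimal Python sketch that encodes a 32-byte public key as unpadded base64url and joins it with a server name. The function names are hypothetical and not part of the protocol.

```python
import base64

def encode_id(public_key: bytes) -> str:
    """Encode a public key as base64url with the '=' padding stripped."""
    return base64.urlsafe_b64encode(public_key).rstrip(b"=").decode("ascii")

def decode_id(text: str) -> bytes:
    """Decode an unpadded base64url ID, restoring the padding first."""
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)

def format_text_address(public_key: bytes, server: str) -> str:
    """Build the <id>@<server> form of an address."""
    return f"{encode_id(public_key)}@{server}"

def parse_text_address(address: str) -> tuple[bytes, str]:
    """Split <id>@<server>; the base64url alphabet never contains '@'."""
    id_text, server = address.split("@", 1)
    return decode_id(id_text), server
```

A 32-byte key always encodes to 43 characters in this form.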

Unless otherwise stated, all numbers are serialized in big-endian format.

Handshake

Each Sufec message is encrypted with a key derived from the long-term key pair and an ephemeral key pair from each party (this is based on the Signal handshake and is supposed to have the same properties):

Explanation of the Signal handshake

You generate an ephemeral keypair for receiving and publish the public part on your homeserver.

When you want to send someone a message, you generate an ephemeral keypair for sending, and ask their homeserver for their ephemeral public key.

You derive 3 shared secrets from the 4 keypairs: both ephemeral keys combined, and each party's ephemeral key combined with the other party's long-term key.

These three are XORed to derive one symmetric key, which is used to encrypt the message, along with a randomly generated nonce that is prepended to the ciphertext.

(This is where the handshake differs from Signal, by the simplistic use of XOR instead of a special key derivation function. I conjecture that XOR is as secure as any other way of combining them, since all inputs are secret, it's irreversible, and produces no statistical anomalies such as non-uniform probability distribution.)
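
The combining step can be sketched in a few lines of Python. This only shows the XOR of three equal-length secrets; how each shared secret is computed from the key pairs is not shown, and the function and parameter names are hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings of equal length."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))

def combine_secrets(ephemeral_ephemeral: bytes,
                    my_ephemeral_their_long_term: bytes,
                    my_long_term_their_ephemeral: bytes) -> bytes:
    """Derive one symmetric key by XORing the three shared secrets."""
    key = xor_bytes(ephemeral_ephemeral, my_ephemeral_their_long_term)
    return xor_bytes(key, my_long_term_their_ephemeral)
```

Because XOR is commutative, both parties arrive at the same key regardless of the order in which they combine the three secrets.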

Connecting to a homeserver

Every interaction with a homeserver starts by connecting to it over TCP port 49002. The server sends its public key for transport encryption, which the client handles with TOFU (trust on first use) policy: remember the public key when connecting to a server for the first time, and alert the user on subsequent connections to the same server if the key ever changes.
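
A client's TOFU check might look like the following Python sketch. It keeps known keys in memory only; a real client would persist them. The class and method names are hypothetical.

```python
from enum import Enum

class TofuResult(Enum):
    FIRST_USE = "first_use"  # key was unknown and has now been remembered
    MATCH = "match"          # key equals the remembered one
    CHANGED = "changed"      # key differs: the user should be alerted

class KnownServers:
    """In-memory record of the public key first seen for each server."""

    def __init__(self) -> None:
        self._keys: dict[str, bytes] = {}

    def check(self, server: str, presented_key: bytes) -> TofuResult:
        known = self._keys.get(server)
        if known is None:
            self._keys[server] = presented_key
            return TofuResult.FIRST_USE
        if known == presented_key:
            return TofuResult.MATCH
        # Deliberately do not overwrite the remembered key here.
        return TofuResult.CHANGED
```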

While reading the sections for each type of conversation you can have with a homeserver after receiving its public key, bear in mind:

All client-server communication is half-duplex.

After a session key is sent, all transmissions are encrypted with it, and each time it's used to encrypt, the nonce is incremented by one as a little-endian number, beginning at 0. The nonce is shared across client and server, meaning: if a client uses the nonce 0, and the server uses 1 and 2, the client's next transmission will use 3.

Wherever possible, things being sent at the same time are encrypted as part of the same payload.

Transmissions with a non-constant length (these will be pointed out) are preceded with a separately encrypted fixed-length (4 bytes unless otherwise stated) number telling the length of the plaintext of the following transmission, so the remote end can know how many bytes to read before trying to decrypt.

Servers respond to any error conditions (such as the client sending a message that cannot be decrypted, or running out of disk space) by closing the connection. Since there aren't really any points in this protocol where such errors can be expected, sophisticated diagnostics would not be worthwhile.
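
The shared nonce counter and the length prefix described above can be sketched as follows. This is a minimal Python illustration under some assumptions: the nonce is 24 bytes (the size libsodium uses with XSalsa20), the length header and the payload each consume one nonce, and `encrypt` is a hypothetical callable standing in for the actual symmetric encryption.

```python
from typing import Callable

NONCE_SIZE = 24  # assumption: XSalsa20 nonce size

# Hypothetical signature: (plaintext, nonce) -> ciphertext
Encrypt = Callable[[bytes, bytes], bytes]

class SharedNonceCounter:
    """One counter per connection, advanced by every encrypted transmission
    in either direction."""

    def __init__(self) -> None:
        self._value = 0

    def next(self) -> bytes:
        nonce = self._value.to_bytes(NONCE_SIZE, "little")
        self._value += 1
        return nonce

def frame_variable(payload: bytes, encrypt: Encrypt,
                   counter: SharedNonceCounter) -> bytes:
    """Encrypt a 4-byte big-endian plaintext length, then the payload."""
    header = encrypt(len(payload).to_bytes(4, "big"), counter.next())
    body = encrypt(payload, counter.next())
    return header + body
```

Since the protocol is half-duplex, both ends can keep their own copy of the counter in step by advancing it for every transmission they send or receive.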

Sending a message

1. Send a byte with value 1 to indicate you are connecting to send.

2. Randomly generate a symmetric session key, anonymously encrypt it to the server's public key, and send. This doesn't authenticate the client because sending is meant to be anonymous. From this point on, all transmissions are encrypted with the session key.

3. Send the ID of the recipient.

4. The server sends the 1-byte number of devices the recipient has linked, then sends all of their ephemeral keys concatenated. If the recipient ID is not found, the server indicates 0 linked devices, and the client should abort and show an error message.

5. For each of those keys, do steps 6-7.

6. Encrypt the message with the key (as described in the Handshake section), then create the following concatenation:

1-byte length of your address
your address
your ephemeral public key used for this message
the nonce used for the inner ciphertext
the ciphertext of the message itself

7. Anonymously encrypt this concatenation to the recipient's ID (so their homeserver can't read the metadata), then send the result. Only prepend the length for the first copy, since they must all be the same length.

8. Once the server receives such a payload for each key it gave you, it sends an arbitrary 1-byte receipt to indicate delivery was successful.
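
The concatenation built in step 6 can be sketched in Python as below. It covers only the byte layout; the inner encryption and the sealed-box wrapping of step 7 are not shown. The 24-byte nonce size is an assumption based on libsodium's box construct, and the names are hypothetical.

```python
from typing import NamedTuple

KEY_SIZE = 32    # Curve25519 public key
NONCE_SIZE = 24  # assumption: nonce size of libsodium's box

class Envelope(NamedTuple):
    sender_address: bytes
    ephemeral_public_key: bytes
    nonce: bytes
    ciphertext: bytes

def pack_envelope(env: Envelope) -> bytes:
    """Lay out the fields in the order given in step 6."""
    if len(env.sender_address) > 255:
        raise ValueError("address does not fit a 1-byte length")
    if len(env.ephemeral_public_key) != KEY_SIZE or len(env.nonce) != NONCE_SIZE:
        raise ValueError("unexpected key or nonce size")
    return (bytes([len(env.sender_address)]) + env.sender_address
            + env.ephemeral_public_key + env.nonce + env.ciphertext)

def unpack_envelope(data: bytes) -> Envelope:
    """Inverse of pack_envelope; the ciphertext is whatever remains."""
    if not data:
        raise ValueError("empty envelope")
    addr_end = 1 + data[0]
    key_end = addr_end + KEY_SIZE
    nonce_end = key_end + NONCE_SIZE
    if len(data) < nonce_end:
        raise ValueError("truncated envelope")
    return Envelope(data[1:addr_end], data[addr_end:key_end],
                    data[key_end:nonce_end], data[nonce_end:])
```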

Logging in

This process is used in all of the conversation flows that require authenticating the client.

1. Encrypt your ID anonymously to the server's key and send that.

2. Do a handshake with your ID and the server's key to arrive at a symmetric session key. From this point on, all transmissions are encrypted with the session key.

Receiving messages

1. Send a byte with value 0 to indicate you are connecting to receive.

2. Log in.

3. Generate a new ephemeral keypair for receiving, and send your device ID followed by your new ephemeral key.

4. The server will send you any messages that were stored for you. Each message is sent as the anonymously encrypted concatenation that the server received in step 7 of "Sending a message" (preceded by its length).

5. After each message, you send back an arbitrary 1-byte receipt to indicate the download was successful.

The connection stays open and the server will send any new messages that arrive.

When an unrecognized client connects to receive, the server should treat this as a registration and create the mailbox.

Adding a device

To add a new device to your account, simply copy over the long-term private key and have the new device generate its own device ID. The new device can then connect to the homeserver in receiving mode; on seeing the unrecognized device ID, the homeserver will treat it as a newly linked device. From then on, incoming messages will be stored until both devices have downloaded them.

Naming a device

The homeserver stores a name for each device on your account, visible only to you (see "Listing devices"). When a new device is added as described above, its name is empty. Alternatively, client programs may have a default name they set when you link them to your account, or may require users to set one before connecting. Any device can change its name with the following process:

1. Send a byte with value 2 to indicate you are connecting to name a device.

2. Log in.

3. Send your device ID.

4. Encrypt your desired device name anonymously to your ID (so the server can't read it).

5. Send your encrypted device name (preceded by its 1-byte length).

6. The server will send an arbitrary 1-byte confirmation to indicate the update was successful.

Listing devices

1. Send a byte with value 3 to indicate you are connecting to list devices.

2. Log in.

3. The server will send back the 1-byte number of devices on your account.

4. For each device, the server will send back its ID, and its encrypted name (preceded by its 1-byte length).
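
As a rough illustration of the response layout, the Python sketch below parses a device list from a single already-decrypted buffer. Treating the response as one buffer is an assumption made for brevity, and the function name is hypothetical. Each returned name is still encrypted to your ID.

```python
DEVICE_ID_SIZE = 4  # device IDs are arbitrary 4-byte strings

def parse_device_list(data: bytes) -> list[tuple[bytes, bytes]]:
    """Return (device_id, encrypted_name) pairs from a decrypted response."""
    if not data:
        raise ValueError("empty response")
    count = data[0]
    pos = 1
    devices = []
    for _ in range(count):
        if len(data) < pos + DEVICE_ID_SIZE + 1:
            raise ValueError("truncated device entry")
        device_id = data[pos:pos + DEVICE_ID_SIZE]
        pos += DEVICE_ID_SIZE
        name_length = data[pos]
        pos += 1
        if len(data) < pos + name_length:
            raise ValueError("truncated device name")
        devices.append((device_id, data[pos:pos + name_length]))
        pos += name_length
    return devices
```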

Removing a device

You can use any device that has your private key to tell your homeserver to forget about another.

1. Send a byte with value 4 to indicate you are connecting to forget a device.

2. Log in.

3. Send the device ID you want to forget.

4. The server sends back an arbitrary 1-byte receipt to indicate the device was forgotten.

This is *not* a secure response to a lost or stolen device. Such a device, since it has your private key, would be able to re-add itself to your account, and even if it couldn't, it would still be able to send on your behalf, which your homeserver has no control over. If a device with your long-term private key is lost, you should generate a new identity.

The intended use of this feature is to tell your homeserver to stop storing messages for a device you don't or won't possess anymore, for example if you have a hard drive failure, if you reinstall your operating system and forget to back up the relevant files, or if you are wiping a device before giving it away or selling it.

Message format

A Message is serialized as, in sequence:

1-byte number of recipients other than the one reading the message
the address of each other recipient, each one preceded by its 1-byte length
timestamp
1-byte number of included hashes of previous messages sent by other group members
timestamp followed by hash of each of those
a type indicator byte
message content

If the type is 0, the message is plain text.

If the type is 1, the message is a file, and there are these additional fields before the file content:

1-byte length of the filename
filename

If the type is 2, the message is not part of a conversation but a contact sync message, sent to your other linked devices when you add or rename a contact. The message content is the 1-byte length of the contact's address, then the address, then the name. In a contact sync message, the other recipients field is meaningless, and it should be disregarded if it comes from someone other than yourself.
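
The layout above can be sketched in Python as follows. This is an illustration under assumptions: timestamps are treated as unsigned 8-byte big-endian integers, hashes are 64-byte SHA-512 digests, addresses are passed in already serialized, and the class and function names are hypothetical.

```python
import struct
from dataclasses import dataclass

@dataclass
class Message:
    other_recipients: list[bytes]          # serialized addresses
    timestamp_ms: int                      # milliseconds since the epoch
    acknowledged: list[tuple[int, bytes]]  # (timestamp_ms, hash) pairs
    type: int                              # 0 = text, 1 = file, 2 = contact sync
    content: bytes

def serialize_message(m: Message) -> bytes:
    out = bytearray()
    out.append(len(m.other_recipients))    # raises ValueError above 255
    for address in m.other_recipients:
        out.append(len(address))
        out += address
    out += struct.pack(">Q", m.timestamp_ms)
    out.append(len(m.acknowledged))
    for timestamp_ms, digest in m.acknowledged:
        out += struct.pack(">Q", timestamp_ms) + digest
    out.append(m.type)
    out += m.content
    return bytes(out)

def file_content(filename: bytes, data: bytes) -> bytes:
    """Content of a type-1 message: length-prefixed filename, then the file."""
    if len(filename) > 255:
        raise ValueError("filename does not fit a 1-byte length")
    return bytes([len(filename)]) + filename + data
```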

In a group chat, each participant should, for each received message, include a hash of it with their next outgoing message, confirming that they received it. This is necessary to prevent a group member from spoofing the recipient list, secretly sending different messages to different people. For example, if A, B, and C are in a group chat and A sends a message to B that includes C in the recipient list but doesn't actually send the message to C, or sends a different message to C, then B will notice that they never receive a hash of that message from C, and C will notice that they never receive one from B.

The exact input to hash is the serialized sender's address followed by the type indicator byte and message content.
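
A minimal Python sketch of that hash, assuming the sender's address is used in its serialized form without a length prefix (the text does not say either way); the function name is hypothetical:

```python
import hashlib

def message_hash(sender_address: bytes, type_byte: int, content: bytes) -> bytes:
    """SHA-512 over the sender's address, the type byte and the content."""
    return hashlib.sha512(sender_address + bytes([type_byte]) + content).digest()
```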

Address format

An address is serialized as the ID followed by the homeserver name. The homeserver name can be a domain name, a dotted-decimal IPv4 address, or a bracketed colon-separated IPv6 address.
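
The binary form might be handled as in the Python sketch below. It assumes the homeserver name is ASCII and uses the standard `ipaddress` module to tell the three name forms apart; the function names are hypothetical.

```python
import ipaddress

ID_SIZE = 32  # the ID is a 32-byte public key

def serialize_address(public_key: bytes, server: str) -> bytes:
    """ID followed by the homeserver name (assumed to be ASCII)."""
    if len(public_key) != ID_SIZE:
        raise ValueError("ID must be 32 bytes")
    return public_key + server.encode("ascii")

def parse_binary_address(data: bytes) -> tuple[bytes, str]:
    if len(data) <= ID_SIZE:
        raise ValueError("address is missing a homeserver name")
    return data[:ID_SIZE], data[ID_SIZE:].decode("ascii")

def classify_homeserver(name: str) -> str:
    """Return 'ipv6', 'ipv4' or 'domain' for a homeserver name."""
    if name.startswith("[") and name.endswith("]"):
        ipaddress.IPv6Address(name[1:-1])  # raises ValueError if malformed
        return "ipv6"
    try:
        ipaddress.IPv4Address(name)
        return "ipv4"
    except ValueError:
        return "domain"
```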

Cryptographic primitives

Sufec is based on libsodium, so:

Keys are 32-byte Curve25519 keys.
The encryption arranged by the handshake is the "box" construct (X25519, XSalsa20, Poly1305).
Anonymous encryption is the "sealed box" construct (X25519, Blake2b, XSalsa20, Poly1305).
Hash is SHA-512.

Timestamps are represented as an 8-byte number of milliseconds since the epoch (beginning of 1970).
Device IDs are arbitrary 4-byte strings.

Group chats

Group chats are implemented similarly to email: you just send your message to each recipient, and in the Message to each one, you include the list of *other* recipients. Client-side, messages with the same set of recipients can be sorted into "rooms", providing a UX not too different from what we expect from chat apps.
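
Client-side room sorting could be as simple as the Python sketch below, which keys each room on the set of all participants. Representing participants as plain strings and messages as `(sender, other_recipients, text)` tuples is an assumption for illustration; a client may prefer to key on IDs alone so that a contact changing homeservers stays in the same room.

```python
from collections import defaultdict

def room_key(sender: str, other_recipients: list[str], me: str) -> frozenset[str]:
    """The room is identified by everyone involved, regardless of who sent."""
    return frozenset([sender, me, *other_recipients])

def sort_into_rooms(messages: list[tuple[str, list[str], str]],
                    me: str) -> dict[frozenset[str], list[str]]:
    rooms: dict[frozenset[str], list[str]] = defaultdict(list)
    for sender, other_recipients, text in messages:
        rooms[room_key(sender, other_recipients, me)].append(text)
    return dict(rooms)
```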

There is no moderation, no invite/leave/kick/ban commands, no questions about how to agree on what members are in a group.

Homeserver independence

One downside of most forms of federation is that you depend on your homeserver to interact with the network at all, and switching to a new one can be quite difficult since you need to message all your contacts and explain that you have a new address. One of the benefits of being `<id>@<server>` instead of `<name>@<server>` is that your ID is unspoofable, so when a Sufec client receives a message from a recognized ID at a different server, it should automatically update its knowledge of that user's address (whereas in email, anybody could've made an account on a different server with the same username as your contact and tried to impersonate them). There is no need for out-of-band verification of your identity.
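
The address update might look like this in a client, as a minimal Python sketch. It assumes the message has already been authenticated as coming from `sender_id`, and that contacts are kept as a mapping from ID to last known homeserver; both the data structure and the function name are hypothetical.

```python
def observe_sender(contacts: dict[bytes, str],
                   sender_id: bytes, sender_server: str) -> bool:
    """Update a known contact's homeserver; return True if it changed."""
    known_server = contacts.get(sender_id)
    if known_server is None:
        return False  # not a recognized ID; nothing to update
    if known_server != sender_server:
        contacts[sender_id] = sender_server
        return True
    return False
```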

Peer to peer usage

Although Sufec is a federated protocol, we intend it to be usable in a peer to peer way by having a client act as its own homeserver, and having `<id>@<ip address>` as an address. This is part of the reason for the emphasis on homeserver-independence: someone can use it both ways, even in the same conversation, with minimal friction.

Linking a smartphone

Of course, one of the most convenient ways to link a phone is to have a desktop client that can show a QR code, and for every mobile client to be able to do this with every desktop client, we need a standard format for the data in the QR code. The payload should be:

The private key
The address, preceded by its 1-byte length
For each contact:
The contact's name, preceded by its 1-byte length
The contact's address, preceded by its 1-byte length

This whole payload is encoded to base64url (with no padding) and used with the "binary" QR encoding scheme. Base64 is necessary because many popular QR scanners, including zxing and zbar, don't support true binary data.
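
Building the QR payload could look like the Python sketch below. Names and addresses are passed in as already-serialized bytes, and the function names are hypothetical; rendering the resulting string as a QR code is left to a QR library.

```python
import base64

def length_prefixed(field: bytes) -> bytes:
    """Prepend a 1-byte length to a field."""
    if len(field) > 255:
        raise ValueError("field does not fit a 1-byte length")
    return bytes([len(field)]) + field

def build_qr_payload(private_key: bytes, address: bytes,
                     contacts: list[tuple[bytes, bytes]]) -> str:
    """contacts is a list of (name, address) pairs."""
    raw = bytearray(private_key)
    raw += length_prefixed(address)
    for name, contact_address in contacts:
        raw += length_prefixed(name) + length_prefixed(contact_address)
    # base64url with the '=' padding stripped
    return base64.urlsafe_b64encode(bytes(raw)).rstrip(b"=").decode("ascii")
```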

Omitted features

Let's be straight up: Sufec is not going to have all the features getting cargo-culted into every modern chat app. Matrix tried, and look how that turned out (there's only one client that actually implements all the features and it's extremely buggy and contains anti-features). Features I intend to avoid:

Editing, because the standard of sending your correction prefixed with an asterisk is adequate.
Deleting, because in any messaging system, once a message leaves your device, there is no way to enforce deletion or to know whether someone has already read it.
Any kind of profile information. Display names are client-side, as in SMS.
Audio/video calls, because this belongs in a separate protocol/app such as Jitsi.

Features I have left out for now, but might specify in the future:

Replies. Note this can be mostly replicated as a client-side feature: clients can show a "reply" button on messages that fills the textbox with a plaintext quote of the sort used in email (`> `), and treat that syntax with color or special formatting (a la qTox).
Reactions.

Omitted security and privacy properties

Note some properties one might expect that are absent:

If you use the same server as other users in a group you participate in, the server can probably figure out that you are in a group together (because it can see each member of the group that it hosts being messaged at the same time). Therefore it's still desirable to use a homeserver hosted by a trusted party even though they can't access any message content.

There is no forward secrecy on the transport layer, only on the end-to-end layer.

Unlike Signal protocol-based messengers, you don't necessarily know if your homeserver omits an incoming message from a conversation. In a group chat, you can know if you're the only one who didn't receive it because other members will send hashes of it, but you wouldn't know if no one receives it. This is not considered a big problem because without being able to see the content of messages, the homeserver wouldn't know which ones to remove.

Links

Implementations (libraries, server, clients)

Todo list

FAQ (these have not actually been asked)

Why not TLS for transport encryption?

TLS is too complicated; most of its features are not wanted here (extensive metadata on certificates, signature chains, expiration, cipher negotiation).
TLS would require an additional library dependency.

Why not encrypt the intent indicator byte?

The conversation flows are sufficiently different that it would be trivial for a network eavesdropper to tell the difference anyway.

Why not use randomly generated nonces on the transport layer?

That would allow one nonce to be used twice. A network attacker, such as an evil ISP, could record certain messages, such as the 1-byte receipt you send after downloading a message successfully, and then hijack the connection and send the same receipt after each subsequent message, emptying your mailbox without letting you download the others.

Another solution would be to send not a 1-byte receipt but a hash of the downloaded message, but that would be more expensive (computation and data size) and have no relative benefit.