-*- markdown -*-

Overview
========

This document describes the design of catlfish, an implementation of a
Certificate Transparency (RFC6962) log server.



    +------------------------------------------------+
    |                front end nodes                 |
    +------------------------------------------------+
         ^                |                  |
         |                v                  v
         |        +---------------+  +---------------+
         |        | storage nodes |  | signing nodes |
         |        +---------------+  +---------------+
         |                ^                  ^
         |                |                  |
    +------------------------------------------------+
    |               primary merge node               |
    +------------------------------------------------+
         ^                         |
         |                         v
         |        +----------------------------------+
         |        |      secondary merge nodes       |
         |        +----------------------------------+
         |           ^
         |           |
    +-------------------+
    | merge-repair node |
    +-------------------+



Design assumptions
------------------
* The database grows with 5 GB per year, based on 5,000 3 kB
  submissions per day
* Max size is 300 GB, based on 100e6 certificates
* submissions: less than 0.1 qps, based on 5,000 submissions per day
* monitors: 6 qps, based on 100 monitors
* auditors: 8,000 qps, based on 2.5e9 browsers visiting 100 sites
  (with a 1y certificate) per month (assuming a single combined
  request for doing get-sth + get-sth-consistency + get-proof-by-hash)


Terminology
===========

CC = Certificate Chain
CT log = Certificate Transparency log


Front-end node
==============

* Handles all http requests.
* Has a complete copy of the published data locally.
* Read requests are answered directly by reading local files
  and calculating the answers.
* Add requests are validated and then sent to all storage
  nodes. At the same time, a signing request is sent to one or
  more of the signing nodes. When responses have been received
  from a predetermined number of storage nodes and one signing
  response has been received, a response is sent to the client.
* Has an inward-facing API with the entry points SendLog(Hashes),
  MissingEntries() (returns a list of hashes), SendEntry(Entry),
  SendSTH(STH), CurrentPosition().


Storage node
============

* Stores certificate chains and SCTs.
* Has a write API SendEntry(Entry) that stores the certificate chain
  in a database, indexed by its hash. Then stores the hash in a list
  NewEntries.
* Takes reasonable measures to ensure that data is in permanent
  storage before sending a response.
* When seeing a new STH, moves the variable start to the index of the
  first unpublished hash.
* Has two read API entry points; FetchNewEntries() which returns
  NewEntries[start...length(NewEntries)-1] and GetEntry(Hashes) which
  returns entries.


Signing node
============

* Has the signing key for the log or talks to an HSM.


Merging node
============

* The primary merge node is determined by configuration.
* The other merging nodes are called "secondaries".
* The primary merge node has two phases, merging and distributing.

Merging (primary)
-----------------

* Fetches CCs by calling FetchNewEntries() on storage node i
  where i = 0...(n-1)
* Determines the order of the new entries in the CT log.
* Sends the entries to the secondary merge nodes.
* Calculates the tree head and asks a signing node to sign it.
* When a majority of the secondaries have acknowledged the entries,
  compares the calculated tree head to the tree heads of the secondaries.
  If they match, considers the additions to the CT log final and
  begins the distributing phase.

Merging (secondaries)
---------------------

* Receives entries from the primary merge node. The node must be certain
  that the request comes from the current primary, and not
  an old one.
* Takes reasonable measures to ensure that data is in
  permanent storage.
* Calculates the new tree head and returns it to the primary merge node.

Distributing
------------

* Performs the following steps for all front-end nodes:
 * Fetches curpos by calling CurrentPosition().
 * Calls SendLog() with the hashes of CCs from curpos to newpos.
 * Fetches missing_entries by calling MissingEntries(), a list
   of hashes for the CCs that the front-end nodes does not
   have.
 * For each hash in missing_entries, upload the CC by calling
   SendEntry(CC).
 * Send the STH with the SendSTH(STH) call.


Merge-repair node
=================

* There is only one of these nodes.
* When this node detects that an STH has not been published
  in t seconds, it begins the automatic repair process.

Automatic repair process
------------------------

* Turn off all reachable merge nodes.
* If a majority of the merge nodes cannot be reached,
  die and report.
* Fetch the CT log order from the merge nodes.
* Determine the latest version of the log.
* Select a new primary merge node.
* Change the configuration of the merge nodes so that
  they know who the new primary merge node is.
* Start all merge nodes.
* If any of these steps fail, die and report.
* If all steps succeed, die and report anyway. The automatic
  repair process must not be restarted without manual
  intervention.