-rw-r--r-- doc/design.txt 235
1 files changed, 157 insertions, 78 deletions
diff --git a/doc/design.txt b/doc/design.txt
index 29ca0a4..0a57e70 100644
--- a/doc/design.txt
+++ b/doc/design.txt
@@ -1,89 +1,168 @@
-catlfish design (in Emacs -*- org -*- mode)
+-*- markdown -*-
+
+Overview
+========
This document describes the design of catlfish, an implementation of a
Certificate Transparency (RFC6962) log server.
-We have
-- a primary database storing x509 certificate chains [replicating r/o
- copies to a number of frontend nodes?]
-- a hash tree kept in RAM
-- one secondary database per frontend node, storing the most recently
- submitted data
-- a cluster of backend nodes with an elected leader which periodically
- updates the primary db with data from the secondary db's
-- a number of frontend nodes accepting http requests, updating
- secondary db's and reading from local r/o copy of the primary db
-- a private key used for signing SCT's and STH's, kept (in HSM:s) on
- backend nodes
-
-Backend nodes
-- are either asleep, functioning as storage only
-or
-- store submitted cert chains in persistent media
-- have write access to the primary database holding cert chains
-- periodically append new cert chains to the hash tree and sign the
- tree head
-
-Frontend nodes
-- reply to the http requests specified in RFC 6962
-- write submitted cert chains to their own, single master, secondary
- database
-- have read access to (a local copy of) the primary database
-- defer signing of SCT's (and STH's) to backend nodes
-
-The primary database
-- stores cert chains and their corresponding SCT's
-- is indexed on a running integer (primary) and a hash of the cert
- chain (secondary)
-- runs on backend nodes
-- is persistently stored on disk on several other backend nodes in
- separate data centers
-- grows with 5 GB per year, based on 5,000 3 kB submissions per day
-- max size is 300 GB, based on 100e6 certificates
-
-The secondary databases
-- store cert chains, unordered, between hash tree generation
-- run on frontend nodes
-- are persistently stored on disk on several other frontend nodes
-- are typically kept in RAM too
- max size is around 128 MB, based on 10 submissions (á 3 kB) per
- second for an hour
-
-Scaling, performance, estimates
-- submissions: less than 0.1 qps, based on 5,000 submissions per day
-- monitors: 6 qps, based on 100 monitors
-- auditors: 8,000 qps, based on 2.5e9 browsers visiting 100 sites
+
+
+  +------------------------------------------------+
+  | front end nodes                                |
+  +------------------------------------------------+
+     ^            |                 |
+     |            v                 v
+     |    +---------------+ +---------------+
+     |    | storage nodes | | signing nodes |
+     |    +---------------+ +---------------+
+     |            ^                 ^
+     |            |                 |
+  +------------------------------------------------+
+  | merge master                                   |
+  +------------------------------------------------+
+     ^            |
+     |            v
+     |  +----------------------------------+
+     |  | merge slaves                     |
+     |  +----------------------------------+
+     |            ^
+     |            |
+  +-------------------+
+  | merge-repair node |
+  +-------------------+
+
+
+
+Design assumptions
+------------------
+* The database grows by about 5 GB per year, based on 5,000
+  submissions of 3 kB each per day
+* Maximum size is 300 GB, based on 100e6 certificates
+* Submissions: less than 0.1 qps, based on 5,000 submissions per day
+* Monitors: 6 qps, based on 100 monitors
+* Auditors: 8,000 qps, based on 2.5e9 browsers visiting 100 sites
(with a 1y certificate) per month (assuming a single combined
request for doing get-sth + get-sth-consistency + get-proof-by-hash)
+
Open questions
-- What's a good MMD? Google seems to sign a new tree after 60-90
+--------------
+* What's a good MMD? Google seems to sign a new tree after 60-90
minutes (early 2014). They don't promise an MMD but aim to sign at
least once a day.
-A picture
-
-+-----------------------------------------------+
-| front end nodes |
-+-----------------------------------------------+
- ^ ^ ^ ^
- | | | |
- | v | |
- | short term long term |
- | cert db cert db copy |
- | ^ |
- | | v
-+-----------------------------------------------+
-| tree makers | mergers | signers |
-+-----------------------------------------------+
- ^ ^
- \ |
- \ v
- ------------- long term
- cert db
-
-[TODO: Update terms in text or picture so they match:
-secondary database == short term cert db
-primary database == long term cert db
-backend nodes == box with tree makers, mergers and signers]
-[TODO: Move the picture to the top of the document.]
+
+Terminology
+===========
+
+CC = Certificate Chain
+CT log = Certificate Transparency log
+
+Front-end node
+==============
+
+* Handles all HTTP requests.
+* Has a complete copy of the published data locally.
+* Read requests are answered directly by reading local files
+ and calculating the answers.
+* Add requests are validated and then sent to all storage
+ nodes. At the same time, a signing request is sent to one or
+ more of the signing nodes. When responses have been received
+ from a predetermined number of storage nodes and one signing
+ response has been received, a response is sent to the client.
+* Has an inward-facing API with the entry points SendLog(Hashes),
+ MissingEntries() (returns a list of hashes), SendEntry(Entry),
+ SendSTH(STH), CurrentPosition().
+
+
+Storage node
+============
+
+* Stores certificate chains and SCTs.
+* Has a write API SendEntry(Entry) that stores the certificate chain
+  in a database, indexed by its hash, and then appends the hash to
+  the list NewEntries.
+* Takes reasonable measures to ensure that data is in permanent
+ storage before sending a response.
+* When seeing a new STH, moves the variable start to the index of the
+ first unpublished hash.
+* Has a read API FetchNewEntries() which returns
+ NewEntries[start...length(NewEntries)-1].
+
+
+Signing node
+============
+
+* Has the signing key for the log.
+
+
+Merging node
+============
+
+* The master is determined by configuration.
+* The other merging nodes are called "slaves".
+* The master has two phases, merging and distributing.
+
+Merging (master)
+----------------
+
+* Fetches CCs by calling FetchNewEntries() on storage node i
+ where i = 0...(n-1)
+* Determines the order of the new entries in the CT log.
+* Sends the entries to the slaves.
+* Calculates the tree head and asks a signing node to sign it.
+* When a majority of the slaves have acknowledged the entries,
+ compares the calculated tree head to the tree heads of the slaves.
+ If they match, considers the additions to the CT log final and
+ begins the distributing phase.
+
+Merging (slave)
+---------------
+
+* Receives entries from the master. The node must be certain
+ that the request comes from the current master, and not
+ an old one.
+* Takes reasonable measures to ensure that data is in
+ permanent storage.
+* Calculates the new tree head and returns it to the master.
+
+Distributing
+------------
+
+* Performs the following steps for all front-end nodes:
+ * Fetches curpos by calling CurrentPosition().
+ * Calls SendLog() with the hashes of CCs from curpos to newpos.
+ * Fetches missing_entries by calling MissingEntries(), a list
+   of hashes for the CCs that the front-end node does not
+   have.
+ * For each hash in missing_entries, uploads the CC by calling
+   SendEntry(CC).
+ * Sends the STH by calling SendSTH(STH).
+
+
+Merge-repair node
+=================
+
+* There is only one of these nodes.
+* When this node detects that an STH has not been published
+ in t seconds, it begins the automatic repair process.
+
+Automatic repair process
+------------------------
+
+* Turn off all reachable merge nodes.
+* If a majority of the merge nodes cannot be reached,
+ die and report.
+* Fetch the CT log order from the merge nodes.
+* Determine the latest version of the log.
+* Select a new master.
+* Change the configuration of the merge nodes so that
+ they know who the new master is.
+* Start all merge nodes.
+* If any of these steps fails, die and report.
+* If all steps succeed, die and report anyway. The automatic
+ repair process must not be restarted without manual
+ intervention.
+
+
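
The distributing phase added in this diff (CurrentPosition, SendLog, MissingEntries, SendEntry, SendSTH) can be illustrated with a minimal in-memory sketch. This is not catlfish code: the `FrontEnd` class, the `distribute` function, the `cc_hash` helper (plain SHA-256 as a stand-in for the log's real entry hash), and the rule that a front end only publishes pending hashes once the STH arrives are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, field


def cc_hash(cc: bytes) -> str:
    # Hypothetical stand-in: a real log would use its own entry hash.
    return hashlib.sha256(cc).hexdigest()


@dataclass
class FrontEnd:
    """Hypothetical in-memory model of a front-end node's inward-facing API."""
    entries: dict = field(default_factory=dict)  # hash -> certificate chain
    log: list = field(default_factory=list)      # published hashes, in log order
    pending: list = field(default_factory=list)  # hashes announced via SendLog
    sth: object = None

    def current_position(self) -> int:           # CurrentPosition()
        return len(self.log)

    def send_log(self, hashes) -> None:          # SendLog(Hashes)
        self.pending = list(hashes)

    def missing_entries(self) -> list:           # MissingEntries()
        return [h for h in self.pending if h not in self.entries]

    def send_entry(self, cc: bytes) -> None:     # SendEntry(Entry)
        self.entries[cc_hash(cc)] = cc

    def send_sth(self, sth) -> None:             # SendSTH(STH)
        # Assumption: publishing happens only when every announced entry is present.
        if self.missing_entries():
            raise RuntimeError("STH received before all entries arrived")
        self.log.extend(self.pending)
        self.pending = []
        self.sth = sth


def distribute(frontends, log_order, store, sth) -> None:
    """Run the distributing steps against every front-end node."""
    newpos = len(log_order)
    for fe in frontends:
        curpos = fe.current_position()
        fe.send_log(log_order[curpos:newpos])    # hashes from curpos to newpos
        for h in fe.missing_entries():
            fe.send_entry(store[h])              # upload only what is missing
        fe.send_sth(sth)
```

In this sketch a front end that is already partly up to date receives only the hashes after its current position, so the master never resends entries the node has already published.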