author Linus Nordberg <linus@nordberg.se> 2014-05-16 10:37:21 +0200
committer Linus Nordberg <linus@nordberg.se> 2014-05-16 10:37:21 +0200
commit ebef3cdcc323f46c6e04e86222b00414a020173e
tree 29b59691efdb8362a202a01ed1067bfd81976db6
parent 74fa8bfa586da34eb2c9be14d39d10f5cac1955b
Update the design doc with ideas about a split database.
-rw-r--r-- doc/design.txt 72
1 files changed, 39 insertions, 33 deletions
diff --git a/doc/design.txt b/doc/design.txt
index a83ec85..9bd77aa 100644
--- a/doc/design.txt
+++ b/doc/design.txt
@@ -4,42 +4,47 @@ This document describes the design of ctls, an implementation of a
Certificate Transparency (RFC6962) log server.
We have
-- "a db" storing
- i) x509 certificate chains and
- ii) the hash tree,
- replicating r/o copies to n secondary nodes
--? 1 primary node updating the db
--? n secondary nodes reading from local r/o db
-
-Nodes reply to the https requests specified in RFC 6962.
-?Nodes can operate in one of two modes -- primary or secondary.
-[TODO: A secondary node can become primary. When, how?]
-
-Node roles
-- depot
-- tree-maker
-- tree-signer
-- submission-point
-- query-replyer
-
-?Primary nodes
+- a primary database storing x509 certificate chains [replicating r/o
+ copies to a number of frontend nodes?]
+- a hash tree kept in RAM
+- one secondary database per frontend node, storing the most recently
+ submitted data
+- a cluster of backend nodes with an elected leader which periodically
+ updates the primary db with data from the secondary db's
+- a number of frontend nodes accepting http requests, updating
+ secondary db's and reading from [local r/o copy of?] the primary db
+
+Backend nodes
+- are either asleep, functioning as storage only
+or
- store submitted cert chains in persistent media
-- have write access to the database holding cert chains and the hash tree
-- periodically add cert chains to the hash tree and sign the tree head
- (like ever 10 minutes and at least every hour?)
-
-?Secondary nodes
-- have read access to the database [which is pushed or pulled?]
-
-The log data db
-- is persistently stored on [more than one] disk [files, DETS, mnesia,
- some other database?]
+- have write access to the primary database holding cert chains
+- periodically append new cert chains to the hash tree and sign the
+ tree head
+
+Frontend nodes
+- reply to the http requests specified in RFC 6962
+- write submitted cert chains to their own, single master, secondary
+ database
+- have read access to [a local copy of?] the primary database
+
+The primary database
+- stores cert chains and their corresponding SCT's
+- is indexed on a running integer (primary) and a hash of the cert
+ chain (secondary)
+- runs on backend nodes
+- is persistently stored on disk on several other backend nodes in
+ separate data centers
- grows with 5 GB per year, based on 5,000 3 kB submissions per day
- max size is 300 GB, based on 100e6 certificates
-The hash tree db
--? is persistantly stored on disk
--? is implemented as a 'protected, ram_file' DETS table
+The secondary databases
+- store cert chains, unordered, between hash tree generation
+- run on frontend nodes
+- are persistently stored on disk on several other frontend nodes
+- are typically kept in RAM too
+- max size is around 128 MB, based on 10 (3 kB) submissions per second
+ for an hour
Scaling, performance, estimates
- submissions: less than 0.1 qps, based on 5,000 submissions per day
@@ -50,4 +55,5 @@ Scaling, performance, estimates
Open questions
- What's a good MMD? Google seem to use an MMD of well over 1h at the
- moment (early 2014).
+ moment (early 2014). We could start at 4h which would give us some
+ time to sort things out in case of trouble.
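
Note on the sizing figures in this change: the estimates for the primary
database, the secondary databases and the submission rate can be rechecked
with a few lines of arithmetic. The sketch below is not part of ctls. It
assumes decimal units (1 kB = 1000 bytes) and a flat 3 kB per cert chain,
and all function and constant names are made up for illustration.

```python
# Back-of-the-envelope check of the sizing estimates in doc/design.txt.
# Assumptions (not from ctls itself): decimal units, 3 kB per cert chain.

KB = 1000
MB = 1000 ** 2
GB = 1000 ** 3

CERT_CHAIN_BYTES = 3 * KB  # "3 kB submissions"


def primary_growth_per_year(submissions_per_day=5_000):
    """Bytes added to the primary database in one year."""
    return submissions_per_day * CERT_CHAIN_BYTES * 365


def primary_max_size(total_certs=100_000_000):
    """Bytes needed to hold 100e6 certificates."""
    return total_certs * CERT_CHAIN_BYTES


def secondary_max_size(submissions_per_second=10, window_seconds=3600):
    """Bytes buffered in one secondary database between tree generations."""
    return submissions_per_second * window_seconds * CERT_CHAIN_BYTES


def submission_rate_qps(submissions_per_day=5_000):
    """Average submissions per second over a day."""
    return submissions_per_day / 86_400


if __name__ == "__main__":
    print(f"primary growth/year: {primary_growth_per_year() / GB:.1f} GB")
    print(f"primary max size:    {primary_max_size() / GB:.0f} GB")
    print(f"secondary max size:  {secondary_max_size() / MB:.0f} MB")
    print(f"submission rate:     {submission_rate_qps():.3f} qps")
```

Under these assumptions the primary database grows by roughly 5.5 GB per
year and tops out at 300 GB, matching the figures in the doc. The secondary
database comes to 108 MB for an hour at 10 submissions per second, somewhat
below the "around 128 MB" quoted, which may include some headroom.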