author | Linus Nordberg <linus@nordberg.se> | 2014-05-16 10:37:21 +0200 |
---|---|---|
committer | Linus Nordberg <linus@nordberg.se> | 2014-05-16 10:37:21 +0200 |
commit | ebef3cdcc323f46c6e04e86222b00414a020173e | |
tree | 29b59691efdb8362a202a01ed1067bfd81976db6 | |
parent | 74fa8bfa586da34eb2c9be14d39d10f5cac1955b | |
Update the design doc with ideas about a split database.
-rw-r--r-- | doc/design.txt | 72 |
1 files changed, 39 insertions, 33 deletions
diff --git a/doc/design.txt b/doc/design.txt
index a83ec85..9bd77aa 100644
--- a/doc/design.txt
+++ b/doc/design.txt
@@ -4,42 +4,47 @@ This document describes the design of ctls, an implementation of a
 Certificate Transparency (RFC6962) log server.
 
 We have
-- "a db" storing
-  i) x509 certificate chains and
-  ii) the hash tree,
-  replicating r/o copies to n secondary nodes
-? 1 primary node updating the db
-? n secondary nodes reading from local r/o db
-
-Nodes reply to the https requests specified in RFC 6962.
-?Nodes can operate in one of two modes -- primary or secondary.
-[TODO: A secondary node can become primary. When, how?]
-
-Node roles
-- depot
-- tree-maker
-- tree-signer
-- submission-point
-- query-replyer
-
-?Primary nodes
+- a primary database storing x509 certificate chains [replicating r/o
+  copies to a number of frontend nodes?]
+- a hash tree kept in RAM
+- one secondary database per frontend node, storing the most recently
+  submitted data
+- a cluster of backend nodes with an elected leader which periodically
+  updates the primary db with data from the secondary db's
+- a number of frontend nodes accepting http requests, updating
+  secondary db's and reading from [local r/o copy of?] the primary db
+
+Backend nodes
+- are either asleep, functioning as storage only
+or
 - store submitted cert chains in persistent media
-- have write access to the database holding cert chains and the hash tree
-- periodically add cert chains to the hash tree and sign the tree head
-  (like ever 10 minutes and at least every hour?)
-
-?Secondary nodes
-- have read access to the database [which is pushed or pulled?]
-
-The log data db
-- is persistently stored on [more than one] disk [files, DETS, mnesia,
-  some other database?]
+- have write access to the primary database holding cert chains
+- periodically append new cert chains to the hash tree and sign the
+  tree head
+
+Frontend nodes
+- reply to the http requests specified in RFC 6962
+- write submitted cert chains to their own, single master, secondary
+  database
+- have read access to [a local copy of?] the primary database
+
+The primary database
+- stores cert chains and their corresponding SCT's
+- is indexed on a running integer (primary) and a hash of the cert
+  chain (secondary)
+- runs on backend nodes
+- is persistently stored on disk on several other backend nodes in
+  separate data centers
 - grows with 5 GB per year, based on 5,000 3 kB submissions per day
 - max size is 300 GB, based on 100e6 certificates
 
-The hash tree db
-? is persistantly stored on disk
-? is implemented as a 'protected, ram_file' DETS table
+The secondary databases
+- store cert chains, unordered, between hash tree generation
+- run on frontend nodes
+- are persistently stored on disk on several other frontend nodes
+- are typically kept in RAM too
+- max size is around 128 MB, based on 10 (3 kB) submissions per second
+  for an hour
 
 Scaling, performance, estimates
 - submissions: less than 0.1 qps, based on 5,000 submissions per day
@@ -50,4 +55,5 @@ Scaling, performance, estimates
 
 Open questions
 - What's a good MMD? Google seem to use an MMD of well over 1h at the
-  moment (early 2014).
+  moment (early 2014). We could start at 4h which would give us some
+  time to sort things out in case of trouble.
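The sizing figures in the patched design doc (5 GB per year, 300 GB maximum, around 128 MB per secondary database, less than 0.1 qps) can be rechecked with a few lines of arithmetic. The sketch below is not part of the commit. It assumes decimal units (1 kB = 1000 bytes), which the doc does not state, and all function and constant names are made up for illustration.

```python
# Back-of-envelope check of the sizing figures in doc/design.txt.
# Assumption (not stated in the doc): decimal units, 1 kB = 1000 bytes.
KB = 1000
MB = 1000 * KB
GB = 1000 * MB

CERT_CHAIN_BYTES = 3 * KB      # "3 kB" per submission, taken from the doc
SECONDS_PER_DAY = 24 * 60 * 60


def primary_growth_per_year(submissions_per_day: int) -> int:
    """Bytes added to the primary database over a 365-day year."""
    return submissions_per_day * CERT_CHAIN_BYTES * 365


def primary_max_size(total_certs: int) -> int:
    """Bytes needed to hold total_certs cert chains of 3 kB each."""
    return total_certs * CERT_CHAIN_BYTES


def secondary_peak_size(submissions_per_second: int, window_seconds: int) -> int:
    """Bytes accumulated in one secondary database during window_seconds."""
    return submissions_per_second * CERT_CHAIN_BYTES * window_seconds


def submission_rate_qps(submissions_per_day: int) -> float:
    """Average submissions per second for a given daily volume."""
    return submissions_per_day / SECONDS_PER_DAY


growth = primary_growth_per_year(5_000)
max_primary = primary_max_size(100_000_000)
peak_secondary = secondary_peak_size(10, 3600)
rate = submission_rate_qps(5_000)

print(f"primary growth:  {growth / GB:.1f} GB/year")
print(f"primary max:     {max_primary / GB:.1f} GB")
print(f"secondary peak:  {peak_secondary / MB:.1f} MB")
print(f"submission rate: {rate:.3f} qps")
```

Under these assumptions the primary database figures match the doc after rounding, and the average submission rate stays below 0.1 qps. The raw product for a secondary database is 108 MB, so the doc's "around 128 MB" is presumably a rounded-up figure.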