author Linus Nordberg <linus@nordu.net> 2016-07-11 15:00:29 +0200
committer Linus Nordberg <linus@nordu.net> 2016-07-11 15:00:29 +0200
commit 3d4a9fdd338713c2f63da2b92940904762878d98
tree 2e8ee7375619d507f0f206be2c713aa12d17f048 /doc/merge.txt
parent 1a36628401658def9ab9595f7cbcf72b8cb4eb6a
parent bbf254d6d7f1708503f425c0eb8926af1b715b9c
Merge remote-tracking branch 'refs/remotes/map/python-requests-chunked'
Diffstat (limited to 'doc/merge.txt')
-rw-r--r-- doc/merge.txt 60
1 files changed, 60 insertions, 0 deletions
diff --git a/doc/merge.txt b/doc/merge.txt
index 28757a7..b2e2738 100644
--- a/doc/merge.txt
+++ b/doc/merge.txt
@@ -20,6 +20,66 @@ The merge process
  - merge-dist distributes 'sth' and missing entries to frontend nodes.
+Merge distribution (merge_dist)
+-----------------------------------------------------
+
+ * get current position from frontend server (curpos)
+
+ * send log
+   * sends log in chunks of 1000 hashes from curpos
+
+ * get missing entries
+   * server goes through all hashes from curpos and checks if they are
+     present
+   * when the server has collected 100000 non-present entries, it
+     returns them
+   * server also keeps a separate (in-memory) counter that caches the
+     index of the first entry that either has not yet been checked for
+     presence or was checked and found to be non-present, so that the
+     server can start from that position
+
+ * send entries
+   * sends these entries one at a time
+   * does not get more missing entries when it is done
+
+ * send sth
+   * sends the sth previously constructed by merge-sth to the server,
+     which verifies all entries and adds entry-to-hash and
+     hash-to-index
+   * saves the last verified position continuously to avoid doing the
+     work again if the verification is aborted and restarted
+
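The four merge_dist stages above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's code: `FakeFrontend`, its method names, and the `merge_dist` signature are hypothetical stand-ins for the real HTTP requests, and the 100000-entry cap and the cached start index are left out for brevity.

```python
# Hypothetical in-memory stand-in for a frontend node. The real nodes are
# reached over HTTP; none of these method names come from the source.
CHUNK_SIZE = 1000  # log hashes per "send log" request (from the text)


class FakeFrontend:
    def __init__(self):
        self.log = []      # hashes received via send_log
        self.entries = {}  # hash -> entry data
        self.sth = None

    def curpos(self):
        return len(self.log)

    def send_log(self, hashes):
        self.log.extend(hashes)

    def missing_entries(self):
        return [h for h in self.log if h not in self.entries]

    def send_entry(self, entry_hash, data):
        self.entries[entry_hash] = data

    def send_sth(self, sth):
        # The real server verifies entries here; this stub only checks
        # that every logged hash has an entry.
        if self.missing_entries():
            raise ValueError("entries missing")
        self.sth = sth


def chunks(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def merge_dist(frontend, log, entries, sth):
    """Run the four merge_dist stages once against one frontend."""
    curpos = frontend.curpos()
    for chunk in chunks(log[curpos:], CHUNK_SIZE):
        frontend.send_log(chunk)
    missing = frontend.missing_entries()  # fetched once, not re-fetched
    for entry_hash in missing:
        frontend.send_entry(entry_hash, entries[entry_hash])  # one at a time
    frontend.send_sth(sth)
```

For example, calling `merge_dist(FakeFrontend(), log, entries, sth)` with a 2500-hash `log` sends the log as three chunks (1000, 1000, 500) before any entries are sent.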
+Merge backup (merge_backup)
+-----------------------------------------------------
+
+ * get verifiedsize from backup server
+
+ * send log
+   * determines the end of the log by trying to send small chunks of
+     the log hashes from verifiedsize until it fails, then restarts
+     with the normal chunk size (1000)
+
+ * get missing entries
+   * this stage is the same as for merge_dist
+
+ * send entries
+   * sends these entries in chunks of 100 (this is limited
+     because of memory considerations and web server limits)
+   * when it is done, goes back to the "get missing entries" stage,
+     until there are no more missing entries
+
+ * verifyroot
+   * server verifies all entries from verifiedsize, and then
+     calculates and returns root hash
+   * unlike merge distribution, does not save the last verified
+     position either continuously or when it is finished, which means
+     that it then has to verify all entries again if it is aborted and
+     restarted before verifiedsize is set to the new value
+
+ * if merge_backup sees that the root hash is correct, it sets
+   verifiedsize on backup server
+
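The part of merge_backup that differs from merge_dist (batched entries, repeated "get missing entries" rounds, and setting verifiedsize only after a matching root hash) might look like the sketch below. `FakeBackup`, `toy_root`, and the `missing_batch` parameter are hypothetical; in particular `toy_root` is a plain SHA-256 over the entries and only stands in for the real root hash calculation. The "send log" probing stage is not modelled.

```python
import hashlib

ENTRY_CHUNK = 100  # entries per "send entries" request (from the text)


class FakeBackup:
    """Hypothetical in-memory backup node; method names are illustrative."""

    def __init__(self, log, missing_batch=250):
        self.log = list(log)
        self.entries = {}
        self.verifiedsize = 0
        self.missing_batch = missing_batch  # stands in for the 100000 cap

    def missing_entries(self):
        pending = self.log[self.verifiedsize:]
        missing = [h for h in pending if h not in self.entries]
        return missing[:self.missing_batch]

    def send_entries(self, batch):
        self.entries.update(batch)

    def verifyroot(self):
        # Placeholder digest, NOT a Merkle tree hash.
        return toy_root(self.log, self.entries)

    def set_verifiedsize(self, size):
        self.verifiedsize = size


def toy_root(log, entries):
    """Digest of all entries in log order (illustrative only)."""
    digest = hashlib.sha256()
    for entry_hash in log:
        digest.update(entries[entry_hash])
    return digest.hexdigest()


def merge_backup(backup, entries, expected_root, tree_size):
    """Send entries until none are missing, then verify the root hash."""
    while True:
        missing = backup.missing_entries()
        if not missing:
            break
        for start in range(0, len(missing), ENTRY_CHUNK):
            batch = missing[start:start + ENTRY_CHUNK]
            backup.send_entries({h: entries[h] for h in batch})
    if backup.verifyroot() != expected_root:
        return False  # verifiedsize stays at its old value
    backup.set_verifiedsize(tree_size)
    return True
```

Because verifiedsize is written only at the very end, an aborted run in this sketch leaves it unchanged, which mirrors the restart cost described in the verifyroot stage.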
+
TODO
====