- 'Ancient' module for OCaml
- ----------------------------------------------------------------------
+ # Ancient library for OCaml

- What does this module do?
- ----------------------------------------------------------------------
+ ## What does this module do?

This module allows you to use in-memory data structures which are
larger than available memory and so are kept in swap. If you try this
@@ -36,8 +34,8 @@ the ancient heap may need to be manually deallocated. The ancient
heap may either exist as ordinary memory, or may be backed by a file,
which is how shared structures are possible.

- Structures which are moved into ancient must be treated as STRICTLY
- NON-MUTABLE. If an ancient structure is changed in any way then it
+ Structures which are moved into ancient must be treated as **STRICTLY
+ NON-MUTABLE**. If an ancient structure is changed in any way then it
may cause a crash.

There are some limitations which apply to ancient data structures.
@@ -47,57 +45,56 @@ This module is most useful on 64 bit architectures where large address
spaces are the norm. We have successfully used it with a 38 GB
address space backed by a file and shared between processes.

- API
- ----------------------------------------------------------------------
+ ## API

- Please see file ancient.mli.
+ Please see file `ancient.mli`.

- Compiling
- ----------------------------------------------------------------------
+ ## Compiling

- make
+ ```console
+ make
+ ```

Make sure you run this command before running any program which
uses the Ancient module:

- ulimit -s unlimited
-
- Example
- ----------------------------------------------------------------------
+ ```console
+ ulimit -s unlimited
+ ```

- XXX Note the example code is really stupid, and fails for large
- dictionaries. See bug (10) below.
+ ## Example

- Run:
-
- ulimit -s unlimited
- wordsfile=/usr/share/dict/words
- baseaddr=0x440000000000 # System specific - see below.
- ./test_ancient_dict_write.opt $wordsfile dictionary.data $baseaddr
- ./test_ancient_dict_verify.opt $wordsfile dictionary.data
- ./test_ancient_dict_read.opt dictionary.data
+ Note the example code is really stupid, and fails for large
+ dictionaries. See bug (10) below. Run:
+ ```console
+ ulimit -s unlimited
+ wordsfile=/usr/share/dict/words
+ baseaddr=0x440000000000 # System specific - see below.
+ ./test_ancient_dict_write.opt $wordsfile dictionary.data $baseaddr
+ ./test_ancient_dict_verify.opt $wordsfile dictionary.data
+ ./test_ancient_dict_read.opt dictionary.data
+ ```

(You can run several instances of test_ancient_dict_read.opt on the
same machine to demonstrate sharing).

- Shortcomings & bugs
- ----------------------------------------------------------------------
+ ## Shortcomings & bugs

- (0) Stack overflows are common when marking/sharing large structures
+ 1. Stack overflows are common when marking/sharing large structures
because we use a recursive algorithm to visit the structures. If you
get random segfaults during marking/sharing, then try this before
running your program:
-
+ ```console
 ulimit -s unlimited
+ ```

- (1) Ad-hoc polymorphic primitives (structural equality, marshalling
+ 2. Ad-hoc polymorphic primitives (structural equality, marshalling
and hashing) do not work on ancient data structures, meaning that you
will need to provide your own comparison and hashing functions. For
more details see Xavier Leroy's response here:
-
http://caml.inria.fr/pub/ml-archives/caml-list/2006/09/977818689f4ceb2178c592453df7a343.en.html

- (2) Ancient.attach suggests setting a baseaddr parameter for newly
+ 3. Ancient.attach suggests setting a baseaddr parameter for newly
created files (it has no effect when attaching existing files). We
strongly recommend this because in our tests we found that mmap would
locate the memory segment inappropriately -- the basic problem is that
@@ -110,19 +107,17 @@ programmers to guess at a good base address which will be valid in the
future. There are no other good solutions we have found --
preallocating the file is tricky with the current mmalloc code.

- (3) The current code requires you to first of all create the large
+ 4. The current code requires you to first of all create the large
data structures on the regular OCaml heap, then mark them as ancient,
effectively copying them out of the OCaml heap, then garbage collect
the (hopefully unreferenced) structures on the OCaml heap. In other
words, you need to have all the memory available as physical memory.
The way to avoid this is to mark structures as ancient incrementally
as they are created, or in chunks, whatever works for you.
-
We typically use Ancient to deal with web server logfiles, and in this
case loading one file of data into memory and marking it as ancient
before moving on to the next file works for us.
-
- (4) Why do ancient structures need to be read-only / not mutated? The
+ 5. Why do ancient structures need to be read-only / not mutated? The
reason is that you might create a new OCaml heap structure and point
the ancient structure at this heap structure. The heap structure has
no apparent incoming pointers (the GC will not by its very nature
@@ -134,14 +129,12 @@ data to OCaml heap data. In theory it should be possible to modify
ancient data to point to other ancient data, but we have not tried
this.

- (5) [Limit on number of keys -- issue fixed]
-
- (6) [Advanced topic] The _mark function in ancient_c.c makes no
+ 6. **Advanced topic:** the `_mark` function in `ancient_c.c` makes no
attempt to arrange the data structures in memory / on disk in a way
which optimises them for access. The worst example is when you have
an array of large structures, where only a few fields in the structure
will be accessed. Typically these will end up on disk as:
-
+ ```
 array of N pointers
 structure 1
 field A
@@ -166,10 +159,10 @@ will be accessed. Typically these will end up on disk as:
 field B
 ...
 field Z
-
+ ```
If you then iterate accessing only fields A, you end up swapping the
whole lot back into memory. A better arrangement would have been:
-
+ ```
 array of N pointers
 structure 1
 structure 2
@@ -184,23 +177,21 @@ whole lot back into memory. A better arrangement would have been:
 field B from structure 1
 field B from structure 2
 etc.
-
- which avoids loading unused fields at all. In some circumstances we
+ ```
+ which avoids loading unused fields at all. In some circumstances we
have shown that this could make a huge difference to performance, but
we are not sure how to implement this cleanly in the current library.
-
[Update: I have fixed issue 6 manually for my Weblogs example and
confirmed that it does make a huge difference to performance, although
at considerable extra code complexity. Interested people can see the
weblogs library, file import_weblogs_ancient.ml.in].

- (7) [Advanced topic] Certain techniques such as Address Space
+ 7. **Advanced topic:** certain techniques such as Address Space
Randomisation (http://lwn.net/Articles/121845/) are probably not
compatible with the Ancient module and shared files. Because the
ancient data structures contain real pointers, these pointers would be
invalidated if the shared file was not mapped in at precisely the same
base address in all processes which are sharing the file.
-
One solution might be to use private mappings and a list of fixups.
In fact, the code actually builds a list of fixups currently while
marking, because it needs to deal with precisely this issue (during
@@ -210,7 +201,6 @@ to be fixed up afterwards). The list of fixups would need to be
stored alongside the memory segment (currently it is discarded after
marking), and the file would need to be mapped in using MAP_PRIVATE
(see below).
-
A possible problem with this is that because OCaml objects tend to be
small and contain a lot of pointers, it is likely that fixing up the
pointers would result in every page in the memory segment becoming
@@ -219,34 +209,30 @@ mappings in the first place. However it is likely that some users of
this module have large amounts of opaque data and few pointers, and
for them this would be worthwhile.

- (8) Currently mmalloc is implemented so that the file is mapped in
+ 8. Currently mmalloc is implemented so that the file is mapped in
PROT_READ|PROT_WRITE and MAP_SHARED. Ancient data structures are
supposed to be immutable so strictly speaking write access shouldn't
be required. It may be worthwhile modifying mmalloc to allow
read-only mappings, and private mappings.

- (9) The library assumes that every OCaml object is at least one word
+ 9. The library assumes that every OCaml object is at least one word
long. This seemed like a good assumption up until I found that
zero-length arrays are valid zero word objects. At the moment you
cannot mark structures which contain zero-length arrays -- you will
get an assert-failure in the _mark function.
-
Possibly there are other types of OCaml structure which are zero word
objects and also cannot be marked. I'm not sure what these will be:
for example empty strings are stored as one word OCaml objects, so
- they are OK.
-
- The solution to this bug is non-trivial.
+ they are OK. The solution to this bug is non-trivial.

- (10) Example code is very stupid. It fails with large dictionaries,
+ 11. Example code is very stupid. It fails with large dictionaries,
eg. the one with nearly 500,000 words found in Fedora.

- (11) In function 'mark', the "// Ran out of memory. Recover and throw
+ 12. In function 'mark', the "// Ran out of memory. Recover and throw
an exception." codepath actually fails if you use it - segfaulting
inside do_restore.
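
The item on ad-hoc polymorphic primitives says you will need to provide your own comparison and hashing functions. A minimal sketch of what that might look like follows, using the standard `Hashtbl.Make` functor. The `entry` type, its fields and every helper name here are hypothetical and are not part of the Ancient API. The string comparison walks the bytes by hand so that it relies only on typed, monomorphic operations; the sketch does not establish which runtime primitives are safe on ancient data.

```ocaml
(* Hypothetical record standing in for data kept in the ancient heap. *)
type entry = { host : string; hits : int }

(* Byte-by-byte string equality, avoiding the polymorphic (=). *)
let string_equal (a : string) (b : string) : bool =
  let n = String.length a in
  let rec go i =
    Int.equal i n || (Char.equal a.[i] b.[i] && go (i + 1))
  in
  Int.equal n (String.length b) && go 0

(* Simple polynomial hash over the bytes, avoiding Hashtbl.hash.
   Masking with max_int keeps the result non-negative. *)
let string_hash (s : string) : int =
  let h = ref 0 in
  String.iter (fun c -> h := ((!h * 31) + Char.code c) land max_int) s;
  !h

(* Field-by-field equality and hashing for the record. *)
let entry_equal (a : entry) (b : entry) : bool =
  string_equal a.host b.host && Int.equal a.hits b.hits

let entry_hash (e : entry) : int =
  ((string_hash e.host * 31) + e.hits) land max_int

(* A hash table keyed on [entry] that only ever calls the functions above. *)
module EntryTbl = Hashtbl.Make (struct
  type t = entry
  let equal = entry_equal
  let hash = entry_hash
end)
```

A table built this way never falls back to structural equality or `Hashtbl.hash` on its keys, which is the property the item asks for.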
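
The "Advanced topic" item on data layout contrasts an array of large structures with an arrangement that groups each field together. One way to approximate the second arrangement, without touching `_mark`, might be to restructure the data into per-field arrays before marking it. The sketch below only illustrates that idea: the `row` and `columns` types and the function names are hypothetical, and it is not claimed to be what the weblogs library does.

```ocaml
(* Hypothetical record: one small field read often, one bulky field. *)
type row = { a : int; b : string }

(* Column-oriented layout: one array per field. An [int array] stores
   its elements inline, so scanning [col_a] never follows a pointer
   into the data behind [col_b]. *)
type columns = { col_a : int array; col_b : string array }

(* Split an array of records into per-field arrays. *)
let to_columns (rows : row array) : columns =
  { col_a = Array.map (fun r -> r.a) rows;
    col_b = Array.map (fun r -> r.b) rows }

(* Example scan that reads field A only. *)
let sum_a (c : columns) : int = Array.fold_left ( + ) 0 c.col_a
```

The trade-off is the one the update note mentions: access by field becomes cheap, but the code that builds and reads the structure gets more complicated.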
- Authors
- ----------------------------------------------------------------------
+ ## Authors

Primary code was written by Richard W.M. Jones <rich at annexia.org>
with help from Markus Mottl, Martin Jambon, and invaluable advice from
@@ -257,16 +243,9 @@ mmalloc was written by Mike Haertel and Fred Fish.
Port to no-naked-pointers and OCaml 5+ by Fabrice Le Fessant at
OCamlPro.

- License
- ----------------------------------------------------------------------
+ ## License

The module is licensed under the LGPL + OCaml linking exception. This
module includes mmalloc which was originally distributed with gdb
(although it has since been removed), and that code was distributed
under the plain LGPL.
-
- Latest version
- ----------------------------------------------------------------------
-
- The latest version can be found on the website:
- http://merjis.com/developers/ancient