| .TH DBZ 3Z "3 Feb 1991" |
| .BY "C News" |
| .SH NAME |
| dbminit, fetch, store, dbmclose \- somewhat dbm-compatible database routines |
| .br |
| dbzfresh, dbzagain, dbzfetch, dbzstore \- database routines |
| .br |
| dbzsync, dbzsize, dbzincore, dbzcancel, dbzdebug \- database routines |
| .SH SYNOPSIS |
| .nf |
| .B #include <dbz.h> |
| .PP |
| .B dbminit(base) |
| .B char *base; |
| .PP |
| .B datum |
| .B fetch(key) |
| .B datum key; |
| .PP |
| .B store(key, value) |
| .B datum key; |
| .B datum value; |
| .PP |
| .B dbmclose() |
| .PP |
| .B dbzfresh(base, size, fieldsep, cmap, tagmask) |
| .B char *base; |
| .B long size; |
| .B int fieldsep; |
| .B int cmap; |
| .B long tagmask; |
| .PP |
| .B dbzagain(base, oldbase) |
| .B char *base; |
| .B char *oldbase; |
| .PP |
| .B datum |
| .B dbzfetch(key) |
| .B datum key; |
| .PP |
| .B dbzstore(key, value) |
| .B datum key; |
| .B datum value; |
| .PP |
| .B dbzsync() |
| .PP |
| .B long |
| .B dbzsize(nentries) |
| .B long nentries; |
| .PP |
| .B dbzincore(newvalue) |
| .PP |
| .B dbzcancel() |
| .PP |
| .B dbzdebug(newvalue) |
| .SH DESCRIPTION |
| These functions provide an indexing system for rapid random access to a |
| text file (the |
| .I base |
| .IR file ). |
| Subject to certain constraints, they are call-compatible with |
| .IR dbm (3), |
| although they also provide some extensions. |
| (Note that they are |
| .I not |
| file-compatible with |
| .I dbm |
| or any variant thereof.) |
| .PP |
| In principle, |
| .I dbz |
| stores key-value pairs, where both key and value are arbitrary sequences |
| of bytes, specified to the functions by |
| values of type |
| .IR datum , |
| typedefed in the header file to be a structure with members |
| .I dptr |
| (a value of type |
| .I char * |
| pointing to the bytes) |
| and |
| .I dsize |
| (a value of type |
| .I int |
| indicating how long the byte sequence is). |
| .PP |
| In practice, |
| .I dbz |
| is more restricted than |
| .IR dbm . |
| A |
| .I dbz |
| database |
| must be an index into a base file, |
| with the database |
| .IR value s |
| being |
| .IR fseek (3) |
| offsets into the base file. |
| Each such |
| .I value |
| must ``point to'' a place in the base file where the corresponding |
| .I key |
| sequence is found. |
| A key can be no longer than |
| .SM DBZMAXKEY |
| (a constant defined in the header file) bytes. |
| No key can be an initial subsequence of another, |
| which in most applications requires that keys be |
| either bracketed or terminated in some way (see the |
| discussion of the |
| .I fieldsep |
| parameter of |
| .IR dbzfresh , |
| below, |
| for a fine point on terminators). |
| .PP |
| .I Dbminit |
| opens a database, |
| an index into the base file |
| .IR base , |
| consisting of files |
| .IB base .dir |
| and |
| .IB base .pag |
| which must already exist. |
| (If the database is new, they should be zero-length files.) |
| Subsequent accesses go to that database until |
| .I dbmclose |
| is called to close the database. |
| The base file need not exist at the time of the |
| .IR dbminit , |
| but it must exist before accesses are attempted. |
| .PP |
| .I Fetch |
| searches the database for the specified |
| .IR key , |
| returning the corresponding |
| .IR value |
| if any. |
| .I Store |
| stores the |
| .IR key - value |
| pair in the database. |
| .I Store |
| will fail unless the database files are writeable. |
| See below for a complication arising from case mapping. |
| .PP |
| .I Dbzfresh |
| is a variant of |
| .I dbminit |
| for creating a new database with more control over details. |
| Unlike for |
| .IR dbminit , |
| the database files need not exist: |
| they will be created if necessary, |
| and truncated in any case. |
| .PP |
| .IR Dbzfresh 's |
| .I size |
| parameter specifies the size of the first hash table within the database, |
| in key-value pairs. |
| Performance will be best if |
| .I size |
| is a prime number and |
| the number of key-value pairs stored in the database does not exceed |
| about 2/3 of |
| .IR size . |
| (The |
| .I dbzsize |
| function, given the expected number of key-value pairs, |
| will suggest a database size that meets these criteria.) |
| Assuming that an |
| .I fseek |
| offset is 4 bytes, |
| the |
| .B .pag |
| file will be |
| .RI 4* size |
| bytes |
| (the |
| .B .dir |
| file is tiny and roughly constant in size) |
| until |
| the number of key-value pairs exceeds about 80% of |
| .IR size . |
| (Nothing awful will happen if the database grows beyond 100% of |
| .IR size , |
| but accesses will slow down somewhat and the |
| .B .pag |
| file will grow somewhat.) |
| .PP |
| .IR Dbzfresh 's |
| .I fieldsep |
| parameter specifies the field separator in the base file. |
| If this is not |
| NUL (0), and the last character of a |
| .I key |
| argument is NUL, that NUL compares equal to either a NUL or a |
| .I fieldsep |
| in the base file. |
| This permits use of NUL to terminate key strings without requiring that |
| NULs appear in the base file. |
| The |
| .I fieldsep |
| of a database created with |
| .I dbminit |
| is the horizontal-tab character. |
| .PP |
| For use in news systems, various forms of case mapping (e.g. uppercase to |
| lowercase) in keys are available. |
| The |
| .I cmap |
| parameter to |
| .I dbzfresh |
| is a single character specifying which of several mapping algorithms to use. |
| Available algorithms are: |
| .RS |
| .TP |
| .B 0 |
| case-sensitive: no case mapping |
| .TP |
| .B B |
| same as |
| .B 0 |
| .TP |
| .B NUL |
| same as |
| .B 0 |
| .TP |
| .B = |
| case-insensitive: uppercase and lowercase equivalent |
| .TP |
| .B b |
| same as |
| .B = |
| .TP |
| .B C |
| RFC822 message-ID rules, case-sensitive before `@' (with certain exceptions) |
| and case-insensitive after |
| .TP |
| .B ? |
| whatever the local default is, normally |
| .B C |
| .RE |
| .PP |
| Mapping algorithm |
| .B 0 |
| (no mapping) is faster than the others and is overwhelmingly the correct |
| choice for most applications. |
| Unless compatibility constraints interfere, it is more efficient to pre-map |
| the keys, storing mapped keys in the base file, than to have |
| .I dbz |
| do the mapping on every search. |
| .PP |
| For historical reasons, |
| .I fetch |
| and |
| .I store |
| expect their |
| .I key |
| arguments to be pre-mapped, but expect unmapped keys in the base file. |
| .I Dbzfetch |
| and |
| .I dbzstore |
| do the same jobs but handle all case mapping internally, |
| so the customer need not worry about it. |
| .PP |
| .I Dbz |
| stores only the database |
| .IR value s |
| in its files, relying on reference to the base file to confirm a hit on a key. |
| References to the base file can be minimized, greatly speeding up searches, |
| if a little bit of information about the keys can be stored in the |
| .I dbz |
| files. |
| This is ``free'' if there are some unused bits in an |
| .I fseek |
| offset, |
| so that the offset can be |
| .I tagged |
| with some information about the key. |
| The |
| .I tagmask |
| parameter of |
| .I dbzfresh |
| allows specifying the location of unused bits. |
| .I Tagmask |
| should be a mask with |
| one group of |
| contiguous |
| .B 1 |
| bits. |
| The bits in the mask should |
| be unused (0) in |
| .I most |
| offsets. |
| The bit immediately above the mask (the |
| .I flag |
| bit) should be unused (0) in |
| .I all |
| offsets; |
| .I (dbz)store |
| will reject attempts to store a key-value pair in which the |
| .I value |
| has the flag bit on. |
| Apart from this restriction, tagging is invisible to the user. |
| As a special case, a |
| .I tagmask |
| of 1 means ``no tagging'', for use with enormous base files or |
| on systems with unusual offset representations. |
| .PP |
| A |
| .I size |
| of 0 |
| given to |
| .I dbzfresh |
| is synonymous with the local default; |
| the normal default is suitable for tables of 90-100,000 |
| key-value pairs. |
| A |
| .I cmap |
| of 0 (NUL) is synonymous with the character |
| .BR 0 , |
| signifying no case mapping |
| (note that the character |
| .B ? |
| specifies the local default mapping, |
| normally |
| .BR C ). |
| A |
| .I tagmask |
| of 0 is synonymous with the local default tag mask, |
| normally 0x7f000000 (specifying the top bit in a 32-bit offset |
| as the flag bit, and the next 7 bits as the mask, |
| which is suitable for base files up to circa 24MB). |
| Calling |
| .I dbminit(name) |
| with the database files empty is equivalent to calling |
| .IR dbzfresh(name,0,'\et','?',0) . |
| .PP |
| When databases are regenerated periodically, as in news, |
| it is simplest to pick the parameters for a new database based on the old one. |
| This also permits some memory of past sizes of the old database, so that |
| a new database size can be chosen to cover expected fluctuations. |
| .I Dbzagain |
| is a variant of |
| .I dbminit |
| for creating a new database as a new generation of an old database. |
| The database files for |
| .I oldbase |
| must exist. |
| .I Dbzagain |
| is equivalent to calling |
| .I dbzfresh |
| with the same field separator, case mapping, and tag mask as the old database, |
| and a |
| .I size |
| equal to the result of applying |
| .I dbzsize |
| to the largest number of entries in the |
| .I oldbase |
| database and its previous 10 generations. |
| .PP |
| When many accesses are being done by the same program, |
| .I dbz |
| is massively faster if its first hash table is in memory. |
| If an internal flag is 1, |
| an attempt is made to read the table in when |
| the database is opened, and |
| .I dbmclose |
| writes it out to disk again (if it was read successfully and |
| has been modified). |
| .I Dbzincore |
| sets the flag to |
| .I newvalue |
| (which should be 0 or 1) |
| and returns the previous value; |
| this does not affect the status of a database that has already been opened. |
| The default is 0. |
| The attempt to read the table in may fail due to memory shortage; |
| in this case |
| .I dbz |
| quietly falls back on its default behavior. |
| .IR Store s |
| to an in-memory database are not (in general) written out to the file |
| until |
| .IR dbmclose |
| or |
| .IR dbzsync , |
| so if robustness in the presence of crashes |
| or concurrent accesses |
| is crucial, in-memory databases |
| should probably be avoided. |
| .PP |
| .I Dbzsync |
| causes all buffers etc. to be flushed out to the files. |
| It is typically used as a precaution against crashes or concurrent accesses |
| when a |
| .IR dbz -using |
| process will be running for a long time. |
| It is a somewhat expensive operation, |
| especially |
| for an in-memory database. |
| .PP |
| .I Dbzcancel |
| cancels any pending writes from buffers. |
| This is typically useful only for in-core databases, since writes are |
| otherwise done immediately. |
| Its main purpose is to let a child process, in the wake of a |
| .IR fork , |
| do a |
| .I dbmclose |
| without writing its parent's data to disk. |
| .PP |
| If |
| .I dbz |
| has been compiled with debugging facilities available (which makes it |
| bigger and a bit slower), |
| .I dbzdebug |
| alters the value (and returns the previous value) of an internal flag |
| which (when 1; default is 0) causes |
| verbose and cryptic debugging output on standard output. |
| .PP |
| Concurrent reading of databases is fairly safe, |
| but there is no (inter)locking, |
| so concurrent updating is not. |
| .PP |
| The database files include a record of the byte order of the processor |
| creating the database, and accesses by processors with different byte |
| order will work, although they will be slightly slower. |
| Byte order is preserved by |
| .IR dbzagain . |
| However, |
| agreement on the size and internal structure of an |
| .I fseek |
| offset is necessary, as is consensus on |
| the character set. |
| .PP |
| An open database occupies three |
| .I stdio |
| streams and their corresponding file descriptors; |
| a fourth is needed for an in-memory database. |
| Memory consumption is negligible (except for |
| .I stdio |
| buffers) except for in-memory databases. |
| .SH SEE ALSO |
| dbz(1), dbm(3) |
| .SH DIAGNOSTICS |
| Functions returning |
| .I int |
| values return 0 for success, \-1 for failure. |
| Functions returning |
| .I datum |
| values return a value with |
| .I dptr |
| set to NULL for failure. |
| .I Dbminit |
| attempts to have |
| .I errno |
| set plausibly on return, but otherwise this is not guaranteed. |
| An |
| .I errno |
| of |
| .B EDOM |
| from |
| .I dbminit |
| indicates that the database did not appear to be in |
| .I dbz |
| format. |
| .SH HISTORY |
| The original |
| .I dbz |
| was written by |
| Jon Zeeff (zeeff@b-tech.ann-arbor.mi.us). |
| Later contributions by David Butler and Mark Moraes. |
| Extensive reworking, |
| including this documentation, |
| by Henry Spencer (henry@zoo.toronto.edu) as |
| part of the C News project. |
| Hashing function by Peter Honeyman. |
| .SH BUGS |
| The |
| .I dptr |
| members of returned |
| .I datum |
| values point to static storage which is overwritten by later calls. |
| .PP |
| Unlike |
| .IR dbm , |
| .I dbz |
| will misbehave if an existing key-value pair is `overwritten' by |
| a new |
| .I (dbz)store |
| with the same key. |
| The user is responsible for avoiding this by using |
| .I (dbz)fetch |
| first to check for duplicates; |
| an internal optimization remembers the result of the |
| first search so there is minimal overhead in this. |
| .PP |
| Waiting until after |
| .I dbminit |
| to bring the base file into existence |
| will fail if |
| .IR chdir (2) |
| has been used meanwhile. |
| .PP |
| The RFC822 case mapper implements only a first approximation to the |
| hideously-complex RFC822 case rules. |
| .PP |
| The prime finder in |
| .I dbzsize |
| is not particularly quick. |
| .PP |
| Should implement the |
| .I dbm |
| functions |
| .IR delete , |
| .IR firstkey , |
| and |
| .IR nextkey . |
| .PP |
| On C implementations which trap integer overflow, |
| .I dbz |
| will refuse to |
| .I (dbz)store |
| an |
| .I fseek |
| offset equal to the greatest |
| representable |
| positive number, |
| as this would cause overflow in the biased representation used. |
| .PP |
| .I Dbzagain |
| perhaps ought to notice when many offsets |
| in the old database were |
| too big for |
| tagging, and shrink the tag mask to match. |
| .PP |
| Marking |
| .IR dbz 's |
| file descriptors |
| .RI close-on- exec |
| would be a better approach to the problem |
| .I dbzcancel |
| tries to address, but that's harder to do portably. |