The Iūdex BARC container format supports efficient block storage of raw downloaded or post-processed content. The format was inspired by Heritrix ARC and WARC, but offers several unique features/advantages:
A human/machine-readable header is used for the first record (offset zero) and for each subsequent record:
```
Magic rlength  tt meta rqst resp
BARC1 FFFFFFFF HC FFFF FFFF FFFF(CRLF)
CRLF
```
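All lengths are byte counts written in hexadecimal, zero-padded to a fixed width. The header itself is always 36 bytes long: the 34-byte header line (including its trailing CRLF) followed by a second, empty CRLF line.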
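As a rough illustration of the fixed-width layout, the following Python sketch formats a header line with zero-padded hexadecimal fields and checks the 36-byte total. The function name `format_header` and the use of uppercase hex digits are assumptions for illustration, not Iūdex's own API.

```python
HEADER_LEN = 36  # 34-byte header line (with CRLF) + one empty CRLF line


def format_header(rlength: int, rtype: str, compression: str,
                  meta: int, rqst: int, resp: int) -> bytes:
    """Format a BARC1 record header (illustrative sketch)."""
    # rlength gets 8 hex digits; each block length gets 4 hex digits.
    if not 0 <= rlength <= 0xFFFFFFFF:
        raise ValueError("rlength does not fit in 8 hex digits")
    for n in (meta, rqst, resp):
        if not 0 <= n <= 0xFFFF:
            raise ValueError("block length does not fit in 4 hex digits")
    if len(rtype) != 1 or len(compression) != 1:
        raise ValueError("type and compression are one character each")
    line = "BARC1 %08X %s%s %04X %04X %04X\r\n" % (
        rlength, rtype, compression, meta, rqst, resp)
    header = (line + "\r\n").encode("ascii")
    assert len(header) == HEADER_LEN  # fixed width by construction
    return header
```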
Component | Description |
---|---|
Magic | Fixed string `BARC1`. |
rlength | Length of the record in bytes, excluding the header (8 hex digits). |
tt | Record type and compression mode, one character each (see below). |
meta | Length of the meta header block in bytes (4 hex digits). |
rqst | Length of the request header block in bytes (4 hex digits). |
resp | Length of the response header block in bytes (4 hex digits). |
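Because every field sits at a fixed position separated by single spaces, a reader can recover the components listed above by splitting the header line. The hypothetical `parse_header` helper below is a sketch, not the Iūdex implementation; it maps each component onto a named tuple.

```python
from typing import NamedTuple


class BarcHeader(NamedTuple):
    rlength: int      # record length in bytes, excluding the 36-byte header
    rtype: str        # record type character, e.g. "H"
    compression: str  # compression mode character, e.g. "C"
    meta: int         # length of the meta header block
    rqst: int         # length of the request header block
    resp: int         # length of the response header block


def parse_header(raw: bytes) -> BarcHeader:
    """Parse the first 36 bytes of a record into its header fields."""
    if len(raw) < 36:
        raise ValueError("short header")
    text = raw[:36].decode("ascii")
    if not text.startswith("BARC1 ") or not text.endswith("\r\n\r\n"):
        raise ValueError("not a BARC1 header")
    # The 32 characters before the CRLFs hold six space-separated fields.
    _magic, rlength, tt, meta, rqst, resp = text[:32].split(" ")
    if len(tt) != 2:
        raise ValueError("malformed type/compression field")
    return BarcHeader(int(rlength, 16), tt[0], tt[1],
                      int(meta, 16), int(rqst, 16), int(resp, 16))
```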
The following record types are currently defined:
Type | Description |
---|---|
R | Replaced (or never completed) record. Most consumers will want to ignore these records. |
H | HTML or other raw (web-downloaded) content. For example, downloaded images would be suitably stored as H records. |
D | A delete record containing meta headers but no body. |
The following compression modes are currently defined:
Mode | Description |
---|---|
C | Gzip compressed |
P | Plain (uncompressed) |
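Since `rlength` excludes the header, a reader can walk a file record by record by adding `36 + rlength` to each offset. The sketch below assumes records are stored back to back with no padding between them, which this README does not state explicitly. It validates the type and compression characters against the tables above and, by default, skips `R` records as most consumers would. `iter_records` is a hypothetical name.

```python
import io
from typing import BinaryIO, Iterator, Tuple

HEADER_LEN = 36
RECORD_TYPES = {"R": "replaced", "H": "html/raw content", "D": "delete"}
COMPRESSION_MODES = {"C": "gzip", "P": "plain"}


def iter_records(f: BinaryIO, skip_replaced: bool = True
                 ) -> Iterator[Tuple[int, str, str, int]]:
    """Yield (offset, type, compression, rlength) for each record."""
    offset = 0
    while True:
        f.seek(offset)
        header = f.read(HEADER_LEN)
        if len(header) < HEADER_LEN:
            return  # end of file (or a truncated trailing header)
        text = header.decode("ascii")
        if not text.startswith("BARC1 "):
            raise ValueError(f"bad magic at offset {offset}")
        rlength = int(text[6:14], 16)       # 8 hex digits after "BARC1 "
        rtype, comp = text[15], text[16]    # the two "tt" characters
        if rtype not in RECORD_TYPES or comp not in COMPRESSION_MODES:
            raise ValueError(f"unknown tt {rtype}{comp} at offset {offset}")
        if not (skip_replaced and rtype == "R"):
            yield offset, rtype, comp, rlength
        # Assumption: the next record starts right after this one.
        offset += HEADER_LEN + rlength
```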
A. Show Record(s)

```
iudex-barc show barc/000000.barc
iudex-barc show -mqr -o 0xde411 barc/000015.barc
```

(Offsets are stored in the database as integers, so use the base-10 form of an offset taken from there.)
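The `-o` example passes a hexadecimal offset, while the database holds the same position as a plain integer. A small, hypothetical helper that accepts either form shows that both name the same byte position (`0xde411` is 910353 in base 10). This is an illustration only, not how `iudex-barc` itself parses `-o`.

```python
def parse_offset(arg: str) -> int:
    """Accept a 0x-prefixed hex offset or a plain base-10 offset."""
    arg = arg.strip()
    if arg.lower().startswith("0x"):
        return int(arg, 16)  # int() tolerates the 0x prefix for base 16
    return int(arg, 10)
```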
B. HTTP fetch

```
iudex-http-record [-z] [-t] url outfile
```
C. Copy/Concatenate record to file

```
iudex-barc [-zt] copy infile outfile
```

(The last file argument is implicitly the one written to.)
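At the byte level, copying a record amounts to reading its 36-byte header plus `rlength` bytes and appending them to the destination. The sketch below, with the hypothetical name `copy_record`, writes the bytes unchanged and does not model the `-z`/`-t` options; it returns the offset of the copied record in the destination file.

```python
import io
from typing import BinaryIO

HEADER_LEN = 36


def copy_record(src: BinaryIO, offset: int, dst: BinaryIO) -> int:
    """Append the record at `offset` in `src` to the end of `dst`.

    Returns the offset of the copied record within `dst`.
    """
    src.seek(offset)
    header = src.read(HEADER_LEN)
    if len(header) != HEADER_LEN or not header.startswith(b"BARC1 "):
        raise ValueError(f"no BARC1 header at offset {offset}")
    rlength = int(header[6:14].decode("ascii"), 16)
    record = src.read(rlength)
    if len(record) != rlength:
        raise ValueError("truncated record")
    new_offset = dst.seek(0, io.SEEK_END)  # concatenate: always append
    dst.write(header + record)
    return new_offset
```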
Format:
```
-BARC1 H
=META=
meta_header_1: value
meta_header_2: value
=RQST=
Request-Line: GET /foo/bar.html
=RESP=
Status-Line: HTTP/1.1 200 OK
=BODY=
<html>
...
```
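This text format can be produced from a record's uncompressed contents using the `meta`, `rqst` and `resp` lengths from its header. The sketch below assumes the blocks are laid out in the order meta, request, response, then body, and that zero-length sections are omitted (as the DELETE sample, which shows only `=META=`, suggests). `render_record` is a hypothetical name.

```python
def render_record(rtype: str, content: bytes,
                  meta: int, rqst: int, resp: int) -> str:
    """Render an uncompressed record's content in the text format."""
    if meta + rqst + resp > len(content):
        raise ValueError("header block lengths exceed record content")
    # Assumption: blocks appear as meta, request, response, then body.
    blocks = {}
    pos = 0
    for name, n in (("META", meta), ("RQST", rqst), ("RESP", resp)):
        blocks[name] = content[pos:pos + n]
        pos += n
    body = content[pos:]
    out = [f"-BARC1 {rtype}"]
    for name in ("META", "RQST", "RESP"):
        if blocks[name]:  # skip empty sections
            out.append(f"={name}=")
            out.extend(blocks[name].decode("utf-8", errors="replace").splitlines())
    if body:
        out.append("=BODY=")
        out.append(body.decode("utf-8", errors="replace"))
    return "\n".join(out)
```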
Sample DELETE:
```
-BARC1 D
=META=
url: http://foobar/framis
uhash: 98ASDz798a782hasfd79jbz
```
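A delete record like this one carries only a meta block, so its request and response lengths are zero and `rlength` equals the meta length. The sketch below builds such a record as raw bytes, assuming meta headers are stored as `name: value` lines ending in CRLF and that the record is uncompressed (`P`); `make_delete_record` is a hypothetical name.

```python
HEADER_FMT = "BARC1 %08X %s%s %04X %04X %04X\r\n\r\n"


def make_delete_record(meta: dict) -> bytes:
    """Build an uncompressed D record carrying only meta headers."""
    # Assumption: header blocks are "name: value" lines terminated by CRLF.
    block = "".join(f"{name}: {value}\r\n" for name, value in meta.items())
    data = block.encode("utf-8")
    if len(data) > 0xFFFF:
        raise ValueError("meta block too long for a 4-digit length field")
    # No request, response or body, so rlength equals the meta length.
    header = HEADER_FMT % (len(data), "D", "P", len(data), 0, 0)
    return header.encode("ascii") + data
```

For the sample above, the call would be `make_delete_record({"url": "http://foobar/framis", "uhash": "98ASDz798a782hasfd79jbz"})`.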
From Wikipedia: BARC-LARC-XV-2.jpeg (public domain)