The Little Data Description Language (LDDL) is a simple language for describing the format of a stream of data. It won't handle every possible data format, but it does seem to handle the ones we deal with in our various experiments, at least the data stream from the hardware.
The data is considered to be composed of individual events. In the simplest data formats, each event is identical. In more complex data formats, there may be several different types of events. LDDL allows these various data formats to be specified.
The building blocks of any data format consist of
The user of LDDL must write (or have someone else write) a file containing a description of the data. What follows is a tutorial-like introduction to specifying the data format, with some examples.
In these formats, each data event consists of a sequence of longs, shorts, chars, floats, and/or doubles. There is only one type of event in the data stream and it always has the same size. The data spec file that describes such a format consists of a sequence of lines of the form
    <type> [quantity]

where <type> is one of the following capital letters:
Comments may be inserted into the format description. They start with the # character and continue to the end of the line.
For example, if an event of our data consists of 3 longs, 2 shorts, a 4 x 5 array of chars, a long, a short, 2 doubles and a float, then it could be represented like so:

    L 3
    S 2
    # a 4 x 5 array follows
    B 20
    L
    S
    D 2
    F
A real-life example of this type of format is the data for the multi-anode PMT calibration from a SIFTER accelerator run in 1998. Each event consisted of a short int start-of-event pattern, a short int that is always 1, a long int event number, and a long int time stamp, followed by 32 short ints of pha data. It can be described by the following specification:
    S = 0xA098   # ID pattern
    S = 1
    L            # evt number
    L            # time stamp (PC tick count)
    S 32         # pha values

On the first two lines above, we use the ``= val'' construct. This indicates that LDDL will check the value of the data item. If it is not as specified then a warning message will be printed.
Example: one short int equal to 2304 that may or may not be present:
S = 2304 optional
Here is an example of converting a C structure into the format required in the data specification file. The following struct
    struct datum {
        short s[8];
        long l[4];
        double d[2];
        char c[8];
        float f[8];
    } D;

can be represented like so:

    S 8
    L 4
    D 2
    B 8
    F 8
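When translating a struct this way, it can help to compare the number of bytes the spec describes with what the compiler reports for the struct. The following is a minimal sketch, not part of LDDL: the names `lddl_item`, `item_size`, `packed_size`, and `datum_spec` are hypothetical, and the element sizes (B = 1, S = 2, L = 4, F = 4, D = 8 bytes) are assumptions based on the type macros listed later in this document. It also assumes the items follow one another in the stream with no padding.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type, not part of LDDL. */
struct lddl_item {
    char type;      /* one of 'B', 'S', 'L', 'F', 'D' */
    size_t count;   /* the [quantity] field; 1 when omitted */
};

/* Assumed element sizes in bytes: B=1, S=2, L=4, F=4, D=8. */
static size_t item_size(char type)
{
    switch (type) {
    case 'B': return 1;
    case 'S': return 2;
    case 'L': return 4;
    case 'F': return 4;
    case 'D': return 8;
    default:  return 0;  /* unknown type letter */
    }
}

/* Sum of the sizes of all items, assuming no padding between them. */
static size_t packed_size(const struct lddl_item *items, size_t n)
{
    size_t total = 0;

    assert(items != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += item_size(items[i].type) * items[i].count;
    return total;
}

/* The spec "S 8, L 4, D 2, B 8, F 8" from the example above. */
static const struct lddl_item datum_spec[] = {
    {'S', 8}, {'L', 4}, {'D', 2}, {'B', 8}, {'F', 8},
};
```

Under these assumptions the spec describes 88 bytes per event. If `sizeof(struct datum)` on the machine that wrote the data differs from that, the compiler has added padding or uses a different size for one of the types (a 64-bit `long`, for instance), and the spec would need to account for it.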
The library routines can determine whether the data is in the computer's native byte order if there is some sort of data item that can be used as an indicator. For example, if the first (or second or third, etc.) word of the data always has a particular value, say 0x1234, then the library can check the value of this word. If it is in fact 0x1234, then the data is assumed to be in the native endianness. If it turns out to be 0x3412, then the data is in the opposite endianness.
This information is specified using the endian statement. For example:
    endian offset 2 type S = 0x90af

states that if the program finds a short integer at a 2 byte offset from the beginning of the input data containing the value 0x90af, then the data is in native byte order. The endian statement may also have a bitmask:

    endian offset 4 type L = 0xABCD0000 mask 0xFFFF0000

In this example, the value of the long integer at byte offset 4 must be 0xABCD0000 after masking with 0xFFFF0000.
The default value of offset is 0, and the default is not to mask:
endian type S = 0x1234
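The check that the endian statement describes can be sketched in C as follows. This is only an illustration of the idea for a 16-bit marker without a mask; `swap16` and `looks_native16` are hypothetical helpers, not LDDL routines, and the three-way return value is a choice made for this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Swap the two bytes of a 16-bit value. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/*
 * Hypothetical helper, not part of LDDL: mimic "endian offset N type S = val".
 * Returns 1 if the data looks native, 0 if it looks byte-swapped,
 * and -1 if the marker matches neither way.
 */
static int looks_native16(const unsigned char *buf, size_t len,
                          size_t offset, uint16_t expected)
{
    uint16_t word;

    assert(buf != NULL);
    if (offset + sizeof word > len)
        return -1;                             /* marker lies outside the buffer */
    memcpy(&word, buf + offset, sizeof word);  /* read in native byte order */
    if (word == expected)
        return 1;
    if (swap16(word) == expected)
        return 0;
    return -1;
}
```

A marker whose two bytes are identical, such as 0x1212, reads the same in both byte orders and so cannot serve as an endianness indicator.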
A block of data type lines may be grouped together, named, and referenced by name later in the format file. For example:
    group blob
    S 2
    L
    group_end
A group begins with a line of the form
    group foo

where ``foo'' is the name of this group. It ends with a line consisting of the keyword ``group_end''.
The above example specifies that a blob consists of two shorts, followed by a long. It can then be used like so, for example:
    S = 0xffff
    S = 0xffbc
    S -> nb
    blob nb
The previous example expects the third short to contain the number of blob groups to expect. This value is placed in the variable nb. Then, in the last line, the value of nb is used to specify the number of blob groups to expect in the data.
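To make the effect of a variable count concrete, here is a rough C sketch of how the length of such an event follows from the value stored in nb. It is not LDDL code: `blob_event_length` and `BLOB_BYTES` are hypothetical, and the sketch assumes 2-byte shorts, 4-byte longs, and data already in native byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOB_BYTES 8u   /* "S 2" + "L", assuming 2-byte S and 4-byte L */

/*
 * Hypothetical helper, not part of LDDL: given the start of an event laid
 * out as  S, S, S -> nb, blob nb  (native byte order assumed), return the
 * total event length in bytes, or 0 if the buffer is too short.
 */
static size_t blob_event_length(const unsigned char *buf, size_t len)
{
    uint16_t nb;
    size_t total;

    assert(buf != NULL);
    if (len < 6)
        return 0;                      /* header (three shorts) incomplete */
    memcpy(&nb, buf + 4, sizeof nb);   /* third short holds the blob count */
    total = 6 + (size_t)nb * BLOB_BYTES;
    return total <= len ? total : 0;
}
```

With nb equal to 3, for example, the event occupies 6 + 3 x 8 = 30 bytes.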
If there is more than one kind of event in the data, then the data format must specify two things:
We provide this information to the LDDL program by using a signature definition. This statement begins with the keyword sig. It specifies the following:
There may be more than one set of these conditions for each event kind. For example, here is a set of signatures for the SOFT MSU 1995 data:
    sig real offset 2 type S = 0xfff9 && offset 4 type S = 0
    sig led_cal offset 2 type S = 0xfff9 && offset 4 type S = 2
    sig reply offset 2 type S = 0xfff9 && offset 4 type S = 4
    sig pha_only offset 2 type S = 0xfff9 && offset 4 type S = 9
    sig spill_hdr offset 2 type S = 0xffee
Notice that the spill_hdr has only one set of conditions, and the other event kinds have two (separated by the && token). In this example, for a ``reply'' event, LDDL will first check that the short int at an offset of 2 bytes is equal to 0xFFF9 and if so, then check that the short int at offset 4 bytes is 4.
Here is an example from the TIGER/ULDB data stream which uses the bitmask feature:
    sig real offset 0 type S = 1 mask 0x000f
    sig pedcal offset 0 type S = 2 mask 0x000f
    sig lightcal offset 0 type S = 3 mask 0x000f
    sig fasthsk offset 0 type S = 4 mask 0x000f
    sig medhsk offset 0 type S = 5 mask 0x000f
    sig slowhsk offset 0 type S = 6 mask 0x000f
    sig vslowhsk offset 0 type S = 7 mask 0x000f
    sig reply offset 0 type S = 8 mask 0x000f

In this example, for a "medhsk" event, LDDL will take the short int at offset 0, AND it with the value 0x000F, and check that the result is equal to 5.
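The signature test described above amounts to checking a list of (offset, value, mask) conditions that must all hold. A minimal C sketch of that logic for short ints, under the assumption that the data is already in native byte order, might look like this; `struct sig_cond` and `sig_matches` are hypothetical names and not part of the LDDL library.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One "offset N type S = val [mask m]" condition (hypothetical type). */
struct sig_cond {
    size_t   offset;
    uint16_t value;
    uint16_t mask;     /* 0xffff means "no mask" */
};

/* Returns 1 if every condition holds (the && of all of them), else 0. */
static int sig_matches(const unsigned char *buf, size_t len,
                       const struct sig_cond *conds, size_t nconds)
{
    assert(buf != NULL && conds != NULL);
    for (size_t i = 0; i < nconds; i++) {
        uint16_t word;

        if (conds[i].offset + sizeof word > len)
            return 0;                          /* not enough data */
        memcpy(&word, buf + conds[i].offset, sizeof word);
        if ((word & conds[i].mask) != conds[i].value)
            return 0;                          /* stop at first failure */
    }
    return 1;
}
```

The ``reply'' signature of the SOFT MSU data would then be the two conditions `{2, 0xfff9, 0xffff}` and `{4, 4, 0xffff}`, and the TIGER/ULDB ``medhsk'' signature the single condition `{0, 5, 0x000f}`.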
Once the signature of each event type has been specified, the event itself may be defined. Continuing with an excerpt from TIGER/ULDB, we would have something like:
    group header
    S = 0xbef0 mask 0xfff0
    L 2
    group_end

    group ender
    S            # checksum
    S = 0xabcd
    group_end

    evt real
    header
    L            # hazard clk
    S            # status wd
    S 147
    ender

    evt reply
    header
    S            # rfu
    B 82
    ender

As you can see, the event definition begins with a line of the form

    evt foo

where ``foo'' is the name of this event. Following this line is a series of data type lines or group lines as described above.
    S = 0xABCD        # event ID
    S                 # number of items
    S : print "%d "   # event number - print it

In general, anything following the colon (:) separator will be passed to the application program. Each program is free to interpret the string in its own way. The comment at the end of the line is, of course, not seen by the application.
    S & 0xfff0 >> 4

Suppose the data value is 0xFACE. Using the above description, LDDL would first mask the 16-bit value with 0xfff0 (producing 0xFAC0), and then shift the value right 4 bits (producing 0x0FAC). The new value can be used with other LDDL constructs:
    L & 0xfff00000 >> 20 -> nparms

Note that by default these operations may modify the data value that will be presented to the application. An application can take measures to receive the original value or the modified value. See Returning modified or original values.
    {
    S & 0xfff0 >> 4 -> nwords
    S & 0x000f -> seq
    }

Note for application programmers: this construct causes LDDL to make two function calls to your node callback. See Returning duplicate nodes.
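The mask-and-shift arithmetic in these examples can be reproduced in C, with one caveat. LDDL, as described above, applies the mask first and the shift second, whereas in C the `>>` operator binds more tightly than `&`, so the literal C expression `value & 0xfff0 >> 4` means `value & 0x0fff`. The sketch below makes the LDDL order explicit; `mask_then_shift` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Apply "& mask >> shift" in the order described for LDDL:
 * mask first, then shift right. */
static uint16_t mask_then_shift(uint16_t value, uint16_t mask, unsigned shift)
{
    assert(shift < 16);
    return (uint16_t)((value & mask) >> shift);
}
```

For the data value 0xFACE, `mask_then_shift(0xFACE, 0xfff0, 4)` gives 0x0FAC (what would be stored in nwords) and `mask_then_shift(0xFACE, 0x000f, 0)` gives 0x000E (what would be stored in seq).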
    # data format for pdtest program
    endian type S = 0x90af

    group pha
    S 8          # phas
    S            # S scaler
    S            # SC scaler
    S            # SC channel
    S            # board id
    group_end

    S = 0x90af
    S -> npha    # no. of pha boards
    L            # evtno
    L            # tick count
    pha npha
    # SOFT MSU 1995 data format

    sig real offset 2 type S = 0xfff9 && offset 4 type S = 0
    sig led_cal offset 2 type S = 0xfff9 && offset 4 type S = 2
    sig reply offset 2 type S = 0xfff9 && offset 4 type S = 4
    sig pha_only offset 2 type S = 0xfff9 && offset 4 type S = 9
    sig spill_hdr offset 2 type S = 0xffee

    group std    # "standard" stuff found at the beginning of most evts
    S = 0xffff
    S = 0xfff9
    S 2
    L 4
    L -> vidsize2    # size (bytes) of camera 2 data
    L
    L -> vidsize1    # size (bytes) of camera 1 data
    group_end

    group blob
    S 2          # x, y coord of centroid
    L            # blob intensity
    group_end

    group pha
    S = 0xffff
    S = 0xfff8
    S 11
    group_end

    group disc   # discriminator stuff
    S = 0xffff
    S = 0xffe6
    B 12
    S 6
    group_end

    group video1 # video data for camera 1
    S = 0xffff
    S = 0xffec
    B vidsize1   # vidsize1 was extracted in the std group
    group_end

    group video2 # video data for camera 2
    S = 0xffff
    S = 0xffed
    B vidsize2   # vidsize2 was extracted in the std group
    group_end

    group blobs1 # blob data for camera 1
    S = 0xffff
    S = 0xffbc
    S -> nblobs1
    blob nblobs1
    group_end

    group blobs2 # blob data for camera 2
    S = 0xffff
    S = 0xffbc
    S -> nblobs2
    blob nblobs2
    group_end

    group ender
    S = 0xffff
    S = 0xfff2
    group_end

    evt spill_hdr
    S = 0xffff
    S = 0xffee
    L

    evt real
    std
    pha
    disc
    video1
    video2
    S = 0xffff
    S = 0xffbc
    S -> nblobs1
    blob nblobs1
    S = 0xffff
    S = 0xffbc
    S -> nblobs2
    blob nblobs2
    ender
    B = 0xdd optional
    B = 0xdd optional
    B = 0xdd optional
    B = 0xdd optional

    evt led_cal
    std
    pha
    disc
    video1
    video2
    S = 0xffff
    S = 0xffbc
    S -> nblobs1
    blob nblobs1
    S = 0xffff
    S = 0xffbc
    S -> nblobs2
    blob nblobs2
    ender
    B = 0xdd optional
    B = 0xdd optional
    B = 0xdd optional
    B = 0xdd optional

    evt reply
    std
    B 124
    ender
    B = 0xdd optional
    B = 0xdd optional
    B = 0xdd optional
    B = 0xdd optional

    evt pha_only
    std
    pha
    ender
    B = 0xdd optional
    B = 0xdd optional
    B = 0xdd optional
    B = 0xdd optional
    # TIGER ULDB 2000 data format
    # Just science data, not including CCSDS packet header or event time tag

    sig real offset 0 type S = 1 mask 0x000f
    sig pedcal offset 0 type S = 2 mask 0x000f
    sig lightcal offset 0 type S = 3 mask 0x000f
    sig fasthsk offset 0 type S = 4 mask 0x000f
    sig medhsk offset 0 type S = 5 mask 0x000f
    sig slowhsk offset 0 type S = 6 mask 0x000f
    sig vslowhsk offset 0 type S = 7 mask 0x000f
    sig reply offset 0 type S = 8 mask 0x000f

    group header
    S = 0xbef0 mask 0xfff0
    L 2
    group_end

    group ender
    S            # checksum
    S = 0xabcd
    group_end

    evt real
    header
    L            # hazard clk
    S            # status wd
    S 147
    ender

    evt pedcal
    header
    L            # hazard clk
    S            # status wd
    S 147
    ender

    evt lightcal
    header
    L            # hazard clk
    S            # status wd
    S 147
    ender

    evt fasthsk
    header
    S            # hsk seq
    L 2          # live time, real time
    S 36
    ender

    evt medhsk
    header
    S            # hsk seq
    S 70
    ender

    evt slowhsk
    header
    S            # hsk seq
    S 38
    B 12
    S 11
    ender

    evt vslowhsk
    header
    S            # hsk seq
    S 3
    B 2
    S 8
    B 6
    S 6
    ender

    evt reply
    header
    S            # rfu
    B 82
    ender
Dataswap takes care of converting the data from big- to little-endian (or vice-versa). The format file is just straight LDDL with no application specific commands. It takes the following command line parameters:
option | description |
---|---|
-f fmtfile | file containing data format spec |
-o outfile | output file (default stdout) |
-n | data IS NOT in native endianness (overrides endian keyword) |
-N | data IS in native endianness (overrides endian keyword) |
[file ...] | list of input file names (default stdin) |
Dataprint is used to print selected data items in text format. It defines one application-specific command: print.
The print command has the following parameters. Examples are given below.
parameter | description |
---|---|
[-n numspec] | range of numbers specifying which of several data items to print |
[fmt] | printf format (see defaults below) |
Dataprint uses the following notation for specifying a range of values. A range consists of two values separated by a dash: 10-37. An individual value can also be specified: 225 and these can be combined in a comma-separated list: 10-37,225 or 1,3,5,7-10,18-21,25. A C function that implements numspecs can be found in the files parsenum.c and parsenum.h in the LDDL distribution.
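As an illustration of the numspec notation, the sketch below tests whether a given index is selected by a numspec string. It is a hypothetical stand-in written for this explanation, not the routine from parsenum.c, whose actual interface is not described here.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not the parsenum.c routine from the distribution):
 * returns 1 if index n is selected by a numspec such as "1,3,5,7-10",
 * 0 if it is not, and -1 if a malformed part is reached before a match.
 */
static int numspec_contains(const char *spec, unsigned long n)
{
    const char *p = spec;

    assert(spec != NULL);
    while (*p != '\0') {
        char *end;
        unsigned long lo, hi;

        lo = strtoul(p, &end, 10);
        if (end == p)
            return -1;                 /* expected a number */
        p = end;
        hi = lo;                       /* a single value is a one-item range */
        if (*p == '-') {               /* a range "lo-hi" */
            p++;
            hi = strtoul(p, &end, 10);
            if (end == p)
                return -1;
            p = end;
        }
        if (lo <= n && n <= hi)
            return 1;
        if (*p == ',')
            p++;                       /* move on to the next list entry */
        else if (*p != '\0')
            return -1;                 /* unexpected character */
    }
    return 0;
}
```

With the numspec `10-37,225`, indices 10 through 37 and 225 are selected and every other index is not.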
The default formats are as follows:
type | format |
---|---|
B (char) | %02x |
S (short) | %04x |
L (long) | %08x |
F (float) | %0f |
D (double) | %0f |
    F : print

It is also possible to specify a particular printf-style format:

    L : print "%05ld "

or even something like

    S : print "evtno %u "

This is probably not too useful, but it is possible:

    S : print "Hey there, buddy!\n\t\t=====> %d\n\n"

The following case has a sequence of 5 shorts:

    S 5 : print "%04x "

Given this, dataprint will print each of the 5 values using the %04x format. If you only want to print, for example, the first 3 values (see numspec above), just say:

    S 5 : print -n 0-2 "%04x "

or

    S 5 : print -n 0,1,2 "%04x "

In the following:
    group pha
    S 8          # pha values
    S            # discriminator
    S            # channel ID
    group_end

    S = 0xABCD
    pha : print

we are asking to print the group ``pha''. When dataprint encounters this, it will print all the elements in the group (including any other groups that may be used inside the first group) in the printf format specified, which in this case is the default for each data item. In this case, it will print the 8 pha values, followed by the discriminator and channel ID. If you only want to print some of these items, then you should put the print statements on the elements inside the group:
    group pha
    S 8 : print -n 0-3,6   # pha values
    S                      # discriminator
    S : print              # channel ID
    group_end

    S = 0xABCD
    pha

That will print phas 0 through 3 and 6 plus the channel ID. Note that a print command on a group will not override a print command inside the group:
    group pha
    S 8               # pha values
    S                 # discriminator
    S : print "%u "   # channel ID
    group_end

    S = 0xABCD
    pha : print "%05u "

In this case, the channel ID will use the format %u and all the other pha elements will be printed with the %05u format specified on the last line.
The Little Data Description Language is implemented as a library of C routines. This section describes the programming interface to the library. We will refer to the library as LDDL. As an aid to understanding this information, you might want to refer to the source code of the dataprint program in the file dataprint.c in the LDDL source distribution.
It is important to understand that LDDL models the data as a stream of events which are composed of nodes, which are in turn made up of either data items or groups of data items.
The main work of writing an application is in coding the callback functions. They must be ``registered'' with LDDL (see Callback functions).
Before LDDL begins its operations, the application can set a couple of parameters (see Set-up routines).
Finally, control is passed to LDDL using lddl_start().
If necessary, an application can determine with which version of LDDL it was linked by using lddl_version().
Consider once again the following data description:
Example data format
     1. group pha
     2. S 8     # node consisting of 8 data items
     3. S       # node of 1 data item
     4. S       # node of 1 data item
     5. group_end
     6. sig real type S = 0xA000
     7. sig calibration type S = 0xA001
     8. evt real
     9. S = 0xA000
    10. S 5
    11. evt calibration
    12. S = 0xA001
    13. S -> npha
    14. pha npha   # node consisting of a group
LDDL allows the application programmer to provide callback functions that do the real work of the program. Callbacks can be executed (called back?) on the following conditions:
Pre-event callback
So, in our example,
callback #1 (before identifying the next event) would
apply at the point in the data stream when we expect to start a new event,
but we haven't yet identified whether it is a ``real'' or ``calibration''
event.
To use this callback, your function must be of the form
    int pre_evt_cb(char *name, int argc, char **argv);

(see below for an explanation of the arguments) and you must ``register'' it with LDDL by making a call like this:
lddl_set_pre_evt_func(pre_evt_cb);
Event callback
So, in our example,
callback #2 (after identifying it, but before processing it) would apply
after we have determined whether it is a ``real'' or ``calibration'' event,
but before we have examined any of its nodes.
To use this callback, your function must be of the form
    int evt_cb(char *name, int argc, char **argv);

(see below for an explanation of the arguments) and you must ``register'' it with LDDL by making a call like this:
lddl_set_evt_func(evt_cb);
Post-event callback
So, in our example,
callback #3 (after processing the event) applies immediately after we are
done dealing with the entire event.
To use this callback, your function must be of the form
    int post_evt_cb(char *name, int argc, char **argv);

(see below for an explanation of the arguments) and you must ``register'' it with LDDL by making a call like this:
lddl_set_post_evt_func(post_evt_cb);
Group callback
So, in our example,
callback #4 (start of group) applies at line 14, when we encounter the
``pha'' group node.
To use this callback, your function must be of the form
    int group_cb(int argc, char **argv, char *group_name, char *evt_name);

(see below for an explanation of the arguments) and you must ``register'' it with LDDL by making a call like this:
lddl_set_group_func(group_cb);
Post-group callback
So, in our example,
callback #5 (end of group) applies after we have processed all the nodes of
the ``pha'' group.
To use this callback, your function must be of the form
    int post_group_cb(int argc, char **argv, char *group_name, char *evt_name);

(see below for an explanation of the arguments) and you must ``register'' it with LDDL by making a call like this:

    lddl_set_post_group_func(post_group_cb);
Node callback
So, in our example,
callback #6 (other nodes) applies for every other node, including all the
elements in the ``pha'' group, as well as the ``real'' and ``calibration''
events.
To use this callback, your function must be of the form
    int node_cb(char *inp, char *swapp, int type, int count, int nbytes,
                int argc, char **argv, int native,
                char *group_name, char *evt_name);

(see below for an explanation of the arguments) and you must ``register'' it with LDDL by making a call like this:
lddl_set_node_func(node_cb);
Fatal error callback
The default action taken when encountering a fatal error is to print an
error message on the stderr and exit. An application can modify this
default action by supplying an error-handling function.
To use this callback, your function must be of the form
    void err_cb(char *str);

The parameter str is a pointer to the error message. The callback may choose to take some action, continue processing, or exit. It must be ``registered'' with LDDL by making a call like this:
lddl_set_err_func(err_cb);
New input file callback
If the application needs to know which of several input files is being
processed, it can use a callback routine. LDDL will call the
routine whenever it starts a new input file.
To use this callback, your function must be of the form
    void new_file_cb(char *str);

The parameter str is a pointer to the file name. The callback may do whatever is required with the file name. It must be ``registered'' with LDDL by making a call like this:
lddl_set_new_file_func(new_file_cb);
All of the callbacks (except for the error and new file functions) take the following parameters:
parameter | description |
---|---|
int argc | number of ``command line'' arguments in argv |
char **argv | argument vector |
int native | if nonzero, indicates that the unswapped data is in the native endianness |
In addition, the group callbacks include:
parameter | description |
---|---|
char *group_name | name of this group |
char *evt_name | name of the event this node is in |
The node callbacks contain:
parameter | description |
---|---|
char *inp | pointer to the unswapped data |
char *outp | pointer to the swapped data |
int type | data type of this node (see Type macros below) |
int nbytes | number of bytes that this node uses |
int count | number of elements in this node |
char *group_name | name of the group this node is in (if any) |
char *event_name | name of the event this node is in |
The evt callbacks contain:
parameter | description |
---|---|
char *name | name of this event |
The following macros should be used by application programs to determine the type of a data item.
Macro | Type | Note |
---|---|---|
LDDL_INT8 | char | |
LDDL_INT16 | short | |
LDDL_INT32 | long | |
LDDL_INT64 | long long | not yet supported |
LDDL_FLOAT32 | float | |
LDDL_FLOAT64 | double | |
LDDL_GROUP | group | |
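To show how the type macros and the node callback parameters fit together, here is a rough sketch of a node callback that prints integer nodes using the dataprint default formats. Several things in it are assumptions and should be checked against the LDDL header and dataprint.c: the numeric values given to the macros are placeholders so that the sketch compiles on its own, `format_item` is a hypothetical helper, the second pointer is taken to be the swapped copy of the data, and the meaning of the callback's return value is not stated in this document, so returning 0 is a guess.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Placeholder values so the sketch compiles on its own; a real program
 * would take these macros from the LDDL header instead. */
#ifndef LDDL_INT8
#define LDDL_INT8   1
#define LDDL_INT16  2
#define LDDL_INT32  3
#endif

/* Format element `idx` of a node with the dataprint default formats.
 * Returns the number of characters written, or -1 for unhandled types. */
static int format_item(char *out, size_t outlen,
                       const char *data, int type, int idx)
{
    assert(out != NULL && data != NULL && idx >= 0);
    switch (type) {
    case LDDL_INT8: {
        uint8_t v;
        memcpy(&v, data + idx, sizeof v);
        return snprintf(out, outlen, "%02x", (unsigned)v);
    }
    case LDDL_INT16: {
        uint16_t v;
        memcpy(&v, data + 2 * (size_t)idx, sizeof v);
        return snprintf(out, outlen, "%04x", (unsigned)v);
    }
    case LDDL_INT32: {
        uint32_t v;
        memcpy(&v, data + 4 * (size_t)idx, sizeof v);
        return snprintf(out, outlen, "%08lx", (unsigned long)v);
    }
    default:
        return -1;   /* floats and groups are left out of this sketch */
    }
}

/* Node callback with the signature documented above. */
int node_cb(char *inp, char *swapp, int type, int count, int nbytes,
            int argc, char **argv, int native,
            char *group_name, char *evt_name)
{
    const char *data = native ? inp : swapp;  /* pick native-order bytes */
    char text[16];

    (void)nbytes; (void)argc; (void)argv; (void)group_name; (void)evt_name;
    for (int i = 0; i < count; i++)
        if (format_item(text, sizeof text, data, type, i) >= 0)
            printf("%s ", text);
    return 0;   /* assumed: the return convention is not documented here */
}
```

Choosing between the two data pointers based on `native` means the values are always interpreted in the machine's own byte order, whichever way the input file was written.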
The application can set a couple of LDDL's internal parameters to change its behavior slightly.
Setting the endianness of the input data.
If there is no endianness indicator in the data, then it will be necessary to inform LDDL whether the data is in its native endianness or not. This is done using the function
    void lddl_set_native(int n);

If the parameter n is 1, then LDDL treats the data as being in the machine's native format. If n is 0, then the data is assumed to be in the opposite endianness.
Searching for start of events.
Usually, LDDL assumes that the data is properly formatted and that a new event will start where expected. If this is not the case, the application may tell LDDL to search the data for the pattern identifying the start of an event. In this mode, it will step through the data byte-by-byte and try to match the start of an event at each location. Bytes stepped over are ignored.
This byte-by-byte mode is set using the function
    void lddl_search_for_evt_start(int n);

If the parameter n is 1, then LDDL goes into byte-by-byte search mode. If n is 0, then it uses the standard mode.
In standard mode, not matching an event where expected is considered a fatal error.
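The byte-by-byte search can be pictured with the following C sketch, which looks for a 16-bit start-of-event pattern at every byte position. It illustrates the idea only and is not how LDDL is implemented: `find_event_start` is a hypothetical helper, and it matches a single native-order short rather than a full signature.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not part of LDDL: step through buf one byte at a time
 * and return the offset of the first position where a native-order 16-bit
 * word equals `pattern`, or -1 if there is none.
 */
static long find_event_start(const unsigned char *buf, size_t len,
                             uint16_t pattern)
{
    assert(buf != NULL);
    for (size_t i = 0; i + sizeof pattern <= len; i++) {
        uint16_t word;

        memcpy(&word, buf + i, sizeof word);   /* unaligned-safe read */
        if (word == pattern)
            return (long)i;                    /* bytes before i are skipped */
    }
    return -1;
}
```

Because the pattern may also occur by chance inside ordinary data, a short or common start pattern can produce false matches in this kind of search.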
Returning modified or original values
If the format file specifies bitwise operations
on a node, this modifies the original value of the data item. Your
application can choose to receive, via the node callback, either the
original data or the modified data by using the following routine. The
default is to receive the modified values. To change this:
    int lddl_return_modified_value(int yesno);

If yesno is 1, then the node callback will receive the modified values. If yesno is 0, then the node callback will receive the original values. lddl_return_modified_value() itself returns the previous mode.
Returning duplicate nodes
If the format specifies
multiple processing of a node,
you can choose whether the node callback function will receive all of the
multiple nodes or just the last one. The default is to receive all nodes.
To change this, use the following:

    int lddl_return_duplicate_node(int yesno);

If yesno is 1, then the node callback will receive all of the multiple nodes. If yesno is 0, then the node callback will receive only the last of the nodes. lddl_return_duplicate_node() itself returns the previous mode.
Setting LDDL's read buffer size
LDDL normally reads in chunks of 32768 bytes. This value can be changed
using the following function:
    void lddl_set_buf_size(unsigned long val);

It sets the buffering to val bytes.
After defining all the callbacks and setting any other LDDL parameters, it is time to turn control over to the LDDL library. This is done by making a call to lddl_start(), which will read the data format file and all the input data, making the callbacks registered by the application as it parses the data into events, groups, and nodes. After this function returns, all the data has been processed and the application program can go on to other things or exit.
    void lddl_start(char *fmtfile, int ninfiles, char **infilev, char *progname);

    fmtfile  - file containing the data format description.
    ninfiles - number of input data files to read.
    infilev  - pointers to the names of the input data files.
    progname - the name of the application program.
If it is necessary to know which version of LDDL has been linked with the application, a call to the following function will return a pointer to a string identifying the version:
char *lddl_version(void);