Little Data Description Language (LDDL) version 1.5

Marty Olevitch

Laboratory for Experimental Astrophysics (LEXAS)
Physics Dept, Washington University in St. Louis
St. Louis, MO 63130 USA
marty@cosray2.wustl.edu

Introduction
Assumptions regarding the input data
Specifying the data format
Simple data formats
More complex data type specifiers
Converting a C struct into the data spec format
Automatic recognition of byte order (Endianness)
Grouping blocks of data items.
Data with more than one kind of event
- Specifying the event signatures
- Defining the event
Application-specific commands
Bitwise operations on a data node
Multiple operations on a single data node
Some sample data format files
Some applications
- Dataswap
- Dataprint

Programmer information

Callback functions
Set-up routines
Turning control over to LDDL.
Identifying the LDDL version.

The Little Data Description Language (LDDL) is a simple language for describing the format of a stream of data. It won't handle every possible data format, but it does seem to handle the ones we deal with in our various experiments, at least the data stream from the hardware.

Assumptions regarding the input data

The data is considered to be composed of individual events. In the simplest data formats, each event is identical. In more complex data formats, there may be several different types of events. The LDDL language allows these various data formats to be specified.

The building blocks of any data format consist of

integer quantities (signed or unsigned)
- char (8 bits)
- short (16 bits)
- long (32 bits)
floating point quantities
- float (32 bits)
- double (64 bits)

The current version of LDDL doesn't support 64-bit longs.

Specifying the data format

The user of LDDL must write (or have someone else write) a file containing a description of the data. What follows is a tutorial-like introduction to specifying the data format, with some examples.

Simple data formats

In these formats, each data event consists of a sequence of longs, shorts, chars, floats, and/or doubles. There is only one type of event in the data stream and it always has the same size. The data spec file that describes such a format consists of a sequence of lines of the form

    <type> [quantity]

where <type> consists of one of the following capital letters:

L denotes a 32-bit integer (long)
S denotes a 16-bit integer (short)
B denotes an 8-bit integer (char)
D denotes a 64-bit floating point (double)
F denotes a 32-bit floating point (float)

and [quantity] is the number of such entities. [quantity] may be omitted, in which case it defaults to a value of 1.

Comments may be inserted into the format description. They start with the # character and continue to the end of the line.

For example, if an event of our data consists of 3 longs, 2 shorts, a 4 x 5 array of chars, a long, a short, 2 doubles, and a float, then it could be represented like so:

    L 3
    S 2
    # a 4 x 5 array follows
    B 20
    L
    S
    D 2
    F
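As a rough check on a spec like the one above, the fixed size of an event can be computed by summing the sizes of its items. The following C sketch is not part of LDDL; the names `spec_item`, `type_size`, and `event_size` are hypothetical and exist only for illustration.

```c
#include <stddef.h>

/* Hypothetical helper type, not part of the LDDL library. */
struct spec_item {
    char type;        /* one of 'L', 'S', 'B', 'D', 'F' */
    size_t quantity;  /* number of items (1 when omitted in the spec) */
};

/* Size in bytes of one item of the given type, or 0 if unknown. */
size_t type_size(char type)
{
    switch (type) {
    case 'B': return 1;
    case 'S': return 2;
    case 'L': return 4;
    case 'F': return 4;
    case 'D': return 8;
    default:  return 0;
    }
}

/* Total size in bytes of an event made of n fixed-size items. */
size_t event_size(const struct spec_item *items, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += type_size(items[i].type) * items[i].quantity;
    return total;
}
```

For the example above (L 3, S 2, B 20, L, S, D 2, F) this gives 12 + 4 + 20 + 4 + 2 + 16 + 4 = 62 bytes.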

A real-life example of this type of format is the data for the multi-anode PMT calibration from a SIFTER accelerator run in 1998. Each event consisted of a short int start-of-event pattern, a short int that is always 1, a long int event number, and a long int time stamp, followed by 32 short ints of pha data. It can be described by the following specification:

    S = 0xA098     # ID pattern
    S = 1
    L              # evt number
    L              # time stamp (PC tick count)
    S 32           # pha values

On the first two lines above, we use the ``= val'' construct. This indicates that LDDL will check the value of the data item. If it is not as specified then a warning message will be printed.

More complex data type specifiers

Here are some examples of more complex data type lines.

<type> [<quantity>]
    example: 5 long ints
        L 5

<type> [<quantity>] [= <value>] [optional]
    example: 4 short ints, each with the value 0xABCD
        S 4 = 0xabcd
    example: 1 short int = 2304, but may or may not be present
        S = 2304 optional

<type> [<quantity>] [= <value>] [mask <maskval>] [optional]
    example: A long with value 0xF123 when ANDed with 0x0000FFFF
        L = 0xf123 mask 0x0000ffff

<type> -> <varname>
    example: store short value in a variable ``count'' for later use
        S -> count

<type> <varname>
    example: use variable ``vidsize'' to specify a quantity
        B vidsize

Converting a C struct into the data spec format

Here is an example of converting a C structure into the format required in the data specification file. The following struct

    struct datum {
	short s[8];
	long l[4];
	double d[2];
	char c[8];
	float f[8];
    } D;

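can be represented like so (assuming the struct is stored without padding and that a long is 32 bits wide):

    S 8
    L 4
    D 2
    B 8
    F 8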

Automatic recognition of byte order (Endianness)

The library routines can determine whether the data is in the computer's native byte order if there is some sort of data item that can be used as an indicator. For example, if the first (or second or third, etc.) word of the data always has a particular value, say 0x1234, then the library can check the value of this word. If it is in fact 0x1234, then the data is assumed to be in the native endianness. If it turns out to be 0x3412, then the data is in the opposite endianness.

This information is specified using the endian statement. For example:

    endian offset 2 type S = 0x90af

states that if the program finds a short integer at a 2 byte offset from the beginning of the input data containing the value 0x90af, then the data is in native byte order. The endian statement may also have a bitmask:

    endian offset 4 type L = 0xABCD0000 mask 0xFFFF0000

In this example, the value of the long integer at byte offset 4 must be 0xABCD0000 after masking with 0xFFFF0000.

The default value of offset is 0, and the default is not to mask:

    endian type S = 0x1234
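The check described in this section can be sketched in C as follows. This is only an illustration of the idea for a 16-bit indicator, not the LDDL implementation; `byte_order`, `swap_bytes16`, and `detect_order16` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum byte_order { ORDER_NATIVE, ORDER_SWAPPED, ORDER_UNKNOWN };

/* Reverse the two bytes of a 16-bit value. */
static uint16_t swap_bytes16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/*
 * Check a 16-bit indicator at a byte offset, as in
 *     endian offset 2 type S = 0x90af
 * Pass a mask of 0xffff for "no mask".
 */
enum byte_order detect_order16(const unsigned char *buf, size_t len,
                               size_t offset, uint16_t value, uint16_t mask)
{
    uint16_t raw;

    if (len < sizeof raw || offset > len - sizeof raw)
        return ORDER_UNKNOWN;               /* indicator lies outside the buffer */
    memcpy(&raw, buf + offset, sizeof raw); /* avoids unaligned access */

    if ((raw & mask) == value)
        return ORDER_NATIVE;
    if ((swap_bytes16(raw) & mask) == value)
        return ORDER_SWAPPED;
    return ORDER_UNKNOWN;
}
```

Note that an indicator whose two bytes are identical (0x1212, say) reads the same in either byte order, so it cannot distinguish the two cases.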

Grouping blocks of data items.

A block of data type lines may be grouped together, named, and referenced by name later in the format file. For example:

    group blob
	S 2
	L
    group_end

A group begins with a line of the form

    group foo

where ``foo'' is the name of this group. It ends with a line consisting of the keyword ``group_end''.

The above example specifies that a blob consists of two shorts, followed by a long. It can then be used like so, for example:

    S = 0xffff
    S = 0xffbc
    S -> nb
    blob nb

The previous example expects the third short to contain the number of blob groups to expect. This value is placed in the variable nb. Then, in the last line, the value of nb is used to specify the number of blob groups to expect in the data.
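To make the role of the variable concrete, here is a C sketch that walks a buffer laid out according to the four-line example above and reports how many bytes the event occupies. It assumes the data is already in native byte order. The names `get16` and `blob_event_size` are hypothetical, and unlike LDDL (which prints a warning when a fixed value does not match) this sketch simply returns 0.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOB_SIZE 8   /* one blob: S 2 + L = 2*2 + 4 bytes */

/* Read a native-order 16-bit value from buf at a byte offset. */
static uint16_t get16(const unsigned char *buf, size_t off)
{
    uint16_t v;
    memcpy(&v, buf + off, sizeof v);
    return v;
}

/*
 * Returns the number of bytes the event occupies, or 0 if the buffer
 * is too short or one of the fixed values does not match.
 */
size_t blob_event_size(const unsigned char *buf, size_t len)
{
    if (len < 6)
        return 0;
    if (get16(buf, 0) != 0xffff || get16(buf, 2) != 0xffbc)
        return 0;

    uint16_t nb = get16(buf, 4);                /* S -> nb */
    size_t total = 6 + (size_t)nb * BLOB_SIZE;  /* blob nb */
    return total <= len ? total : 0;
}
```

With nb equal to 3, for example, the event occupies 6 + 3 * 8 = 30 bytes.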

Data with more than one kind of event.

If there is more than one kind of event in the data, then the data format must specify two things:

A signature, which is a way to distinguish each kind of event from the others, and
the description of the data items (and groups of data items) composing the event itself.

Specifying the event signatures

Each different kind of event in the data must have a unique signature. The signature is simply a specific value (or set of values) somewhere in the data that identifies the event. For example, in the SOFT MSU 1995 data, a ``real'' event can be identified because the second word is 0xFFF9 and the third word is 0. A ``spill header'' event can be identified because its second word is 0xFFEE.

We provide this information to the LDDL program by using a signature definition. This statement begins with the keyword sig. It specifies the following:

The name of this event kind.
A byte offset in the data (default 0) at which the distinguishing bit pattern can be found for this event kind.
The type of this data item (either B, S, or L).
The value it must equal for this event kind.
An optional bitmask to AND the value with.

There may be more than one set of these conditions for each event kind. For example, here is a set of signatures for the SOFT MSU 1995 data:

    sig real       offset 2 type S = 0xfff9 && offset 4 type S = 0
    sig led_cal    offset 2 type S = 0xfff9 && offset 4 type S = 2
    sig reply      offset 2 type S = 0xfff9 && offset 4 type S = 4
    sig pha_only   offset 2 type S = 0xfff9 && offset 4 type S = 9
    sig spill_hdr  offset 2 type S = 0xffee

Notice that the spill_hdr has only one set of conditions, and the other event kinds have two (separated by the && token). In this example, for a ``reply'' event, LDDL will first check that the short int at an offset of 2 bytes is equal to 0xFFF9 and if so, then check that the short int at offset 4 bytes is 4.

Here is an example from the TIGER/ULDB data stream which uses the bitmask feature:

    sig real     offset 0 type S = 1 mask 0x000f
    sig pedcal   offset 0 type S = 2 mask 0x000f
    sig lightcal offset 0 type S = 3 mask 0x000f
    sig fasthsk  offset 0 type S = 4 mask 0x000f
    sig medhsk   offset 0 type S = 5 mask 0x000f
    sig slowhsk  offset 0 type S = 6 mask 0x000f
    sig vslowhsk offset 0 type S = 7 mask 0x000f
    sig reply    offset 0 type S = 8 mask 0x000f

In this example, for a ``medhsk'' event, LDDL will take the short int at offset 0, AND it with the value 0x000F, and check that the result is equal to 5.
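The matching rule can be sketched in C as a loop over the conditions of one sig line, where every condition must hold. This is an illustration only, assuming native byte order; `sig_cond`, `fetch`, and `sig_matches` are hypothetical names and not part of the LDDL API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One condition of a signature, e.g. "offset 2 type S = 0xfff9". */
struct sig_cond {
    size_t offset;    /* byte offset into the event */
    char type;        /* 'B', 'S', or 'L' */
    uint32_t value;   /* required value after masking */
    uint32_t mask;    /* 0xffffffff means "no mask" */
};

/* Fetch a native-order item of the given type; returns 0 on success. */
static int fetch(const unsigned char *buf, size_t len,
                 size_t offset, char type, uint32_t *out)
{
    size_t size = type == 'B' ? 1 : type == 'S' ? 2 : type == 'L' ? 4 : 0;
    uint8_t b;
    uint16_t s;
    uint32_t l;

    if (size == 0 || len < size || offset > len - size)
        return -1;
    switch (type) {
    case 'B': memcpy(&b, buf + offset, 1); *out = b; break;
    case 'S': memcpy(&s, buf + offset, 2); *out = s; break;
    default:  memcpy(&l, buf + offset, 4); *out = l; break;
    }
    return 0;
}

/* Returns 1 if every condition holds (the && in the sig line), else 0. */
int sig_matches(const unsigned char *buf, size_t len,
                const struct sig_cond *conds, size_t nconds)
{
    for (size_t i = 0; i < nconds; i++) {
        uint32_t v;
        if (fetch(buf, len, conds[i].offset, conds[i].type, &v) != 0)
            return 0;
        if ((v & conds[i].mask) != conds[i].value)
            return 0;   /* stop at the first failed condition */
    }
    return 1;
}
```

An application identifying an event would try each event kind's condition list in turn until one matches.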

Defining the event

Once the signature of each event type has been specified, the event itself may be defined. Continuing with an excerpt from TIGER/ULDB, we would have something like:

    group header
	S = 0xbef0 mask 0xfff0
	L 2
    group_end

    group ender
	S	# checksum
	S = 0xabcd
    group_end

    evt real
	header
	L	# hazard clk
	S	# status wd
	S 147
	ender

    evt reply
	header
	S	# rfu
	B 82
	ender

As you can see, the event definition begins with a line of the form

    evt foo

where ``foo'' is the name of this event. Following this line is a series of data type lines or group lines as described above.

Application-specific commands

Different applications can be built using the Little Data Description Language. Most will require additional information from the user. For example, dataprint is a program that uses LDDL to print the values of various data items. It will take the following format, and print the value of the third item:

    S = 0xABCD          # event ID
    S                   # number of items
    S : print "%d "     # event number - print it

In general, anything following the colon (:) separator will be passed to the application program. Each program is free to interpret the string in its own way. The comment at the end of the line is, of course, not seen by the application.

Bitwise operations on a data node

The data in a node may be manipulated to produce a new value. LDDL provides a bitwise and (& in C) as well as a right shift (>> in C). For example, suppose we have a 16-bit integer data node that consists of a 12-bit value followed by a 4-bit value. The 12-bit value could be extracted by using a description line like so:


S & 0xfff0 >> 4

Suppose the data value is 0xFACE. Using the above description, LDDL would first mask the 16-bit value with 0xfff0 (producing 0xFAC0), and then shift the value right 4 bits (producing 0x0FAC). The new value can be used with other LDDL constructs:

L & 0xfff00000 >> 20 -> nparms

Note that by default these operations may modify the data value that will be presented to the application. An application can take measures to receive the original value or the modified value. See Returning modified or original values.
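For readers who think in C, the mask-then-shift order described above corresponds to the following small function. It is a sketch for illustration; `mask_shift` is a hypothetical name, not an LDDL routine.

```c
#include <stdint.h>

/*
 * Apply "& mask >> shift" to a node value in the order described
 * above: mask first, then shift right. shift must be less than 32.
 */
uint32_t mask_shift(uint32_t value, uint32_t mask, unsigned shift)
{
    return (value & mask) >> shift;
}
```

Calling mask_shift(0xFACE, 0xfff0, 4) yields 0x0FAC. The parentheses matter: in C itself, >> binds more tightly than &, so the unparenthesized expression value & 0xfff0 >> 4 would mask with 0x0fff instead.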

Multiple operations on a single data node

A single node may be processed multiple times. Why would you want to do this? Consider the example above. You might want to do something with the 12-bit value and then also with the 4-bit value. Use the curly brackets to specify multiple operations on a single node. Each bracket must be alone on a line:

{
    S & 0xfff0 >> 4 -> nwords
    S & 0x000f      -> seq
}

Note for application programmers: this construct causes LDDL to make two function calls to your node callback. See Returning duplicate nodes.

Some sample data format files

PHA lab test data

This data was put together quickly to test some new experiment hardware in the lab. It is fairly simple, containing one kind of event (of variable size) but does contain one group and an endian statement.

# data format for pdtest program

endian type S = 0x90af

group pha
    S 8	        # phas
    S	        # S scaler
    S	        # SC scaler
    S	        # SC channel
    S	        # board id
group_end

S = 0x90af
S -> npha	# no. of pha boards
L		# evtno
L		# tick count
pha npha

SOFT MSU 1995

This is the data format for the accelerator calibration of the SOFT detector at MSU in 1995. It has a variety of event types. The video data is of variable size which differs from event to event.

# SOFT MSU 1995 data format

sig real	offset 2 type S = 0xfff9 && offset 4 type S = 0
sig led_cal	offset 2 type S = 0xfff9 && offset 4 type S = 2
sig reply	offset 2 type S = 0xfff9 && offset 4 type S = 4
sig pha_only	offset 2 type S = 0xfff9 && offset 4 type S = 9
sig spill_hdr	offset 2 type S = 0xffee

group std
	# "standard" stuff found at the beginning of most evts
	S = 0xffff
	S = 0xfff9
	S 2
	L 4
	L -> vidsize2	# size (bytes) of camera 2 data
	L
	L -> vidsize1	# size (bytes) of camera 1 data
group_end

group blob
	S 2	# x, y coord of centroid
	L	# blob intensity
group_end

group pha
	S = 0xffff
	S = 0xfff8
	S 11
group_end

group disc
	# discriminator stuff
	S = 0xffff
	S = 0xffe6
	B 12
	S 6
group_end

group video1
	# video data for camera 1
	S = 0xffff
	S = 0xffec
	B vidsize1	# vidsize1 was extracted in the std group
group_end

group video2
	# video data for camera 2
	S = 0xffff
	S = 0xffed
	B vidsize2	# vidsize2 was extracted in the std group
group_end

group blobs1
	# blob data for camera 1
	S = 0xffff
	S = 0xffbc
	S -> nblobs1
	blob nblobs1
group_end

group blobs2
	# blob data for camera 2
	S = 0xffff
	S = 0xffbc
	S -> nblobs2
	blob nblobs2
group_end

group ender
	S = 0xffff
	S = 0xfff2
group_end

evt spill_hdr
	S = 0xffff
	S = 0xffee
	L

evt real
	std
	pha
	disc
	video1
	video2
	S = 0xffff
	S = 0xffbc
	S -> nblobs1
	blob nblobs1
	S = 0xffff
	S = 0xffbc
	S -> nblobs2
	blob nblobs2
	ender
	B = 0xdd optional
	B = 0xdd optional
	B = 0xdd optional
	B = 0xdd optional

evt led_cal
	std
	pha
	disc
	video1
	video2
	S = 0xffff
	S = 0xffbc
	S -> nblobs1
	blob nblobs1
	S = 0xffff
	S = 0xffbc
	S -> nblobs2
	blob nblobs2
	ender
	B = 0xdd optional
	B = 0xdd optional
	B = 0xdd optional
	B = 0xdd optional

evt reply
	std
	B 124
	ender
	B = 0xdd optional
	B = 0xdd optional
	B = 0xdd optional
	B = 0xdd optional

evt pha_only
	std
	pha
	ender
	B = 0xdd optional
	B = 0xdd optional
	B = 0xdd optional
	B = 0xdd optional

TIGER/ULDB 2000

This is the data format for the TIGER/ULDB experiment scheduled to fly on a long-duration balloon in 2000 or 2001. There are a number of different event types, none with a variable size.

# TIGER ULDB 2000 data format
# Just science data, not including CCSDS packet header or event time tag

sig real	offset 0 type S = 1 mask 0x000f
sig pedcal	offset 0 type S = 2 mask 0x000f
sig lightcal	offset 0 type S = 3 mask 0x000f
sig fasthsk	offset 0 type S = 4 mask 0x000f
sig medhsk	offset 0 type S = 5 mask 0x000f
sig slowhsk	offset 0 type S = 6 mask 0x000f
sig vslowhsk	offset 0 type S = 7 mask 0x000f
sig reply	offset 0 type S = 8 mask 0x000f

group header
	S = 0xbef0 mask 0xfff0
	L 2
group_end

group ender
	S	# checksum
	S = 0xabcd
group_end

evt real
	header
	L	# hazard clk
	S	# status wd
	S 147
	ender

evt pedcal
	header
	L	# hazard clk
	S	# status wd
	S 147
	ender

evt lightcal
	header
	L	# hazard clk
	S	# status wd
	S 147
	ender

evt fasthsk
	header
	S	# hsk seq
	L 2	# live time, real time
	S 36
	ender

evt medhsk
	header
	S	# hsk seq
	S 70
	ender


evt slowhsk
	header
	S	# hsk seq
	S 38
	B 12
	S 11
	ender

evt vslowhsk
	header
	S	# hsk seq
	S 3
	B 2
	S 8
	B 6
	S 6
	ender

evt reply
	header
	S	# rfu
	B 82
	ender

Some applications

When we have a new experiment (with a different data stream), I always have to write (at least) two programs: one to swap bytes and words for transferring the data from the big-endian data acquisition system to the little-endian data analysis system, and another to output the data values in human-readable text for simple analysis at data acquisition time. Therefore, it should not be surprising that the first two LDDL applications take care of these two tasks.

Dataswap

Dataswap takes care of converting the data from big- to little-endian (or vice versa). The format file is just straight LDDL with no application-specific commands. It takes the following command line parameters:

dataswap -f fmtfile [-o outfile] [-n] [-N] [file ...]

`-f fmtfile`		file containing data format spec
`-o outfile`		output file (default stdout)
`-n`		data IS NOT in native endianness (overrides `endian` keyword)
`-N`		data IS in native endianness (overrides `endian` keyword)
`[file ...]`		list of input file names (default stdin)
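The basic operation behind this kind of conversion is reversing the bytes of each integer item. A minimal C sketch for the 16- and 32-bit cases follows; it is not taken from the dataswap source, and `swap16` and `swap32` are hypothetical names.

```c
#include <stdint.h>

/* Reverse the byte order of a 16-bit value: 0x1234 becomes 0x3412. */
uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Reverse the byte order of a 32-bit value: 0x12345678 becomes 0x78563412. */
uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000ffu) << 24) |
           ((v & 0x0000ff00u) << 8)  |
           ((v & 0x00ff0000u) >> 8)  |
           ((v & 0xff000000u) >> 24);
}
```

Single bytes (type B) need no swapping, which is why a format description is required: the program has to know where each 16- or 32-bit item begins.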

Dataprint

Dataprint is used to print selected data items in text format. It defines one application-specific command: print.

The print command has the following parameters. Examples are given below.

print [-n numspec] [fmt]

`[-n numspec]`		range of numbers specifying which of several data items to print
`[fmt]`		printf format (see defaults below)

Specifying a range of values (numspec)

Dataprint uses the following notation for specifying a range of values. A range consists of two values separated by a dash: 10-37. An individual value can also be specified: 225 and these can be combined in a comma-separated list: 10-37,225 or 1,3,5,7-10,18-21,25. A C function that implements numspecs can be found in the files parsenum.c and parsenum.h in the LDDL distribution.
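To illustrate the notation, here is a small C sketch that tests whether a number is selected by a numspec string. It is not the code from parsenum.c, whose interface is not shown here; `numspec_contains` is a hypothetical name, and the sketch assumes non-negative decimal values with no embedded whitespace.

```c
#include <ctype.h>
#include <stdlib.h>

/*
 * Returns 1 if n is selected by spec (e.g. "1,3,5,7-10"),
 * 0 if it is not, and -1 if spec is malformed.
 */
int numspec_contains(const char *spec, long n)
{
    const char *p = spec;
    int found = 0;

    for (;;) {
        char *end;
        long lo, hi;

        if (!isdigit((unsigned char)*p))
            return -1;
        lo = strtol(p, &end, 10);
        hi = lo;                     /* a single value is a range of one */
        p = end;

        if (*p == '-') {             /* a range such as 10-37 */
            p++;
            if (!isdigit((unsigned char)*p))
                return -1;
            hi = strtol(p, &end, 10);
            p = end;
        }
        if (hi < lo)
            return -1;
        if (n >= lo && n <= hi)
            found = 1;

        if (*p == '\0')
            return found;
        if (*p != ',')
            return -1;
        p++;                         /* skip the comma and parse the next item */
    }
}
```

With the spec 10-37,225 this reports 10, 37, and 225 as selected and 38 as not selected.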

The default formats

The default formats are as follows:

type	format
B (char)	`%02x`
S (short)	`%04x`
L (long)	`%08x`
F (float)	`%0f`
D (double)	`%0f`

Examples

Here are some examples. This one prints the float value in the default format.

    F : print

It is also possible to specify a particular printf-style format:

    L : print "%05ld "

or even something like

    S : print "evtno %u "

This is probably not too useful, but it is possible:

    S : print "Hey there, buddy!\n\t\t=====> %d\n\n"

The following case has a sequence of 5 shorts:

    S 5 : print "%04x "

Given this, dataprint will print each of the 5 values using the %04x format. If you only want to print, for example, the first 3 values (see numspec above), just say:

    S 5 : print -n 0-2 "%04x "

or, equivalently,

    S 5 : print -n 0,1,2 "%04x "

In the following:

    group pha
    	S 8	# pha values
	S	# discriminator
	S	# channel ID
    group_end

    S = 0xABCD
    pha : print

we are asking to print the group ``pha''. When dataprint encounters this, it will print all the elements in the group (including any other groups that may be used inside the first group) in the printf format specified, which in this case is the default for each data item. In this case, it will print the 8 pha values, followed by the discriminator and channel ID. If you only want to print some of these items, then you should put the print statements on the elements inside the group:

    group pha
    	S 8 : print -n 0-3,6	# pha values
	S			# discriminator
	S   : print		# channel ID
    group_end

    S = 0xABCD
    pha

That will print phas 0 through 3 and 6 plus the channel ID. Note that a print command on a group will not override a print command inside the group:

    group pha
    	S 8 			# pha values
	S			# discriminator
	S   : print "%u "	# channel ID
    group_end

    S = 0xABCD
    pha : print "%05u "

In this case, the channel ID will use the format %u and all the other pha elements will be printed with the %05u format specified on the last line.

Programmer information

The Little Data Description Language is implemented as a library of C routines. This section describes the programming interface to the library. We will refer to the library as LDDL. As an aid to understanding this information, you might want to refer to the source code of the dataprint program in the file dataprint.c in the LDDL source distribution.

It is important to understand that LDDL models the data as a stream of events which are composed of nodes, which are in turn made up of either data items or groups of data items.

The main work of writing an application is in coding the callback functions. They must be ``registered'' with LDDL (see Callback functions).

Before LDDL begins its operations, the application can set a couple of parameters (see Set-up routines).

Finally, control is passed to LDDL using lddl_start().

If necessary, an application can determine with which version of LDDL it was linked by using lddl_version().

Callback functions

Consider once again the following data description:

Example data format

    1.   group pha
    2.      S 8         # node consisting of 8 data items
    3.      S           # node of 1 data item
    4.      S           # node of 1 data item
    5.   group_end

    6.   sig real type S = 0xA000
    7.   sig calibration type S = 0xA001

    8.   evt real
    9.      S = 0xA000
    10.     S 5

    11.  evt calibration
    12.     S = 0xA001
    13.     S -> npha
    14.     pha npha    # node consisting of a group

LDDL allows the application programmer to provide callback functions that do the real work of the program. Callbacks can be executed (called back?) on the following conditions:

1. Before identifying the next event.
2. After identifying the event, but before processing it.
3. After processing the event.
4. On starting to process a group node.
5. After processing a group node.
6. On processing any other kind of node.
7. On encountering an otherwise fatal error.
8. When a new input file is processed.

None of the callbacks are required, unless you want to do anything useful.

Pre-event callback

So, in our example, callback #1 (before identifying the next event) would apply at the point in the data stream when we expect to start a new event, but we haven't yet identified whether it is a ``real'' or ``calibration'' event.