Overview of the Data System

Written by Joey Mukherjee (joey@swri.edu)


Table of contents

  1. Overview
  2. Archive Sites
  3. Promotion of Data
  4. Data / Meta Data / Database Entries
  5. SDDAS File Types

Overview

A major strength of SDDAS comes from its distributed data system. Data can be kept almost anywhere and brought to you when you need a particular data set. A site that stores data is known as an archive site. Currently, there are only a few archive sites, but any site that so desires may be an archive site.

Data is brought to the user via promotion. Promotion contacts the archive site, requests the data set, and then returns a status to the calling program. The process is completely transparent to the end user (apart from the message box shown during promotion).

So how does the program know what data to get? The answer is the meta data, also called database entries. Conveniently, the meta data can be promoted as well. On a fresh system, there is no data and no meta data. When data is needed, the meta data is promoted first, and then the data.

The data is actually a collection of data files needed for a single plot. There are four types of data files, all of which are necessary for plotting, so all four should be promoted together.
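The ordering described above (meta data first, then all four file types) can be sketched in Python. This is a minimal illustration under stated assumptions, not the SDDAS promotion code: the callbacks `promote_metadata` and `promote_file`, and the file-type labels, are hypothetical stand-ins for whatever the real promotion software does.

```python
from typing import Callable, Iterable, List

# Hypothetical labels for the four SDDAS file types; the real promotion
# software may identify them differently.
FILE_TYPES = ("VIDF", "Header", "Data", "PIDF")


def promote_for_plot(
    have_metadata: bool,
    promote_metadata: Callable[[], bool],   # hypothetical callback
    promote_file: Callable[[str], bool],    # hypothetical callback
    file_types: Iterable[str] = FILE_TYPES,
) -> List[str]:
    """Promote meta data first (if missing), then every file type.

    Returns the file types that failed to promote; an empty list means
    everything needed for the plot is now local.
    """
    types = list(file_types)
    # On a fresh system there is no meta data, and without it the
    # data files cannot be located, so it must be promoted first.
    if not have_metadata and not promote_metadata():
        return types
    # All four file types are needed for plotting, so promote them together.
    return [ft for ft in types if not promote_file(ft)]
```

Returning the list of failures, rather than a single flag, makes it easy to report which of the four files is missing.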

Archive Sites

An archive site is a site that stores data for other people to use. At SwRI, we have two systems set up as archive machines. The main one is pemrac.space.swri.edu at 129.162.155.101.

So what does it mean to be "set up as an archive machine"? The first requirement is to run the UNIX daemon sd_rshd. While sd_rshd is running, requests can come in from outside, and it is the archive site's responsibility to find the requested data and send it to the remote site.

For example, on pemrac, we have a lot of data on physical hard disks and even more on a magneto-optical jukebox. When a request comes in, we check the archive label, which is sent to the daemon as part of the request. Depending on the archive label, we look on the hard disk, the magneto-optical jukebox, or the CD-ROM jukebox. Once we know where to look, we locate the requested file and send it to the remote site.
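The archive-label dispatch described above might look like the following sketch. The label names (`DISK`, `MO`, `CDROM`) and the storage paths are hypothetical placeholders; the actual labels used by sd_rshd and the directory layout on pemrac are not described here.

```python
import os
from typing import Mapping

# Hypothetical archive labels and storage roots; the real labels sent to
# sd_rshd and the directory layout on an archive site are not documented here.
STORAGE_ROOTS: Mapping[str, str] = {
    "DISK": "/archive/disk",
    "MO": "/archive/mo_jukebox",
    "CDROM": "/archive/cdrom_jukebox",
}


def locate_requested_file(
    archive_label: str,
    relative_path: str,
    roots: Mapping[str, str] = STORAGE_ROOTS,
) -> str:
    """Map a request's archive label and file path to a file on this site."""
    if archive_label not in roots:
        raise ValueError(f"unknown archive label: {archive_label!r}")
    path = os.path.join(roots[archive_label], relative_path)
    # A platter that is not loaded in the jukebox looks like a missing file;
    # the remedy is to ask the archive site to put the platter online.
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    return path
```

Keeping the label-to-location table separate from the lookup means adding a new storage device only changes the table.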

It sounds simple, and in reality it is simple, but several things can go wrong. First, the jukebox may not have the required platter loaded. If the data is not online, the only solution is to contact us and ask us to put the platter online. Another potential snafu is incorrect meta data. This is somewhat easier to fix (just repromote the meta data), but it is still a problem. A third, unavoidable problem is hardware failure. The jukebox may simply be on the fritz, and that is a harsh reality of working with computers!

Promotion of Data

Promotion of data is the act of bringing data from the remote site to your local site. Promotion is a great feature because you fetch only the data you need, when you need it. The drawback is that promotion depends almost entirely on the server side working correctly. When promotion fails, it is generally because the server cannot find its data.

If you have problems promoting, try repromoting the meta data. The database on the archive site may have changed, and fresh meta data will make the promotion process look for the data according to the updated entries.
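The repromotion advice above amounts to a single retry. A minimal sketch, assuming hypothetical callables `promote_data` and `repromote_metadata` that each return `True` on success:

```python
from typing import Callable


def promote_with_metadata_retry(
    promote_data: Callable[[], bool],        # hypothetical callback
    repromote_metadata: Callable[[], bool],  # hypothetical callback
) -> bool:
    """Try to promote data; on failure, refresh the meta data and try once more."""
    if promote_data():
        return True
    # The archive's database may have changed; fresh meta data changes
    # where the promotion process looks for the data.
    if not repromote_metadata():
        return False
    return promote_data()
```

Only one retry is attempted: if the data still cannot be found with fresh meta data, the problem is more likely on the server (an offline platter or a hardware fault) than in the local entries.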

Data / Meta Data / Database Entries

Data in use is kept locally on your hard drive. It is promoted from the archive site and stored in the location defined by the environment variable SDDAS_DATA. Within that location, data is stored in the hierarchy Project / Mission / Experiment / Instrument, where each level is a separate subdirectory. Every filename starts with the virtual instrument name, followed by a suffix that depends on the file type.
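As a sketch of the layout above, the local path of a data file could be built as follows. Only `SDDAS_DATA` and the Project / Mission / Experiment / Instrument hierarchy come from this document; the function name, the argument names, and the example suffix are hypothetical, since the actual suffix for each file type is not listed here.

```python
import os
from typing import Mapping, Optional


def local_data_path(
    project: str,
    mission: str,
    experiment: str,
    instrument: str,
    virtual_instrument: str,
    suffix: str,  # hypothetical: the real suffix depends on the file type
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Build $SDDAS_DATA/Project/Mission/Experiment/Instrument/<vinst><suffix>."""
    env = os.environ if env is None else env
    root = env.get("SDDAS_DATA")
    if not root:
        raise RuntimeError("SDDAS_DATA is not set")
    # Each hierarchy level is its own subdirectory; the filename starts
    # with the virtual instrument name.
    return os.path.join(root, project, mission, experiment, instrument,
                        virtual_instrument + suffix)
```

For example, with `SDDAS_DATA=/data`, `local_data_path("P", "M", "E", "I", "VINST", ".suffix")` yields `/data/P/M/E/I/VINST.suffix` on a POSIX system.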

Meta data (also called database entries) is data about data. Database entries are stored on the local system in a directory called Database, located inside the instrument level of the hierarchy. That directory contains two DBF (database) files and two NDX files. The NDX files are index files that tell the software how to quickly find entries in the DBF files, which hold the entries themselves.
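A quick way to check whether the meta data for an instrument has been promoted is to look for the Database directory and its files. This sketch assumes the files carry `.dbf` and `.ndx` extensions (matched case-insensitively); that naming, and both helper functions, are assumptions rather than documented SDDAS behavior.

```python
import os
from typing import Dict, List


def database_files(instrument_dir: str) -> Dict[str, List[str]]:
    """List the DBF and NDX files in <instrument_dir>/Database."""
    db_dir = os.path.join(instrument_dir, "Database")
    names = sorted(os.listdir(db_dir))
    # Assumed extensions; the document only calls them DBF and NDX files.
    dbf = [n for n in names if n.lower().endswith(".dbf")]
    ndx = [n for n in names if n.lower().endswith(".ndx")]
    return {"dbf": dbf, "ndx": ndx}


def metadata_present(instrument_dir: str) -> bool:
    """True if Database/ holds the expected two DBF and two NDX files."""
    try:
        found = database_files(instrument_dir)
    except FileNotFoundError:
        # No Database directory yet: the meta data has not been promoted.
        return False
    return len(found["dbf"]) == 2 and len(found["ndx"]) == 2
```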

SDDAS File Types

SDDAS has four different file types that make up the data:

  1. VIDF (V/I file)
  2. Header
  3. Actual Data
  4. PIDF

The VIDF is a "virtual instrument description file". It is written by the scientist and describes, in mathematical terms, the layout of the data; it also contains tables that can be applied to the raw telemetry to produce data in the given units. If the data looks odd when plotted, there is a good chance that a value in the VIDF is incorrect. The VIDF is a text file, but it must be compiled into a binary format before the software can use it. This is done with a program called mk_idf, and the promotion software usually handles it automatically.

The header file describes the data file to the software. It is generated by the same software that creates the IDFS data files, and it is promoted along with the data file. The header file contains data that rarely varies in time and therefore does not need to be duplicated in every data record.

The data file is a set of fixed-length records that contain all of the virtual instrument data not stored in the header file.
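Because the records are fixed length, splitting a data file into records is straightforward once the record length is known. In this sketch `record_length` is a hypothetical parameter supplied by the caller; where SDDAS obtains that value, and how each record is decoded (which requires the VIDF and header), is not covered here.

```python
from typing import Iterator


def iter_records(data: bytes, record_length: int) -> Iterator[bytes]:
    """Split a data file's contents into fixed-length records.

    record_length is assumed to be known in advance (hypothetical input);
    decoding the bytes of each record is out of scope for this sketch.
    """
    if record_length <= 0:
        raise ValueError("record_length must be positive")
    # With fixed-length records, the file size must divide evenly.
    if len(data) % record_length:
        raise ValueError("file size is not a multiple of the record length")
    for offset in range(0, len(data), record_length):
        yield data[offset:offset + record_length]
```

The fixed length is what makes random access cheap: record `n` starts at byte `n * record_length`, so no scanning is needed.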

The PIDF is a "plot instrument definition file". It is not needed by the generic software for reading or using data, but the applications need it to plot the data correctly. A PIDF is a text file that can be created or edited with the PIDF editor. PIDFs can also be promoted through the applications and the data catalog.

If you are interested in creating IDFS files, read the "Creating IDFS files" document, which gives a simplified walkthrough of the process.