developer documentation
problem statement
The idea to build the macsServer arose from the observation that handling specMACS raw data is tedious and error-prone. To see the contents of a file, the user must first convert it to a different format (e.g. by calibrating the data) and may only find out afterwards that it was the wrong file. Furthermore, the amount of data makes it prohibitive to waste space on duplicates, and it makes working with the data very hard while out of the office or when collaborating with researchers at other locations. If problems in the input data are detected, every user should be reliably informed about any corrections.
These observations led to the following goals for a data handling system:
goals
- All steps and source files involved in creating a dataset must be documented and reproducible.
- All datasets must be searchable, sortable and have a fast visual preview.
- All data must be accessible with nothing more than an internet connection, which need not be fast.
- As little non-raw data as possible should be stored on disk. Data should be calculated on demand instead.
- The datasets should look as if they were already calculated, so that the user does not have to worry about how to calculate “standard” products every time.
- It should be possible to integrate data from different sensors, so that joint products can be created.
- It should be possible to apply corrections to any kind of input data. Each correction should be stated explicitly, so that everyone can check whether a certain correction has been applied to some data or not.
- Any changes in raw data should be propagated to derived datasets, so that it is possible to check whether a dataset being used is the best available one and, if not, which dataset is the corresponding best one.
- User management should allow restricting access based on dataset metadata.
- The system should not add any relevant state or information beyond what is explicitly stated in the input data. This separation is needed because otherwise all internal state would have to be backed up, which adds unnecessary complexity.
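The goals on reproducibility, explicit corrections and propagation of raw-data changes can be illustrated by deriving the identifier of a dataset from the way it is computed. The following is only a minimal sketch of that idea and not the macsServer implementation: the functions `raw_id` and `derived_id`, the operation name `calibrate` and the correction name `fix-timestamp-offset` are hypothetical.

```python
import hashlib
import json
from typing import Sequence


def raw_id(content: bytes) -> str:
    """Identify a raw file by the hash of its content."""
    return hashlib.sha256(content).hexdigest()


def derived_id(operation: str,
               input_ids: Sequence[str],
               corrections: Sequence[str] = ()) -> str:
    """Identify a derived dataset by its recipe instead of by stored output.

    The recipe lists the processing step, the identifiers of all inputs and
    every explicitly stated correction (all names here are hypothetical).
    """
    recipe = {
        "operation": operation,
        "inputs": list(input_ids),
        "corrections": sorted(corrections),
    }
    encoded = json.dumps(recipe, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


# Two versions of the same raw file get different identifiers ...
raw_v1 = raw_id(b"raw sensor frame, first version")
raw_v2 = raw_id(b"raw sensor frame, second version")

# ... and the difference propagates to everything derived from them.
calibrated_v1 = derived_id("calibrate", [raw_v1])
calibrated_v2 = derived_id("calibrate", [raw_v2])

# The same recipe always yields the same identifier (reproducibility).
calibrated_v1_again = derived_id("calibrate", [raw_v1])

# An explicitly stated correction is part of the recipe and therefore visible.
calibrated_corrected = derived_id("calibrate", [raw_v1], ["fix-timestamp-offset"])
```

With identifiers of this kind, nothing apart from the input data and the stated corrections needs to be stored to decide whether a dataset in use still corresponds to the current inputs.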
realization sketch
The macsServer builds a graph of products. This graph shows all computable products and can subsequently be searched to find interesting products. The graph also states the steps required to calculate a product: the starting points of the arrows pointing to a product show the required input data.
The graph can be built automatically, using rules defined in the products, which allows browsing through it quickly. The graph can be inverted by the macsServer and used to retrieve the contained data lazily using OPeNDAP.
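The product graph described above might be sketched as follows. This is an illustration under assumptions, not the actual macsServer code: the `Product` class, the rule functions, the product names such as `raw/vnir` and the toy calibration are all hypothetical, and the OPeNDAP transport is left out. "Inverting" the graph is interpreted here as following the arrows backwards from a requested product to its inputs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Product:
    """A graph node: its name, the products it needs, and the computing step."""
    name: str
    inputs: Tuple[str, ...]
    compute: Callable[..., List[float]]


def calibrate(raw: List[float]) -> List[float]:
    return [2.0 * value for value in raw]  # toy stand-in for a calibration


def join(vnir: List[float], swir: List[float]) -> List[float]:
    return vnir + swir  # toy stand-in for a joint product


def calibration_rule(known: Dict[str, Product]) -> List[Product]:
    """Hypothetical rule: every raw product yields a calibrated product."""
    return [
        Product("calibrated/" + p.name[len("raw/"):], (p.name,), calibrate)
        for p in known.values()
        if p.name.startswith("raw/")
    ]


def joint_rule(known: Dict[str, Product]) -> List[Product]:
    """Hypothetical rule: two calibrated sensors yield one joint product."""
    needed = ("calibrated/vnir", "calibrated/swir")
    if all(name in known for name in needed):
        return [Product("joint/vnir+swir", needed, join)]
    return []


def build_graph(seeds: List[Product], rules) -> Dict[str, Product]:
    """Apply all rules repeatedly until no new product appears."""
    graph = {p.name: p for p in seeds}
    changed = True
    while changed:
        changed = False
        for rule in rules:
            for product in rule(graph):
                if product.name not in graph:
                    graph[product.name] = product
                    changed = True
    return graph


def evaluate(graph: Dict[str, Product], name: str, computed: List[str]) -> List[float]:
    """Walk backwards from `name` and compute only what it depends on."""
    product = graph[name]
    args = [evaluate(graph, dep, computed) for dep in product.inputs]
    computed.append(name)  # record which products were actually touched
    return product.compute(*args)


seeds = [
    Product("raw/vnir", (), lambda: [1.0, 2.0]),
    Product("raw/swir", (), lambda: [3.0]),
]
graph = build_graph(seeds, [calibration_rule, joint_rule])

# Searching the graph does not compute anything.
calibrated_names = sorted(n for n in graph if n.startswith("calibrated/"))

# Requesting one product evaluates only its own chain of inputs.
computed: List[str] = []
result = evaluate(graph, "calibrated/vnir", computed)
```

In this sketch, requesting `calibrated/vnir` never touches the SWIR data, which mirrors the goal of calculating data on demand instead of storing derived products.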