querybuilder

Prerequisites for derived products are typically described using a query function. This function describes, in an abstract way, which relations must exist between the components of a new (i.e. to-be-built) derived product. Unlike normal database queries, these relations do not describe a fact that must hold for a single database object; they describe facts that must hold for a combination of several such objects taken together as one group. The query builder is responsible for transforming a query function into multiple single-object queries for the metaStorage.

What does a query function look like?

The query function for a calibrated image looks as follows:

minimalDarkFrameCount = 15

def query(rawImage, darkBefore, darkAfter, calibrationData):
    rawImage.productType == 'raw_image_data'
    rawImage.data_type == 'scan'
    darkBefore.productType == 'averaged_dark_data'
    darkBefore.frameCount >= minimalDarkFrameCount
    darkAfter.productType == 'averaged_dark_data'
    darkAfter.frameCount >= minimalDarkFrameCount
    calibrationData.productType == 'image_calibration_data'

    rawImage.integrationTime == darkBefore.integrationTime
    rawImage.sensorId == darkBefore.sensorId
    rawImage.integrationTime == darkAfter.integrationTime
    rawImage.sensorId == darkAfter.sensorId
    rawImage.sensorId == calibrationData.sensorId
    rawImage.startTime >= calibrationData.validFrom
    rawImage.stopTime <= calibrationData.validUntil
    rawImage.userMeta.optics == calibrationData.optics

    rawImage.dataSize.spatial == calibrationData.dataSize.spatial
    rawImage.dataSize.spectral == calibrationData.dataSize.spectral
    rawImage.dataSize.spatial == darkBefore.dataSize.spatial
    rawImage.dataSize.spectral == darkBefore.dataSize.spectral
    rawImage.dataSize.spatial == darkAfter.dataSize.spatial
    rawImage.dataSize.spectral == darkAfter.dataSize.spectral

    rawImage.date.enclosedBy(darkBefore.date, darkAfter.date)

This clearly looks odd. There is no return statement, so the function does nothing obvious. Moreover, almost all statements are comparisons whose results are discarded.

And how can it be understood?

The arguments of the query function determine how many components are needed. Their names can be chosen freely, but each one is interpreted as representing a product. In the above case, the arguments state:

  • There are 4 components needed.
  • Their names are: rawImage, darkBefore, darkAfter, calibrationData.
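The number and names of the components can be read directly from the function signature. The following is a minimal sketch using the standard-library inspect module; it illustrates the idea only and is not necessarily how the querybuilder does it internally.

```python
import inspect

def query(rawImage, darkBefore, darkAfter, calibrationData):
    pass  # body omitted; only the signature matters for this illustration

# The parameter names of the function are the component names.
component_names = list(inspect.signature(query).parameters)
print(len(component_names), component_names)
# 4 ['rawImage', 'darkBefore', 'darkAfter', 'calibrationData']
```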

The body of the query function states relations which must hold between the components, or between a component and some external value (which is assumed to be constant). In this example, the following lines state:

  • Line 4 (rawImage.productType == 'raw_image_data'): The productType of rawImage must be “raw_image_data”.
  • Line 7 (darkBefore.frameCount >= minimalDarkFrameCount): The frameCount of darkBefore must be at least 15.
  • Line 17 (rawImage.startTime >= calibrationData.validFrom): The startTime of rawImage must be equal to or later than the start of the validity period of calibrationData.

All these constraints are interpreted such that every comparison or statement must be True for a combination of components to be valid. The most interesting statements are, of course, the relations between two or more products.
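To make the "every statement must be True" reading concrete, the sketch below checks a shortened subset of the constraints against concrete objects. The helper is_valid_combination and the metadata records are hypothetical and exist only for this illustration; the querybuilder itself does not evaluate the query function in this way.

```python
from types import SimpleNamespace

minimalDarkFrameCount = 15

def is_valid_combination(rawImage, darkBefore):
    # A shortened subset of the constraints from the query function above.
    constraints = [
        rawImage.productType == 'raw_image_data',
        darkBefore.productType == 'averaged_dark_data',
        darkBefore.frameCount >= minimalDarkFrameCount,
        rawImage.sensorId == darkBefore.sensorId,
    ]
    # The combination is valid only if every single constraint holds.
    return all(constraints)

# Hypothetical metadata records, invented for this illustration.
raw = SimpleNamespace(productType='raw_image_data', sensorId=10)
good_dark = SimpleNamespace(productType='averaged_dark_data', frameCount=20, sensorId=10)
short_dark = SimpleNamespace(productType='averaged_dark_data', frameCount=5, sensorId=10)

print(is_valid_combination(raw, good_dark))   # True
print(is_valid_combination(raw, short_dark))  # False: frameCount is below 15
```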

How does it work?

The querybuilder can be called as follows:

import runmacs.processor.querybuilder as qb
components, pg = qb.processQuery(query)

processQuery uses introspection to find the names of all components (returned as the list components) and runs the query function with dummy arguments, collecting all the information contained in the specified relations. Afterwards, pg can be used to generate the most specific metaStorage query for any given component name, using all currently available information:

>>> import datetime
>>> generated_query = pg.getQuery('rawImage')
>>> generated_query
{'data_type': 'scan', 'productType': 'raw_image_data'}
>>> generated_query = pg.getQuery('rawImage', {"calibrationData": {
...     "validFrom": datetime.datetime(2014, 1, 1),
...     "validUntil": datetime.datetime(2015, 1, 1),
...     "sensorId": 10,
...     "optics": "baffle",
...     "dataSize": {"spatial": 50, "spectral": 30}}})
>>> generated_query
{'dataSize.spatial': 50,
 'dataSize.spectral': 30,
 'data_type': 'scan',
 'productType': 'raw_image_data',
 'sensorId': 10,
 'startTime': {'$gte': datetime.datetime(2014, 1, 1, 0, 0)},
 'stopTime': {'$lte': datetime.datetime(2015, 1, 1, 0, 0)},
 'userMeta.optics': 'baffle'}

So if nothing is known about the other products, only the constant parameters can be used for a query. However, if raw data matching some given calibration data is needed, more restrictions can be stated. Running this mechanism iteratively allows matching combinations for a new derived product to be found quickly.
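The mechanism described above (running the query function with dummy arguments, then deriving a single-object query) can be sketched with objects that overload attribute access and comparison operators. This is an illustrative sketch only, not the runmacs implementation: the names Field, collect and build_query are hypothetical, only ==, >= and <= are handled, and method-style constraints such as enclosedBy are left out.

```python
import inspect

class Field:
    """Dummy standing in for `component.some.attribute` (hypothetical helper)."""
    def __init__(self, component, path, log):
        self._component, self._path, self._log = component, path, log

    def __getattr__(self, name):
        # Only called for unknown attributes: extend the dotted path.
        if name.startswith('_'):
            raise AttributeError(name)
        return Field(self._component, self._path + (name,), self._log)

    def _record(self, op, other):
        self._log.append((self, op, other))  # remember the relation
        return True

    def __eq__(self, other): return self._record('==', other)
    def __ge__(self, other): return self._record('>=', other)
    def __le__(self, other): return self._record('<=', other)

def collect(query_function):
    """Run the query function on dummies and return (names, recorded relations)."""
    names = list(inspect.signature(query_function).parameters)
    log = []
    query_function(*(Field(name, (), log) for name in names))
    return names, log

MONGO_OPS = {'>=': '$gte', '<=': '$lte'}
FLIPPED = {'==': '==', '>=': '<=', '<=': '>='}

def lookup(document, path):
    for key in path:
        document = document[key]
    return document

def build_query(log, target, known=None):
    """Build a single-object query for `target` from constants and known components."""
    known = known or {}
    result = {}
    for left, op, right in log:
        # Normalise so that the target component is on the left-hand side.
        if isinstance(right, Field) and right._component == target:
            left, op, right = right, FLIPPED[op], left
        if left._component != target:
            continue
        if isinstance(right, Field):
            if right._component not in known:
                continue  # nothing known yet about the other component
            right = lookup(known[right._component], right._path)
        key = '.'.join(left._path)
        if op == '==':
            result[key] = right
        else:
            result.setdefault(key, {})[MONGO_OPS[op]] = right
    return result

def query(rawImage, calibrationData):
    rawImage.productType == 'raw_image_data'
    rawImage.sensorId == calibrationData.sensorId
    rawImage.startTime >= calibrationData.validFrom
    rawImage.userMeta.optics == calibrationData.optics

names, log = collect(query)
print(build_query(log, 'rawImage'))
print(build_query(log, 'rawImage', {'calibrationData': {
    'sensorId': 10, 'validFrom': 2014, 'optics': 'baffle'}}))
```

Without any known component, only the constant constraint on productType ends up in the query; once calibrationData is supplied, the relations to it are resolved into concrete values. A plain integer is used for validFrom to keep the sketch short; a datetime would work the same way.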