improvement logic

Note

Improvement logic is not implemented yet, so this describes an intended new feature.

introduction

Sometimes, the need for a change of an existing product arises, however due to the need for reproduceability, a product which has once been created shoud never been deleted again. In terms of storage, this will not be a problem, because products only store the building plan and not the actual data. On the other hand, a huge ammount of outdated products will distract the user from using the best available data. The resulting goal of this dilemma is to find a system, which allows to have the building plans for lots of outdated data still in the database, so the data can be reproduced if explicitly requested. However, a user browsing through all datasets should only see the newest version of each product by default and a user requesting old products should be reminded, that a newer version is available.

It turns out that the definition of “newest” is not as trivial as it sounds in the first place as illustrated by this example: Maybe one finds that the cause of some errorneous data is a wrongly measured darkcurrent signal. A resolution might be to use a slightly older, but correctly measured version instead. In this case “newer” is actually a little bit older.

To solve this problem, we do not search for newer datasets but “better” ones. This is of course just a change in nomenclature and no new functionality is gained through renaming things. However, it is possible to define a better version of the mentioned darkcurrent signal upon discovery explicitly. Now the system can deduce from this definition, that for any derived product, derived from a product which has an associated “better” version of itself, there must be a better version as well which has the bad component replaced by the better one:

digraph improvement_idea {
subgraph input {
   "raw";
   "dark1";
   "dark2";
   graph [rank=same];
}

subgraph output {
   "calibrated1";
   "calibrated2";
   graph [rank=same];
}

"dark1" [shape="folder"];
"dark2" [shape="folder"];
"raw" [shape="folder"];

"calibrated1" [shape="rect"];
"calibrated2" [shape="rect"];

"raw" -> "calibrated1";
"dark1" -> "calibrated1";
"raw" -> "calibrated2";
"dark1" -> "dark2" [label="better", color="green"];
"dark2" -> "calibrated2";
"calibrated1" -> "calibrated2" [label="better", color="green"];
}

This graph illustrates the idea from above: dark1 has the improvement dark2 and subsequently, calibrated1 will get an improved version calibrated2. The raw data used in this example in unaffected by the improvement and thus stays the same for both variants of the calibrated data.

The green arrows representing a link to the “better” version of a product can be stored in the metaStorage and can be used both, to hide products with outgoing arrows because they are not the “best” version and to direct the user to a better version, if the bad version is still beeing used.

definitions

better relation
Relation describing that some product is better than another. If the relation is illustrated by an arrow, the product at the tip is the better one.
improvement
The explicitly stated fact that one product is better than another. An improvement has an associated unique id (indicated by slightly different colors). This statement results in a better relation. As better relations are propagated, not every better relation implies an (explicitly stated) improvement.

Note

In the following graphs, improvements and better relations are not differentiated, every better relation originating from an imported product (indicated with folder shape) is also an improvement.

rules for improvements

Every product may only have zero or one improved version of itself. Otherwise, it would be undecideable which suggestion should be made to the user:

digraph invalid_improvements {
subgraph output {
   "dark2";
   "dark3";
   graph [rank=same];
}

"dark1" [shape="folder"];
"dark2" [shape="folder"];
"dark3" [shape="folder"];

"dark1" -> "dark2" [label="better", color="green"];
"dark1" -> "dark3" [label="better", color="green"];
"dark2" -> "dark3" [label="which one??", color="red", dir="both"];
}

For the other direction, there is no such restriction, the following relation would be perfectly fine:

digraph one_for_two {
subgraph input {
   "dark1";
   "dark2";
   graph [rank=same];
}

"dark1" [shape="folder"];
"dark2" [shape="folder"];
"dark3" [shape="folder"];

"dark1" -> "dark3" [label="better", color="green"];
"dark2" -> "dark3" [label="better", color="green"];
}

If there is a second improvement for some product, this can be modeled by a sequence of repetitive improvements. This way, no ambiguity arises:

digraph two_improvements {
"dark1" [shape="folder"];
"dark2" [shape="folder"];
"dark3" [shape="folder"];

"dark1" -> "dark2" [label="better", color="green"];
"dark2" -> "dark3" [label="better", color="green"];
}

propagation of better relations

Improvements are stated for imported products, indicating that there is another imported product of better quality. Using the following rule, better relations can be propagated from one product to all products derived from it, eventually finding the best version of every product:

  • Take a product P with at least one outgoing better relation.
  • Follow all “black arrows” to the children C of P.
  • For each better relation:
    • Follow the better relation to find product P’.
    • For each child product C:
      • Get all components of C.
      • Create a new product C’ with the same components, except P is replaced by P’.

This algorithm yields to the same graph as shown in the introduction:

digraph improvement_idea {
subgraph input {
   "raw";
   "dark1";
   "dark2";
   graph [rank=same];
}

subgraph output {
   "calibrated1";
   "calibrated2";
   graph [rank=same];
}

"dark1" [shape="folder"];
"dark2" [shape="folder"];
"raw" [shape="folder"];

"calibrated1" [shape="rect"];
"calibrated2" [shape="rect"];

"raw" -> "calibrated1";
"dark1" -> "calibrated1";
"raw" -> "calibrated2";
"dark1" -> "dark2" [label="better", color="green"];
"dark2" -> "calibrated2";
"calibrated1" -> "calibrated2" [label="better", color="green"];
}

Note that this rule is local i.e. its complexity scales with the number of relations of a single product. Thus it is not needed to consult every product in the database to find a result.

divergence on two improvements

As suggested by the algorithm, there is still a situation in which more than one outgoing better relation can arise naturally:

digraph diamond_problem {
subgraph input1 {
   "dark1";
   "calA";
   graph [rank=same];
}

subgraph input2 {
   "dark2";
   "calB";
   graph [rank=same];
}

subgraph calibrated2 {
   "calibrated2A";
   "calibrated1B";
   graph [rank=same];
}

"dark1" [shape="folder"];
"dark2" [shape="folder"];
"calA" [shape="folder"];
"calB" [shape="folder"];
"calibrated1A" [shape="rect"];
"calibrated2A" [shape="rect"];
"calibrated1B" [shape="rect"];

"dark1" -> "calibrated1A";
"calA" -> "calibrated1A";
"dark2" -> "calibrated2A";
"calA" -> "calibrated2A";

"dark1" -> "calibrated1B";
"calB" -> "calibrated1B";

"dark1" -> "dark2" [label="better", color="green"];
"calA" -> "calB" [label="better", color="limegreen"];
"calibrated1A" -> "calibrated2A" [color="green"];
"calibrated1A" -> "calibrated1B" [color="limegreen"];
"calibrated2A" -> "calibrated1B" [label="??", color="red", dir="both"];
}

The problem in this case is that a derived product can have multiple components, each of which can have one better relation associated to it. In this case, it is not possible to tell wether calibrated1A or calibrated1B is betther than the other.

But this graph is not fully evaluated. For instance, dark2 is better than dark1 but there is no better version of calibrated1B visible, which includes dark2. Fully applying the rules yields the following graph:

digraph diamond_problem {
subgraph input1 {
   "dark1";
   "calA";
   graph [rank=same];
}

subgraph input2 {
   "dark2";
   "calB";
   graph [rank=same];
}

subgraph calibrated2 {
   "calibrated2A";
   "calibrated1B";
   graph [rank=same];
}

subgraph calibrated3 {
   "calibrated2AB";
   "calibrated1B2";
   graph [rank=same];
}

"dark1" [shape="folder"];
"dark2" [shape="folder"];
"calA" [shape="folder"];
"calB" [shape="folder"];
"calibrated1A" [shape="rect"];
"calibrated2A" [shape="rect"];
"calibrated1B" [shape="rect"];
"calibrated2AB" [shape="rect"];
"calibrated1B2" [shape="rect"];

"dark1" -> "calibrated1A";
"calA" -> "calibrated1A";
"dark2" -> "calibrated2A";
"calA" -> "calibrated2A";

"dark1" -> "calibrated1B";
"calB" -> "calibrated1B";
"dark2" -> "calibrated1B2";
"dark2" -> "calibrated2AB";
"calB" -> "calibrated1B2";
"calB" -> "calibrated2AB";

"dark1" -> "dark2" [label="better", color="green"];
"calA" -> "calB" [label="better", color="limegreen"];
"calibrated1A" -> "calibrated2A" [color="green"];
"calibrated1B" -> "calibrated1B2" [color="green"];
"calibrated2A" -> "calibrated1B" [label="??", color="red", dir="both"];
"calibrated1A" -> "calibrated1B" [color="limegreen"];
"calibrated2A" -> "calibrated2AB" [color="limegreen"];
"calibrated2AB" -> "calibrated1B2" [label="??", color="red", dir="both"];
}

Which looks even worse than the graph before and does not solve the problem. In this case there exist two different better products of calibrated1A which each have both improvements applied, but in reversed order. Our intuition however tells of course that there should be only one correct version which has both improvements applied. This is only possible, if chaining of better relations is commutative. This allows to identify both calibrated2AB and calibrated1B2 as the same product calibrated2B yielding the following graph:

digraph diamond_problem {
subgraph input1 {
   "dark1";
   "calA";
   graph [rank=same];
}

subgraph input2 {
   "dark2";
   "calB";
   graph [rank=same];
}

subgraph calibrated2 {
   "calibrated2A";
   "calibrated1B";
   graph [rank=same];
}

"dark1" [shape="folder"];
"dark2" [shape="folder"];
"calA" [shape="folder"];
"calB" [shape="folder"];
"calibrated1A" [shape="rect"];
"calibrated2A" [shape="rect"];
"calibrated1B" [shape="rect"];
"calibrated2B" [shape="rect"];

"dark1" -> "calibrated1A";
"calA" -> "calibrated1A";
"dark2" -> "calibrated2A";
"calA" -> "calibrated2A";

"dark1" -> "calibrated1B";
"calB" -> "calibrated1B";
"dark2" -> "calibrated2B";
"calB" -> "calibrated2B";

"dark1" -> "dark2" [label="better", color="green"];
"calA" -> "calB" [label="better", color="limegreen"];
"calibrated1A" -> "calibrated2A" [color="green"];
"calibrated1B" -> "calibrated2B" [color="green"];
"calibrated2A" -> "calibrated1B" [label="??", color="red", dir="both"];
"calibrated1A" -> "calibrated1B" [color="limegreen"];
"calibrated2A" -> "calibrated2B" [color="limegreen"];
}

This is much better and in fact represents very well the natural way of handling this issue. If one improves two ingredients at distinct times, there will be some intermediate products, which only have one of the improvements applied and no one can tell which one is better. Applying both improvements in turn is certainly better than either of both and transitively also better than the original one. Subsequently it is in general not possible to find exactly one product which is one step better than an arbitrary product. But it is possible to find exactly one best version for any arbitrary product.

rules for better relation

After the observations above, the better relation and the chaining of better relations must follow some rules:

  • Better relations must be transitive upon chaining.
  • Chaining of better relations must be commutative for any set of relations originating from the same product.

The rule that only zero or one improvement may be stated per imported product, follows directly from the commutativity constraint. It would not be possible to ensure commutativity for e.g. two successive rotation corrections for a camera mount. Disallowing multiple improvements per imported product thus ensures comutativity trivially for better relations from imported products.

For better relations originating from derived products, commutativity can be ensured, as multiple better relations always originate in improvements of different imported products and thus correspond to different components. Exchanging different components is commutative and thus the chaining of the corresponding better relations is commutative as well.

more blowups

What happens with the seemingly simple situation where one imported product a is used by two different derived products f and g, which are both used by h:

digraph multi_changes {
"a" [shape="folder"];
"f" [shape="rect"];
"g" [shape="rect"];
"h" [shape="rect"]

"a" -> "f";
"a" -> "g";
"f" -> "h";
"g" -> "h";
}

If we improve a to b?

digraph multi_changes {
"a" [shape="folder"];
"b" [shape="folder"];
"f" [shape="rect"];
"f2" [shape="rect"];
"g" [shape="rect"];
"g2" [shape="rect"];
"h" [shape="rect"]
"hf2" [shape="rect"]
"hg2" [shape="rect"]
"h2" [shape="rect"]

"a" -> "f";
"b" -> "f2";
"a" -> "g";
"b" -> "g2";
"f" -> "h";
"g" -> "h";
"f2" -> "hf2";
"g" -> "hf2";
"g2" -> "hg2";
"f" -> "hg2";
"f2" -> "h2";
"g2" -> "h2";

"a" -> "b" [label="better", color="green"];
"f" -> "f2" [color="green"];
"g" -> "g2" [color="green"];
"h" -> "hf2" [color="green"];
"h" -> "hg2" [color="green"];
"hf2" -> "h2" [color="green"];
"hg2" -> "h2" [color="green"];
}

We actually get 4 versions of h! Reasonably, there should only be two versions, as it completely waseteful to apply an improvement only to f but not to g or vice versa. If another improvement from b to c is added, there would even be 9 versions of h: no chance to scale to 100000 products and above.

On the other hand, it is not trivial to avoid this situation, if not the entire graph should be used to find new products (which would have aweful performance). So a reasonably local situation is desired, which leads to the correct result. This solution must be able to cover partially evaluated graphs, because no specific order in which products are added to the graph should be enforced and the same result should be found if the graph is recreated at a later state. This can be illustrated in the following two situations of partially evaluated graphs:

digraph multi_changes {
"a" [shape="folder"];
"b" [shape="folder"];
"f" [shape="rect"];
"g" [shape="rect"];
"g2" [shape="rect"];
"h" [shape="rect"]

"a" -> "f";
"a" -> "g";
"b" -> "g2";
"f" -> "h";
"g" -> "h";

"a" -> "b" [label="better", color="green"];
}

In this case, f2 has not been found yet, however, the combination of g2 and f should not be considered to create a new h as this would be the same as the unwanted hg2 from above.

Note

The better relation from g to g2 has not yet been found, as creating g2 may only see b using current creation rules.

digraph multi_changes {
"a" [shape="folder"];
"b" [shape="folder"];
"c" [shape="folder"];
"f" [shape="rect"];
"f3" [shape="rect"];
"g" [shape="rect"];
"g3" [shape="rect"];
"h" [shape="rect"]
"h3" [shape="rect"]

"a" -> "f";
"a" -> "g";
"c" -> "f3";
"c" -> "g3";
"f" -> "h";
"g" -> "h";
"f3" -> "h3";
"g3" -> "h3";

"a" -> "b" [label="better", color="green"];
"b" -> "c" [color="limegreen"];
}

In this situation (which may arise if b and c are both in the initial graph), h2 should still be created, as it should also be created if c was missing. We actually need h2 as well, because the better relation from h to h3 should not go directly but across h2.

For all the above cases, the solution should eventually converge to:

digraph multi_changes {
"a" [shape="folder"];
"b" [shape="folder"];
"c" [shape="folder"];
"f" [shape="rect"];
"f2" [shape="rect"];
"f3" [shape="rect"];
"g" [shape="rect"];
"g2" [shape="rect"];
"g3" [shape="rect"];
"h" [shape="rect"]
"h2" [shape="rect"]
"h3" [shape="rect"]

"a" -> "f";
"a" -> "g";
"b" -> "f2";
"b" -> "g2";
"c" -> "f3";
"c" -> "g3";
"f" -> "h";
"g" -> "h";
"f2" -> "h2";
"g2" -> "h2";
"f3" -> "h3";
"g3" -> "h3";

"a" -> "b" [label="better", color="green"];
"b" -> "c" [color="limegreen"];
"f" -> "f2" [color="green"];
"g" -> "g2" [color="green"];
"h" -> "h2" [color="green"];
"f2" -> "f3" [color="limegreen"];
"g2" -> "g3" [color="limegreen"];
"h2" -> "h3" [color="limegreen"];
}

The key for the solution must be to inhibit the creation of products for some cases and the criterion for inhibition must be correctly decideable if only a small protion of the final graph is already evaluated.

An inhibition rule would be invoked just before the product would be saved to metaStorage. Therefore all components (i.e. black arrows) are known. Also the improvements up to all used imported products are known (really??). This can be used to state the following rules:

  • For each component, walk back through all subcomponents recursively until only imported products are left.
  • For each distinct imported product, walk backwards through better relations and collect their ids. The union of all ids are the “applied improvements”.
  • For each distinct imported product, walk forwards through better relations and collect their ids. The union of all ids are the “open improvements”.
  • The intersection between applied and open improvements is the set of incompletely applied improvements.
  • Products with incompletely applied improvements are rejected.

testing the rules

In this graph, there should be no h-like product created from g2 and f:

digraph multi_changes {
"a" [shape="folder"];
"b" [shape="folder"];
"f" [shape="rect"];
"g" [shape="rect"];
"g2" [shape="rect"];
"h" [shape="rect"]

"a" -> "f";
"a" -> "g";
"b" -> "g2";
"f" -> "h";
"g" -> "h";

"a" -> "b" [label="x", color="green"];
}

Walking back through g2 yields \([x]\) as applied and \([]\) as open improvements. Walking back through f yields \([]\) as applied and \([x]\) as open improvement. The intersection of applied and open improvements is: \(\left([x] \cup []\right) \cap \left([] \cup [x]\right) = [x]\), which is non-empty, so no product will be created.

In the following graph, there should be h created from f and g:

digraph multi_changes {
"a" [shape="folder"];
"b" [shape="folder"];
"f" [shape="rect"];
"f2" [shape="rect"];
"g" [shape="rect"];
"g2" [shape="rect"];
"h2" [shape="rect"]

"a" -> "f";
"a" -> "g";
"b" -> "f2";
"b" -> "g2";
"f2" -> "h2";
"g2" -> "h2";

"a" -> "b" [label="x", color="green"];
}

Walking back through f and g both yields \([]\) as applied and \([x]\) as open improvements. The intersection yields \([] \cap [x] = []\) which is empty, therefore h will be created.

collections

If imported products are arranged in collections and derived products are only allowed to be built from components, all from the same collection, the unwanted “hf2” and “hg2” can be detected as well and would not be created:

digraph multi_changes2 {
subgraph cluster_collection1 {
     "a";
     "f";
     "g";
     "h";
     color=black;
     label="collection 1";
     graph[style=dotted];
}

subgraph cluster_collection2 {
     "b";
     "f2";
     "g2";
     "h2";
     color=black;
     label="collection 2";
     graph[style=dotted];
}

"a" [shape="folder"];
"b" [shape="folder"];
"f" [shape="rect"];
"f2" [shape="rect"];
"g" [shape="rect"];
"g2" [shape="rect"];
"h" [shape="rect"]
"hf2" [shape="rect"]
"hg2" [shape="rect"]
"h2" [shape="rect"]

"a" -> "f";
"b" -> "f2";
"a" -> "g";
"b" -> "g2";
"f" -> "h";
"g" -> "h";
"f2" -> "hf2";
"g" -> "hf2";
"g2" -> "hg2";
"f" -> "hg2";
"f2" -> "h2";
"g2" -> "h2";

"a" -> "b" [label="better", color="green"];
"f" -> "f2" [color="green"];
"g" -> "g2" [color="green"];
"h" -> "hf2" [color="green"];
"h" -> "hg2" [color="green"];
"hf2" -> "h2" [color="green"];
"hg2" -> "h2" [color="green"];
}

Collections also would allow to prevent divergence of the first kind:

digraph diamond_problem {
node [shape="Mrecord"];

dark1 [label="<f0> dark1|<f1>1"];
dark2 [label="<f0> dark2|<f1>2,3"];

calA [label="<f0> calA|<f1>1,2"];
calB [label="<f0> calB|<f1>3"];
calibrated1A [label="<f0> calibrated1A|<f1>1"];
calibrated2A [label="<f0> calibrated2A|<f1>2"];
calibrated2B [label="<f0> calibrated2B|<f1>3"];
calibrated1B [label="<f0> calibrated1B|<f1>-", color="orangered"];

dark1:f0 -> calibrated1A:f0;
calA:f0 -> calibrated1A:f0;
dark2:f0 -> calibrated2A:f0;
calA:f0 -> calibrated2A:f0;

dark1:f0 -> calibrated1B:f0;
calB:f0 -> calibrated1B:f0;
dark2:f0 -> calibrated2B:f0;
calB:f0 -> calibrated2B:f0;

dark1:f0 -> dark2:f0 [label="better", color="green"];
calA:f0 -> calB:f0 [label="better", color="limegreen"];
calibrated1A:f0 -> "calibrated2A" [color="green"];
calibrated2A:f0 -> calibrated2B:f0 [color="limegreen"];
}

(This graph is drawn differently as overlapping clusters (i.e. collections) are not allowed in graphviz.)

Each product would have one or many collection ids associated to it. Derived products take the union of all component’s collection ids. If the union is empty, the product shall not be constructed. In this case, calibrated1B would not be constructed as it does not belong to any collection.

finding the best version

The usefulness of the system is determined by the following two operations:

  • For any given product, decide if it is the best version of itself.
  • For any given product, find the best version of itself.

These operations (escpecially the first) should be very efficient, as they have to be performed at almost every request to the database. These operations are now very easy using the following definitions:

best product

Every product which has no outgoing better relations is a “best” product.

finding the best

Follow every better relation recursively until no more better relations are found. Return only the “best” products out of the set of traversed products. There should be only one such product, if the rules above are met. Otherwise, the “best” is undecideable.