Note: This technote is not yet published.
This technote describes the design of the Metric and Specification system that the new validation framework will use.
1 Frossie’s design principles
- Metrics should be defined independently of code. Client-side code should consume metric definitions, and so will server-side code.
- Implement as a thin shim over what will become a Butler put.
- UX for Metric developers: make it as easy as possible for a developer to define a new Metric and have its Measurements appear in SQUASH and be monitored.
2 The validate_base 2.0 data model
The validation framework provides a rich set of objects for developers to use. The core objects are metrics, specifications, measurements, and monitors. Their relationships are summarized in a relationship diagram (figure not reproduced here).
The full list of validation framework objects is:
- MetricRepo
- MetricSet
- Metric
- MeasurementSet
- Measurement
- Blob
- Job
- Provenance
- Specification
- Monitor
- MeasurementView
2.1 Afterburner
An afterburner is a piece of code that is executed on the output of a Task in order to calculate a Measurement of a Metric.
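To make the concept concrete, here is a minimal sketch of what an afterburner could look like. The function name and the shape of the Task output are illustrative assumptions; there is no afterburner API in validate_base yet.

import astropy.units as u
import numpy as np

def pa1_afterburner(repeat_mags_mmag):
    """Reduce per-source repeated magnitudes (mmag) to a PA1-like scalar.

    This only shows the shape of the call (Task output in, scalar
    Quantity out); the real PA1 algorithm is defined in LPM-17.
    """
    mags = np.asarray(repeat_mags_mmag)
    rms_per_source = mags.std(axis=1)          # rms across repeated visits
    return np.median(rms_per_source) * u.mmag  # scalar Measurement value

value = pa1_afterburner([[0.0, 5.0], [2.0, 6.0]])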
2.2 MetricSet
A MetricSet is a collection of Metrics and their associated metadata.
2.2.1 Attributes
- name (e.g. jointcal_cfht_r)
- eups package name of data
- directory within eups data package for dataset
- butler dataId or list of dataIds (optional)
- Metrics (list?)
2.3 Metric
A Metric is a quantity that is measurable.
2.3.1 Prior art
- lsst.validate.base.Metric
- Metric.yaml format
2.3.2 Attributes
- name
- description
- unit
- tags
- reference.doc
- reference.page
- reference.url
2.3.3 Questions & Notes
- Specifications are no longer contained by Metrics.
- In the existing lsst.validate.base.Metric, there is a parameters dictionary that defines constants for the Measurement code (for example, the annulus diameter from AMx Metrics). These parameters will be contained in the Specifications.
- We talked about making the minimum Provenance required for a Measurement/Job be defined in the Metric. Is this still a requirement?
2.4 MeasurementSet
A MeasurementSet is a collection of Measurements and their associated metadata.
2.4.1 Attributes
- name — Name of the MetricSet that these Measurements are associated with (e.g. validate_drp).
- job_id — Job identifier.
- provenance — provenance of the executed job (TODO: should this actually live in the Job itself?).
- measurements — dictionary of name: Measurement.
2.5 Measurement
A Measurement is a realization of a Metric: always a scalar value.
2.5.1 Prior art
- lsst.validate.base.MeasurementBase
2.5.2 Attributes
- name — Name of the Metric that this measures.
- value — scalar Measurement value, required to be persisted in units of Metric.unit as an astropy.Quantity (see the sketch below).
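For illustration, a Measurement value might be built as follows; a plain dict stands in for the eventual Measurement class, while astropy.units is the real library.

import astropy.units as u

pa1_value = 13.5 * u.mmag  # scalar, expressed in the Metric's unit
measurement = {'name': 'validate_drp.PA1', 'value': pa1_value}
assert measurement['value'].unit == u.Unit('mmag')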
2.5.3 Questions & Notes
- lsst.validate.base.MeasurementBase originally had a parameters attribute that provided Provenance for how the Measurement was made (e.g., a S/N cut-off for star selection). These will now be part of the Task configuration, and available through the regular Provenance.
- lsst.validate.base.MeasurementBase also had an extras attribute where additional Measurement outputs could be persisted (JSON-serializable). Do we still want this? Or do we always want such data to go into a Blob?
2.6 Blob
A Blob is a container for JSON-serializable data that may be associated with one or more Measurements, and that is useful for rendering plots and doing SQUASH-side analysis.
2.6.1 Prior art
- lsst.validate.base.BlobBase
2.7 Job
A Job is an execution of a pipeline, containing Measurements, Blobs, and their Provenance.
2.7.1 Prior art
- lsst.validate.base.Job
2.7.2 Attributes
- measurements — list of Measurement objects (TODO: or a MeasurementSet?).
- blobs — list of Blob objects. Each Blob should be referenced by at least one Measurement.
- provenance — data structure that fully specifies the Provenance of the pipeline run.
2.7.3 Questions & Notes
- What is the schema of Provenance? At minimum, it includes the input dataIds (input dataset) and task configurations.
- Not all Provenance is currently known within the pipeline. We use post-qa to hydrate Job Provenance with package versions and Jenkins environment variables. However, as we work towards a state where post-qa is no longer used as a shim, it’s not unreasonable to move this into validate_base.
2.8 Provenance
All metadata associated with a Job run, including Config parameters, Butler dataRefs, cluster configuration, etc.
2.8.1 Questions & Notes
- How is provenance defined?
- How do we define queries on provenance in a Specification?
- How do we map between this provenance and the one that DAX will maintain?
2.9 Specification
A Specification is a binary (pass/fail) evaluation of a Measurement of a Metric. There can be an arbitrary number of Specifications associated with a Metric.
2.9.1 Attributes
- name — Identifier of the Metric that this Specification is attached to.
- provenance_query — only Measurements that have matching Provenance parameters are tested by this Specification.
- parameters — a dict of key: value pairs that must be matched by the Job’s Provenance regarding particular values used in a calculation (e.g. diameter used for aperture photometry).
- alert_listeners — Slack IDs of people who are alerted if a Measurement fails the Specification.
- alert_channels — Slack channel IDs that receive messages when a Measurement fails a Specification.
- threshold and comparison_operator — a Measurement passes the Specification if the Measurement is on the side of the threshold indicated by the comparison operator.
- range — a Measurement passes the Specification if the Measurement is within this range (new).
2.9.2 Questions & Notes
- Either threshold or range can be set. Possibly there should be different classes of Specification (i.e., a ThresholdSpecification or a RangeSpecification); a sketch of this split follows this list.
- Note that we’re jettisoning some of the earlier Specification class baggage, like dependencies. This means that the definitions of Metrics are no longer driven by definitions of Specifications, as they currently are for AFx/ADx, for example. Instead, this flexibility is handled by additional Metrics.
- Should the parameters just be part of the Provenance, or should they be a separate section for maintenance convenience that gets ingested into the Provenance?
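The sketch below illustrates that possible split; the constructor and check() signatures are hypothetical, not an existing validate_base API.

import operator

OPERATORS = {'<': operator.lt, '<=': operator.le,
             '>': operator.gt, '>=': operator.ge}

class ThresholdSpecification:
    def __init__(self, threshold, comparison_operator):
        self.threshold = threshold
        self.compare = OPERATORS[comparison_operator]

    def check(self, measurement):
        """Pass if the measurement is on the allowed side of the threshold."""
        return self.compare(measurement, self.threshold)

class RangeSpecification:
    def __init__(self, lower, upper):
        self.lower, self.upper = lower, upper

    def check(self, measurement):
        """Pass if the measurement falls within [lower, upper]."""
        return self.lower <= measurement <= self.upper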
2.10 MeasurementView
A MeasurementView is a collection of Measurements for a Metric, possibly filtered by Provenance. A MeasurementView can be used to populate a Measurement timeseries (regression plot), as seen in SQUASH. A MeasurementView is essentially a DB query, but provides a more concrete API for us to think about how we can do data science against Measurements.
2.10.1 Attributes
- metric_name
- provenance_query
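Since a MeasurementView is essentially a DB query, a minimal sketch might wrap one directly. The table schema, JSON provenance column, and query API below are all hypothetical placeholders.

import sqlite3

class MeasurementView:
    def __init__(self, db, metric_name, provenance_query=None):
        self.db = db
        self.metric_name = metric_name
        self.provenance_query = provenance_query or {}

    def timeseries(self):
        """Return (timestamp, value) rows for a regression plot."""
        sql = 'SELECT timestamp, value FROM measurements WHERE metric = ?'
        params = [self.metric_name]
        for key, value in self.provenance_query.items():
            # Naive provenance filter; the real query language is undefined.
            sql += f" AND json_extract(provenance, '$.{key}') = ?"
            params.append(value)
        return self.db.execute(sql, params).fetchall()

view = MeasurementView(sqlite3.connect(':memory:'), 'validate_drp.PA1',
                       provenance_query={'filter': 'r'})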
3 validate_metrics: A package for metric and specification definitions
All packages that make metric measurements define those metrics as YAML files in the validate_metrics package.
Likewise, all specifications for these metrics are also centrally defined in YAML files committed to validate_metrics.
This design is appealing because SQUASH infrastructure can watch the validate_metrics repository and populate its DB from validate_metrics as a single source of truth.
validate_metrics effectively becomes a user interface for package developers and test engineers to configure the testing system.
validate_metrics is designed to be a data-only package (though it still provides a version in Python, lsst.validate.metrics.__version__).
validate_base provides Python access to metrics and specifications.
Within validate_metrics, developers work in two directories:
- /metrics hosts metric definition YAML files. For each Stack package that generates metric measurements there is a metric definition file named after that package. For example:

  metrics/
    validate_drp.yaml
    jointcal.yaml

  The format of these YAML files is described below. We expect these metric definitions to be slow-moving, changing only when a new metric measurement is coded into a Stack package.
- /specs hosts specification definitions. These YAML files are organized into sub-directories named after the metric YAML file, but otherwise the names of specification YAML files have no programmatic meaning. For example:

  specs/
    validate_drp/
      LPM-17.yaml
      cfht_gri.yaml

  In this example, official specifications defined in LPM-17, the Science Requirements Document, are coded in LPM-17.yaml. This specification file would remain static, while developers would typically add custom, ad hoc specifications in other files, like cfht_gri.yaml. The format of specification YAML files is described below.
3.1 Metric YAML format
This is an example of a PA1 metric encoded in validate_metrics/metrics/validate_drp.yaml:
PA1:
  description: >
    The maximum rms of the unresolved source magnitude distribution
    around the mean value.
  unit: mmag
  reference:
    doc: LPM-17
    url: http://ls.st/lpm-17
    page: 21
The root level of a metric YAML file is an associative array (equivalent to a Python dict) where keys are metric names.
In the above example, only PA1 is shown, but it might be followed by other metrics like PF1 and PA2.
A metric definition itself is minimal, consisting of only three fields:
- description: a sentence, or even multiple paragraphs, that describes the metric. This description is consumed by the Science Pipelines documentation, and also shown by SQUASH.
- unit: the string representation of the astropy.units.Unit that measurements of a metric are made in. Unitless metrics (a count, for example) have units written as an empty string ''. Percentages can be written as %. Fractions are not supported by astropy.units, so fractional metrics must be rephrased as percentages (see the sketch after this list).
- reference: this field points to further documentation where a metric is formally defined. Provide doc, url, and page fields as appropriate.
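As a sketch of how these definitions could be consumed, the loader below reads a metric file and validates each unit string. The loader function is hypothetical (and assumes the metrics file from this section exists on disk); yaml and astropy.units are real libraries.

import astropy.units as u
import yaml

def load_metric_units(path):
    """Map each metric name to its astropy Unit, raising on invalid units."""
    with open(path) as f:
        metrics = yaml.safe_load(f)
    # Each root-level key is a metric name; 'unit' must parse as an
    # astropy.units.Unit ('' means dimensionless, '%' means percent).
    return {name: u.Unit(defn['unit']) for name, defn in metrics.items()}

units = load_metric_units('metrics/validate_drp.yaml')
print(units['PA1'])  # mmag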
3.1.1 Fully qualified metric name
Metrics can be referenced universally by their fully qualified name:

{ package name }.{ metric name }

For example, the fully qualified name for the example metric is validate_drp.PA1.
When working inside a package, where context is clear, the validate API can permit metrics to be addressed by name alone, e.g. PA1.
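A resolution helper for this naming scheme could be as simple as the following sketch; the function is hypothetical, not an existing API.

def resolve_metric_name(name, current_package=None):
    """Return (package, metric) given 'pkg.METRIC' or a bare 'METRIC'."""
    if '.' in name:
        package, metric = name.split('.', 1)
        return package, metric
    if current_package is None:
        raise ValueError('Bare metric name used outside a package context')
    return current_package, name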
3.2 Specification YAML format
A complete specification looks like:
---
metric: 'PA1'
name: 'design_gri'
threshold:
  value: 5.0
  unit: '%'
  operator: '<='
provenance_query:
  filter: ['g', 'r', 'i']
...
Notice that each specification is encapsulated within a corresponding YAML document (documents are divided by --- tokens).
There is always one specification per YAML document.
This architecture allows us to spread specifications across many YAML files in the validate_metrics repository, and permits specifications to reference each other (see partials and inheritance, below).
The fields of a specification are:
- metric: the name of the metric this specification applies to. Since specifications are encapsulated by package, there is no need to use the fully qualified metric name.
- name: the name of this specification. Specifications extend the naming system of metrics: the fully qualified name of this specification is validate_drp.PA1.design_gri (assuming the specification is defined in /specs/validate_drp/).
- threshold: this is a test against a measurement. A measurement passes a specification test if this statement evaluates to true: { measurement value } { operator } { threshold value }. Other test formats are available for specifications; see below. A sketch of how this test might be evaluated follows this list.
- provenance_query: an associative array (dictionary) of query terms for measurement provenance that this specification can be applied to. The query language is currently undefined, so the example is a pseudocode query where the filter must be one of g, r, or i.
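A minimal sketch of evaluating the threshold statement above, assuming the specification dict shown earlier; the check() helper is hypothetical, not an existing validate_base API.

import operator

import astropy.units as u

OPERATORS = {'<': operator.lt, '<=': operator.le,
             '>': operator.gt, '>=': operator.ge,
             '==': operator.eq, '!=': operator.ne}

def check(measurement, threshold):
    """Return True if the measurement passes the threshold test."""
    target = threshold['value'] * u.Unit(threshold['unit'])
    # astropy handles unit conversion during the comparison.
    return OPERATORS[threshold['operator']](measurement, target)

spec_threshold = {'value': 5.0, 'unit': '%', 'operator': '<='}
print(check(4.2 * u.percent, spec_threshold))  # True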
3.2.1 Specification tests
The binary comparison test is quite common, but it’s not the only imaginable test structure. Other types of tests that may be supported by the validation framework are:
- tolerance: consisting of a target value and a symmetric tolerance window.
- window: test if a measurement deviates from the sample of previous measurements in a given window, by a given amount.
- function: specifies an importable Python function that computes a binary True (pass) or False (fail) result.
3.2.2 Specification partials
Specifications might repeat information; for example, a provenance_query for a certain test dataset. We apply DRY design principles through partials. A partial has an id field, and can’t be a specification on its own. For example:
---
# specification partial
id: 'base'
metric: 'PA1'
threshold:
  unit: ''
  operator: '<='
provenance_query:
  filter: ['g', 'r', 'i']
---
# design specification instance that mixes in the base partial
# validate_drp.PA1.design
name: 'design'
base: '#base'
threshold:
  value: 5.0
---
# stretch specification instance that mixes in the base partial
# validate_drp.PA1.stretch
name: 'stretch'
base: '#base'
threshold:
  value: 3.0
...
A partial can be referenced from the base field by prefixing the id with #.
Partials can also be referenced across files (but within the same package’s specs directory) by providing a filename:
base: "cfht_gri#base"
3.2.3 Specification inheritance
Specifications can also inherit from other specifications, generally to add partials.
Specifications are referenced through their fully qualified name, validate_drp.PA1.design_gri, or the package-relative fully qualified name, PA1.design_gri.
For example:
---
# Specification partial
id: 'PA1-base'
metric: 'PA1'
threshold:
  unit: 'mmag'
  operator: "<="
---
# validate_drp.PA1.minimum_gri
name: "minimum_gri"
base: "#PA1-base"
threshold:
  value: 8.0
---
# Partial that queries a cfht_gri dataset
id: 'cfht-base'
provenance_query:
  dataset_repo_url: 'https://github.com/lsst/validation_data_cfht.git'
  filters: ['g', 'r', 'i']
  visits: [849375, 850587]
  ccd: [12, 13, 14, 21, 22, 23]
---
# validate_drp.PA1.cfht_minimum_gri
name: 'cfht_minimum_gri'
base: ['PA1.minimum_gri', '#cfht-base']
...
The fully-hydrated validate_drp.PA1.minimum_gri specification is:
---
name: 'minimum_gri'
metric: 'PA1'
threshold:
  value: 8.0
  unit: 'mmag'
  operator: "<="
And the fully-hydrated validate_drp.PA1.cfht_minimum_gri specification is:
---
name: 'cfht_minimum_gri'
metric: 'PA1'
provenance_query:
  dataset_repo_url: 'https://github.com/lsst/validation_data_cfht.git'
  filters: ['g', 'r', 'i']
  visits: [849375, 850587]
  ccd: [12, 13, 14, 21, 22, 23]
threshold:
  value: 8.0
  unit: 'mmag'
  operator: "<="
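The hydration behavior above could be implemented as a recursive merge. A minimal sketch, assuming each YAML document has been parsed into a dict and indexed by its id or fully qualified name; hydrate(), merge(), and the index structure are hypothetical, not an existing validate_base API.

def merge(base, override):
    """Recursively merge override into base, override winning on conflicts."""
    merged = dict(base)
    for key, value in override.items():
        if key in merged and isinstance(merged[key], dict) and isinstance(value, dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged

def hydrate(doc, index):
    """Resolve a document's base references (str or list) against an index."""
    bases = doc.get('base', [])
    if isinstance(bases, str):
        bases = [bases]
    hydrated = {}
    for ref in bases:
        # '#id' names a partial in the same file; 'file#id' a partial in
        # another file; anything else a package-relative specification name.
        hydrated = merge(hydrated, hydrate(index[ref], index))
    hydrated = merge(hydrated, doc)
    hydrated.pop('base', None)
    hydrated.pop('id', None)
    return hydrated

Applied to the example above, hydrate() on validate_drp.PA1.cfht_minimum_gri first hydrates PA1.minimum_gri (which mixes in #PA1-base), then merges in #cfht-base, producing the fully-hydrated specification shown.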
4 How Measurements are submitted to SQUASH
4.1 Design Principles
- Think about Airplane Mode.
- Think about how this will eventually be a Butler.put().
4.2 Proposal
Packages construct a Job that contains Measurements, Blobs, and Provenance. This Job, serialized to JSON, is sent over the logger. A special Metric logger is used that saves this log statement to a separate file. A next-generation post-qa sends this Job to SQUASH’s REST API. A sketch of this flow follows.
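A minimal sketch of the proposed flow: serialize a Job to JSON and emit it through a dedicated logger whose handler writes to a separate file for post-qa to pick up. The logger name, file name, and Job layout are illustrative assumptions, not an existing validate_base API.

import json
import logging

metric_log = logging.getLogger('validate.jobs')  # hypothetical logger name
metric_log.setLevel(logging.INFO)
# Route Job records to their own file, away from the main pipeline log.
metric_log.addHandler(logging.FileHandler('metric_measurements.log'))

job = {
    'measurements': [{'metric': 'validate_drp.PA1', 'value': 4.2, 'unit': 'mmag'}],
    'blobs': [],
    'provenance': {'filter': 'r'},
}
metric_log.info(json.dumps(job))  # post-qa later forwards this to SQUASH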
- Bonus: Packages could provide Jupyter Notebooks that locally consume the log data to show plots and pass/fail Specification status.
- Bonus: make validate_base capable of generating the Jupyter Notebook!
- Bonus: share Bokeh plots between notebooks and SQUASH.
5 How Specifications are registered
5.1 Design principles
- Specifications are a mechanism for LSST staff to monitor a MeasurementView and be alerted whenever a new Measurement exceeds a threshold or range.
- It needs to be easy for any LSST staff member to register a new Specification; they should not be required to contact SQuaRE to register or change a Specification.
- Specifications should be available offline, but be synced to SQUASH.
5.2 Proposal
There is a common EUPS package that contains Specifications in a YAML format. These Specifications are available, through a Python API, to packages so that they can show real-time pass/fail status of Measurements. The Specifications are also synchronized with the SQUASH database. If someone wants to be alerted by a Specification, they sign themselves up as an owner of the Specification. A sketch of what such a Python API might look like follows.
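A minimal sketch of offline Specification access; load_specs() and Spec.check() are hypothetical names standing in for whatever validate_base eventually provides, not an existing import.

import astropy.units as u

from lsst.validate.base import load_specs  # hypothetical API

specs = load_specs('validate_drp')  # read spec YAML from the EUPS package
pa1_design = specs['PA1.design_gri']

measurement = 4.2 * u.percent
if not pa1_design.check(measurement):
    print('PA1 fails design_gri; SQUASH alerts the spec owners')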