# The Variable Cascade Master Guide
This guide explains how to link local JSON data implementations to global metadata standards (DDI, MLCommons Croissant, Schema.org) using the Variable Cascade pattern.
See the companion schema file: ../../../examples/variable-cascade.json
## How-to: Implement the Variable Cascade
1. **Define Conceptual Variables:** In `$defs`, create abstract definitions (e.g., `Sex`, `Age`).
2. **Define Represented Variables:** Create specific representations (e.g., `Sex_ISO`, `Age_Years`).
3. **Reference Internally:** Use `$ref` within `$defs` to chain these definitions.
4. **Instantiate:** In your `properties`, reference the final representation using `$ref`.
5. **Override (Optional):** Add a local `title` or `description` to the instance variable while keeping the core definition linked.
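The steps above can be sketched in Python. This is a minimal illustration. It assumes that a documentation tool builds an "effective" definition by following each `$ref` through `$defs` and letting local keys, such as an overriding `title`, win. The helper names `resolve_pointer` and `effective_definition` are hypothetical and do not belong to any published tooling.

```python
from typing import Any, Dict

# Steps 1-4: conceptual and represented definitions in $defs, chained with $ref,
# and a property that instantiates the final representation.
schema: Dict[str, Any] = {
    "$defs": {
        "Age": {"description": "Age of a person."},  # conceptual (abstract)
        "Age_Years": {"$ref": "#/$defs/Age", "type": "integer", "minimum": 0},  # represented
    },
    "properties": {
        # Step 5: a local title overrides, while $ref keeps the core definition linked.
        "respondent_age": {"$ref": "#/$defs/Age_Years", "title": "Age at interview"},
    },
}


def resolve_pointer(root: Dict[str, Any], ref: str) -> Dict[str, Any]:
    """Resolve a local JSON Pointer such as '#/$defs/Age' against the schema root."""
    if not ref.startswith("#/"):
        raise ValueError(f"only local references are handled here: {ref}")
    node: Any = root
    for part in ref[2:].split("/"):
        # Unescape per RFC 6901: '~1' -> '/', then '~0' -> '~'.
        node = node[part.replace("~1", "/").replace("~0", "~")]
    return node


def effective_definition(root: Dict[str, Any], node: Dict[str, Any]) -> Dict[str, Any]:
    """Merge a node with everything it inherits through $ref; local keys win."""
    if "$ref" not in node:
        return dict(node)
    inherited = effective_definition(root, resolve_pointer(root, node["$ref"]))
    local = {k: v for k, v in node.items() if k != "$ref"}
    return {**inherited, **local}


print(effective_definition(schema, schema["properties"]["respondent_age"]))
# {'description': 'Age of a person.', 'type': 'integer', 'minimum': 0, 'title': 'Age at interview'}
```

This flattening only serves display purposes. JSON Schema validators do not merge referenced schemas this way.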
## 1. Rationale: The “Single Entry Point” Principle
In robust metadata systems, variables form a hierarchy, from high-level phenomena (Conceptual) down to specific survey questions (Instance).
To avoid redundancy and semantic “noise,” a JSON property should only point to its direct parent in that hierarchy. Once a link is established (the “Entry Point”), specialized tools can follow the URI to discover the full lineage on the authoritative registry.
### Visualizing the Hierarchy: Employment Status
A typical cascade allows a researcher to trace a data point from a specific survey question back to a global concept:
- **Conceptual Variable:** Measures Employment Status for a Person (Unit Type).
- **Represented Variable:** Defines the measurement as a binary (Active/Inactive) coding scheme for adult residents (Universe).
- **Instance Variable:** Represents the specific column in the 2024 Labor Survey for residents of Iceland (Population).

By pointing only to the Instance Variable, the property inherits the entire lineage above it.
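The sketch below makes the "single entry point" idea concrete. It models a registry as an in-memory dictionary keyed by URI. Each record stores only a link to its direct parent, and the full lineage is recovered by walking those links. The `example.org` URIs, the `parent` field, and the `lineage` helper are hypothetical stand-ins for a real registry lookup.

```python
# Hypothetical registry: each record names its level and links only to its direct parent.
REGISTRY = {
    "https://example.org/vars/labor-2024/employment": {
        "level": "instance",
        "population": "Residents of Iceland, 2024 Labor Survey",
        "parent": "https://example.org/vars/employment-binary",
    },
    "https://example.org/vars/employment-binary": {
        "level": "represented",
        "universe": "Adult residents",
        "parent": "https://example.org/concepts/employment-status",
    },
    "https://example.org/concepts/employment-status": {
        "level": "conceptual",
        "unitType": "Person",
        "parent": None,  # top of the cascade
    },
}


def lineage(uri):
    """Follow parent links from the entry point up to the conceptual variable."""
    chain = []
    seen = set()
    while uri is not None:
        if uri in seen:
            raise ValueError(f"cycle detected at {uri}")
        seen.add(uri)
        record = REGISTRY[uri]
        chain.append((record["level"], uri))
        uri = record["parent"]
    return chain


# The data property stores only the instance URI; everything above it is discovered.
for level, uri in lineage("https://example.org/vars/labor-2024/employment"):
    print(f"{level:<12} {uri}")
```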
## 2. Industry Standard Mappings
Different specifications use different naming conventions, but they all fit into the FAIR Variable Cascade.
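| Standard | Object | Cascade Level | Keyword Mapping |
|---|---|---|---|
| DDI | `InstanceVariable` | Instance | `fair:instanceVariableRef` |
| DDI | `RepresentedVariable` | Represented | `fair:representedVariableRef` |
| MLCommons Croissant | `Field` | Instance | `fair:instanceVariableRef` |
| Schema.org | `StatisticalVariable` | Instance | `fair:instanceVariableRef` |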
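A tool that ingests external metadata could encode this mapping as a lookup. The sketch below uses the object names these standards use (`InstanceVariable`, `RepresentedVariable`, `Field`, `StatisticalVariable`). It assumes each cascade level maps to the `fair:*VariableRef` keyword used elsewhere in this guide. `CASCADE_LEVEL`, `KEYWORD_FOR`, and `annotate` are illustrative names only.

```python
# Cascade level for each (standard, object) pair, following the mapping table.
CASCADE_LEVEL = {
    ("DDI", "InstanceVariable"): "instance",
    ("DDI", "RepresentedVariable"): "represented",
    ("MLCommons", "Field"): "instance",
    ("Schema.org", "StatisticalVariable"): "instance",
}

# Assumed keyword per cascade level (as used in this guide's examples).
KEYWORD_FOR = {
    "instance": "fair:instanceVariableRef",
    "represented": "fair:representedVariableRef",
    "conceptual": "fair:conceptualVariableRef",
}


def annotate(json_type, standard, obj, uri):
    """Build a property annotation that points at an external standard object."""
    level = CASCADE_LEVEL.get((standard, obj))
    if level is None:
        raise KeyError(f"no cascade mapping for {standard} {obj}")
    return {"type": json_type, KEYWORD_FOR[level]: uri}


print(annotate("integer", "MLCommons", "Field",
               "https://croissant-registry.org/datasets/v1/fields/satisfaction"))
```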
### Industry Comparison & Code Snippets
Since Croissant and Schema.org typically define variables in the context of a specific dataset, they are mapped using the `fair:instanceVariableRef` keyword.
#### 1. MLCommons Croissant (Field)
In Croissant, a Field describes a column in a RecordSet. This is a direct implementation of an Instance Variable.
```json
"satisfaction": {
  "type": "integer",
  "fair:instanceVariableRef": "https://croissant-registry.org/datasets/v1/fields/satisfaction",
  "fair:label": "Overall life satisfaction"
}
```
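The following sketch derives such a property from a Croissant field. It assumes a simplified `cr:Field` record carrying only `name`, `description`, and `dataType`. Real Croissant documents also carry `@context`, `source`, and other keys. The data-type table and the `property_from_field` helper are illustrative assumptions, not part of the Croissant specification.

```python
# Assumed mapping from Croissant/Schema.org data types to JSON Schema types.
DATATYPE_TO_JSON = {
    "sc:Integer": "integer",
    "sc:Float": "number",
    "sc:Boolean": "boolean",
    "sc:Text": "string",
}


def property_from_field(field, field_uri):
    """Turn a simplified Croissant cr:Field record into a cascade-annotated property."""
    if field.get("@type") != "cr:Field":
        raise ValueError("expected a cr:Field record")
    prop = {
        "type": DATATYPE_TO_JSON[field["dataType"]],
        "fair:instanceVariableRef": field_uri,  # the Field is the Instance Variable
    }
    if "description" in field:
        prop["fair:label"] = field["description"]
    return {field["name"]: prop}


field = {
    "@type": "cr:Field",
    "@id": "feedback/satisfaction",
    "name": "satisfaction",
    "description": "Overall life satisfaction",
    "dataType": "sc:Integer",
}
print(property_from_field(field, "https://croissant-registry.org/datasets/v1/fields/satisfaction"))
```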
#### 2. Schema.org (StatisticalVariable)
A Schema.org StatisticalVariable represents a specific measurement (e.g., “Population Count”) linked to a Place and Time. It acts as the population-bound implementation.
```json
"pop_count": {
  "type": "integer",
  "fair:instanceVariableRef": "https://schema-registry.org/statvars/PopulationCount",
  "fair:universeRef": "https://schema-registry.org/places/World"
}
```
## 3. The Binding Chain: Unit Type, Universe, & Population
The cascade is also where we define the scope of the study. Each level of the variable cascade binds the measurement to a more specific group.
- **Unit Type (Conceptual Variable):** The observation unit. Example: Person. (Keyword: `fair:unitType`)
- **Universe (Represented Variable):** The broad group being studied globally. Example: Students. (Keyword: `fair:universe`)
- **Population (Instance Variable):** The specific group bound by time and space. Example: Students in School District A in 2019. (Keyword: `fair:population`)
> [!IMPORTANT]
> **Observation Unit vs. Measurement Unit:** `fair:unitType` identifies the subject (e.g., “Person”), while `fair:unit` identifies the measurement scale (e.g., “Kilograms”).
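The binding chain can be read as a fixed assignment of one scope keyword per cascade level. The sketch below encodes that assignment and reads the scope back from a set of level annotations. The `Scope` dataclass and `binding_chain` function are hypothetical helpers, shown only to illustrate which keyword belongs at which level.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Keyword that carries the scope at each cascade level (from the list above).
SCOPE_KEYWORD = {
    "conceptual": "fair:unitType",
    "represented": "fair:universe",
    "instance": "fair:population",
}


@dataclass(frozen=True)
class Scope:
    unit_type: Optional[str]
    universe: Optional[str]
    population: Optional[str]


def binding_chain(levels: Dict[str, dict]) -> Scope:
    """Read each level's own scope keyword; a missing level leaves that slot empty."""
    def pick(level: str) -> Optional[str]:
        return levels.get(level, {}).get(SCOPE_KEYWORD[level])

    return Scope(pick("conceptual"), pick("represented"), pick("instance"))


scope = binding_chain({
    "conceptual": {"fair:unitType": "Person"},
    "represented": {"fair:universe": "Students"},
    "instance": {"fair:population": "Students in School District A in 2019"},
})
print(scope)
```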
## 4. Building Internal Cascades (The Chained Pattern)
You can build a full variable cascade entirely within one JSON Schema by chaining references through the `$defs` section.
1. The property points to `#/$defs/REPRESENTED_VAR` via `fair:representedVariableRef`.
2. `REPRESENTED_VAR` points to `#/$defs/CONCEPT_VAR` via `fair:conceptualVariableRef`.
3. `CONCEPT_VAR` grounds the chain in a global concept (e.g., a Wikidata URI via `fair:conceptRef`).
```json
{
  "$defs": {
    "CONCEPT_AGE": {
      "fair:conceptRef": "https://www.wikidata.org/wiki/Q185836",
      "fair:unitType": "Person"
    },
    "REPRESENTED_AGE_5YR": {
      "fair:conceptualVariableRef": "#/$defs/CONCEPT_AGE",
      "fair:universe": "Adult citizens"
    }
  },
  "properties": {
    "respondent_age": {
      "type": "integer",
      "fair:representedVariableRef": "#/$defs/REPRESENTED_AGE_5YR",
      "fair:population": "Active voters in 2024"
    }
  }
}
```
This pattern records the full lineage (concept, unit type, universe, and population) inside the schema itself, without needing an external registry.
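A reader tool could walk the chained pattern with a few lines of code. This minimal sketch copies the example schema from this section and follows `fair:representedVariableRef` and then `fair:conceptualVariableRef`. It keeps the first (most specific) value it sees for each scope keyword. It assumes internal references always take the form `#/$defs/NAME`. The `follow` and `trace` helpers are hypothetical.

```python
from typing import Any, Dict

schema: Dict[str, Any] = {
    "$defs": {
        "CONCEPT_AGE": {
            "fair:conceptRef": "https://www.wikidata.org/wiki/Q185836",
            "fair:unitType": "Person",
        },
        "REPRESENTED_AGE_5YR": {
            "fair:conceptualVariableRef": "#/$defs/CONCEPT_AGE",
            "fair:universe": "Adult citizens",
        },
    },
    "properties": {
        "respondent_age": {
            "type": "integer",
            "fair:representedVariableRef": "#/$defs/REPRESENTED_AGE_5YR",
            "fair:population": "Active voters in 2024",
        }
    },
}

# Keywords that link a node to the next level up inside $defs.
LINK_KEYWORDS = ("fair:representedVariableRef", "fair:conceptualVariableRef")
# Annotations gathered while walking the chain.
COLLECTED_KEYS = ("fair:population", "fair:universe", "fair:unitType", "fair:conceptRef")


def follow(root: Dict[str, Any], ref: str) -> Dict[str, Any]:
    """Resolve an internal '#/$defs/NAME' reference; external URIs are rejected."""
    prefix = "#/$defs/"
    if not ref.startswith(prefix):
        raise ValueError(f"not an internal reference: {ref}")
    return root["$defs"][ref[len(prefix):]]


def trace(root: Dict[str, Any], prop: Dict[str, Any]) -> Dict[str, Any]:
    """Walk property -> represented -> conceptual, keeping the first value seen per key."""
    found: Dict[str, Any] = {}
    visited = set()
    node = prop
    while True:
        for key in COLLECTED_KEYS:
            if key in node and key not in found:
                found[key] = node[key]
        link = next((k for k in LINK_KEYWORDS if k in node), None)
        if link is None:
            return found
        ref = node[link]
        if ref in visited:
            raise ValueError(f"reference cycle at {ref}")
        visited.add(ref)
        node = follow(root, ref)


print(trace(schema, schema["properties"]["respondent_age"]))
```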
## 5. Summary of Rules
- **Exclusivity:** Only one technical authority reference (`fair:instanceVariableRef`, `fair:representedVariableRef`, or `fair:conceptualVariableRef`) is allowed per property.
- **Inheritance:** A property inherits the `fair:universe` or `fair:population` of the dataset root unless it provides its own local override.
- **Flatness:** All annotations are flat; no complex nested objects are used.
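The rules above can be checked mechanically. The sketch below is one interpretation, assuming that "flat" means a property's `fair:*` values are never dicts or lists. It also assumes that inheritance simply falls back to the dataset root's `fair:universe` and `fair:population`. `check_property` and `effective_scope` are hypothetical names.

```python
from typing import Any, Dict, List

AUTHORITY_REFS = (
    "fair:instanceVariableRef",
    "fair:representedVariableRef",
    "fair:conceptualVariableRef",
)
SCOPE_KEYS = ("fair:universe", "fair:population")


def check_property(name: str, prop: Dict[str, Any]) -> List[str]:
    """Return Exclusivity and Flatness violations for one property."""
    problems = []
    refs = [k for k in AUTHORITY_REFS if k in prop]
    if len(refs) > 1:
        problems.append(f"{name}: more than one authority reference {refs}")
    for key, value in prop.items():
        if key.startswith("fair:") and isinstance(value, (dict, list)):
            problems.append(f"{name}: {key} is not a flat value")
    return problems


def effective_scope(root: Dict[str, Any], prop: Dict[str, Any]) -> Dict[str, Any]:
    """Inheritance: local universe/population win, otherwise the dataset root applies."""
    return {k: prop.get(k, root.get(k)) for k in SCOPE_KEYS}


root = {"fair:universe": "General population of Iceland"}
ok = {"type": "boolean", "fair:universe": "Registered voters in Iceland"}
bad = {
    "type": "integer",
    "fair:instanceVariableRef": "https://agency.is/vars/q10_a",
    "fair:representedVariableRef": "#/$defs/REPRESENTED_EMPLOYMENT_BINARY",
}
print(check_property("q5_voter_status", ok))
print(check_property("bad", bad))
print(effective_scope(root, {"type": "integer"}))
```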
## Full Schema Implementation
```json
{
  "$schema": "https://highvaluedata.net/fair-data-schema/dev",
  "$id": "https://highvaluedata.net/fair-data-schema/dev/examples/variable-cascade",
  "title": "Comprehensive Variable Cascade Example",
  "description": "A 'Master Example' showcasing all aspects of the Variable Cascade: internal chaining, external authority mapping (DDI, Croissant, Schema.org), and the hierarchy of Unit Type, Universe, and Population.",
  "type": "object",
  "fair:universe": "General population of Iceland",
  "fair:universeRef": "https://wikidata.org/wiki/Q189",
  "fair:temporalCoverage": { "start": "2024-01-01", "end": "2024-12-31" },
  "fair:spatialCoverage": "Iceland",
  "$defs": {
    "CONCEPT_EMPLOYMENT": {
      "title": "Conceptual: Occupational Status",
      "description": "The phenomenon of economic activity.",
      "fair:conceptRef": "https://www.wikidata.org/wiki/Q156371",
      "fair:unitType": "Person",
      "fair:unitTypeRef": "https://www.wikidata.org/wiki/Q5"
    },
    "REPRESENTED_EMPLOYMENT_BINARY": {
      "title": "Represented: Active/Inactive Binary",
      "description": "A shared coding scheme for employment status.",
      "fair:conceptualVariableRef": "#/$defs/CONCEPT_EMPLOYMENT",
      "fair:universe": "Adult human beings of working age"
    }
  },
  "properties": {
    "q1_age": {
      "title": "Age (External DDI Instance)",
      "description": "Full formal mapping where a dataset-specific variable points to an external registry.",
      "type": "integer",
      "fair:instanceVariableRef": "https://agency.is/vars/q10_a",
      "fair:conceptRef": "https://www.wikidata.org/wiki/Q185836",
      "$comment": "Tools find 'Person' (Unit Type) and 'General Pop' (Universe) by traversing the external URI."
    },
    "q2_employment": {
      "title": "Employment (Internal Pure Cascade)",
      "description": "Lineage built entirely within the schema using internal references.",
      "type": "integer",
      "fair:representedVariableRef": "#/$defs/REPRESENTED_EMPLOYMENT_BINARY",
      "fair:population": "Adult residents of Iceland",
      "fair:populationRef": "https://agency.is/vocabs/pop/Iceland-Adult-2024"
    },
    "q3_satisfaction": {
      "title": "Satisfaction (MLCommons Croissant)",
      "description": "Mapping to a Croissant 'Field' definition (treated as an Instance level definition).",
      "type": "integer",
      "fair:instanceVariableRef": "https://croissant-registry.org/datasets/feedback/fields/satisfaction",
      "fair:label": "Overall life satisfaction"
    },
    "q4_pop_count": {
      "title": "Population (Schema.org StatVar)",
      "description": "Mapping to a Schema.org 'StatisticalVariable'.",
      "type": "integer",
      "fair:instanceVariableRef": "https://schema-registry.org/statvars/PopulationCount",
      "fair:universeRef": "https://schema-registry.org/places/World"
    },
    "q5_voter_status": {
      "title": "Voter Status (Standalone Annotation)",
      "description": "No variable record; just flat annotations for population and phenomenon.",
      "type": "boolean",
      "fair:universe": "Registered voters in Iceland",
      "fair:conceptRef": "https://www.wikidata.org/wiki/Q110609353"
    }
  }
}
```
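To see at a glance where each property in the full example enters the cascade, a tool might classify it by which reference keyword it carries. It also notes whether the target is internal (`#/...`) or an external URI. This sketch copies only the relevant keys from the schema above. The `entry_point` helper is hypothetical, and a property without any `fair:*VariableRef` is labelled "standalone", matching the title of `q5_voter_status`.

```python
# Property annotations copied from the full schema (only the keys needed here).
PROPERTIES = {
    "q1_age": {
        "fair:instanceVariableRef": "https://agency.is/vars/q10_a",
        "fair:conceptRef": "https://www.wikidata.org/wiki/Q185836",
    },
    "q2_employment": {"fair:representedVariableRef": "#/$defs/REPRESENTED_EMPLOYMENT_BINARY"},
    "q3_satisfaction": {
        "fair:instanceVariableRef": "https://croissant-registry.org/datasets/feedback/fields/satisfaction",
    },
    "q4_pop_count": {"fair:instanceVariableRef": "https://schema-registry.org/statvars/PopulationCount"},
    "q5_voter_status": {"fair:conceptRef": "https://www.wikidata.org/wiki/Q110609353"},
}

# Reference keyword -> cascade level it enters at.
ENTRY_POINTS = {
    "fair:instanceVariableRef": "instance",
    "fair:representedVariableRef": "represented",
    "fair:conceptualVariableRef": "conceptual",
}


def entry_point(prop):
    """Return (cascade level, 'internal'/'external'), or ('standalone', None)."""
    for key, level in ENTRY_POINTS.items():
        if key in prop:
            where = "internal" if prop[key].startswith("#/") else "external"
            return level, where
    return "standalone", None


for name, prop in PROPERTIES.items():
    print(name, *entry_point(prop))
```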