Hierarchical Metadata: The Census Case Study
This example demonstrates how the FAIR Data JSON Schema can describe complex, multi-level data products (like a National Census) while maintaining semantic clarity through its three-level organization: Universal, Dataset, and Property scopes.
See the companion schema files:
../../../examples/complex-data-product.json and ../../../examples/complex-data-product.data.json
How-to: Describe a Complex Data Product
1. Root Metadata: Add `fair:license`, `fair:provider`, and `fair:provenance` at the schema root.
2. Define Tables: Use `properties` to represent different tables or files within the product.
3. Table Metadata: Add table-specific metadata like `fair:population` or `fair:temporalCoverage`.
4. Variable Reuse: Use `$ref` to pull in shared variable definitions from a central `definitions.json` or regional registry.
5. Validation: Ensure the schema validates both the metadata structure and the actual data structure.
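To make the variable-reuse step concrete, the following is a minimal sketch of how a `$ref` into a shared registry could be expanded in place. The registry contents, the `definitions.json#` prefix, and the helper names `resolve_pointer` and `inline_refs` are assumptions made for illustration; they are not part of the FAIR Data JSON Schema itself, and a full JSON Schema implementation would resolve references without copying them.

```python
import copy

# Hypothetical stand-in for the parsed contents of definitions.json.
DEFINITIONS = {
    "$defs": {
        "age": {"title": "Age", "type": "integer", "minimum": 0, "fair:unit": "years"},
    }
}


def resolve_pointer(document, pointer):
    """Follow a JSON Pointer such as '/$defs/age' through nested dicts."""
    node = document
    for token in pointer.split("/")[1:]:
        node = node[token.replace("~1", "/").replace("~0", "~")]
    return node


def inline_refs(node, registry, prefix="definitions.json#"):
    """Return a copy of `node` with registry references replaced by their targets."""
    if isinstance(node, list):
        return [inline_refs(item, registry, prefix) for item in node]
    if not isinstance(node, dict):
        return node
    ref = node.get("$ref")
    shared = isinstance(ref, str) and ref.startswith(prefix)
    # Start from the shared definition; keywords written next to $ref override it.
    result = copy.deepcopy(resolve_pointer(registry, ref[len(prefix):])) if shared else {}
    for key, value in node.items():
        if key == "$ref" and shared:
            continue
        result[key] = inline_refs(value, registry, prefix)
    return result


table = {
    "properties": {
        "age": {"$ref": "definitions.json#/$defs/age", "description": "Age on census day"}
    }
}
expanded = inline_refs(table, DEFINITIONS)
```

Letting local keywords override the shared definition is a design choice of this sketch: a table can then add a description without forking the registry entry.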
The Story: One Product, Many Contexts
A National Census is not just a single list of numbers. It is a Data Product that contains multiple related entities (Households and Persons) nested within a single hierarchical structure.
In this example, we have a single JSON file that contains a households array. Each household record contains its own properties (ID, Region) and a nested persons array representing the residents of that household.
1. Universal Scope: The Semantic Identity
At every level of the hierarchy, we use Universal keywords to identify what we are looking at. These keywords are “Global” because they have the same meaning regardless of whether they describe a whole product or a single field.
- Role Identification: We use `fair:resourceType` to explicitly mark the root as a `data-product` and the nested arrays as `dataset`.
- Implicit Defaults: Note that the leaf fields (age, sex, etc.) do not need an explicit `fair:resourceType`, as they are implicitly treated as `variable` by the dialect.
{
  "title": "National Census 2024 (Data Product)",
  "fair:resourceType": "data-product",   // <--- Explicit role
  "properties": {
    "households": {
      "title": "Household Table",
      "fair:resourceType": "dataset",     // <--- Explicit role (nested)
      "items": {
        "properties": {
          "age": {
            "title": "Age",               // <--- Implicitly a "variable"
            "type": "integer"
          }
        }
      }
    }
  }
}
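The implicit default described above can be sketched as a small tree walk. This is an illustration only: the function name `resource_types` and the dotted path notation are assumptions, and the sketch simply treats any named node without an explicit `fair:resourceType` as a `variable`, which matches the rule stated in the text but may be simpler than what the dialect actually specifies.

```python
def resource_types(schema, path="$"):
    """Map each named node to its explicit or implied fair:resourceType."""
    found = {path: schema.get("fair:resourceType", "variable")}
    # For array-typed tables, the named fields live under items.properties.
    holder = schema.get("items", schema)
    for name, child in holder.get("properties", {}).items():
        found.update(resource_types(child, f"{path}.{name}"))
    return found


excerpt = {
    "title": "National Census 2024 (Data Product)",
    "fair:resourceType": "data-product",
    "properties": {
        "households": {
            "title": "Household Table",
            "fair:resourceType": "dataset",
            "items": {"properties": {"age": {"title": "Age", "type": "integer"}}},
        }
    },
}
roles = resource_types(excerpt)
```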
2. Dataset Scope: Provenance & Coverage
The Dataset scope keywords describe the “Container.” What makes this example complex is that we apply these keywords at different levels of the hierarchy:
- At the Root (The Product): We define the `fair:provider` (National Statistical Office), the `fair:license` (CC-BY-4.0), and the `fair:temporalCoverage` (Year 2024). This applies to everything inside the file.
- At the Table (The Resource): Each nested array can have its own `fair:population` and `fair:unitType`.
"households": {
  "fair:resourceType": "dataset",
  "fair:population": "All private households",   // <--- Table-level metadata
  "fair:unitType": "Household",
  "type": "array"
}
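Since root-level keywords are said to apply to everything inside the file, a consumer may want the effective metadata for one nested table. The sketch below shows one way to compute it, assuming (this is not confirmed by the dialect) that provider, license, and temporal coverage are inherited with the nearest definition winning, while `fair:population` and `fair:unitType` describe only the table that declares them. The names `INHERITED`, `LOCAL`, and `effective_metadata` are hypothetical.

```python
# Assumption: these keywords flow down from the root to nested tables.
INHERITED = ("fair:provider", "fair:license", "fair:temporalCoverage")
# Assumption: these keywords describe only the table that declares them.
LOCAL = ("fair:population", "fair:unitType")


def effective_metadata(schema, path):
    """Merge inherited keywords from the root down to the table at `path`."""
    node, merged = schema, {}
    for name in [None, *path]:
        if name is not None:
            # Step into the named child, looking through array items if present.
            node = node.get("items", node)["properties"][name]
        for key in INHERITED:
            if key in node:
                merged[key] = node[key]  # a nearer definition overrides the root
    for key in LOCAL:
        if key in node:
            merged[key] = node[key]
    return merged


product = {
    "fair:provider": "National Statistical Office",
    "fair:license": "CC-BY-4.0",
    "properties": {
        "households": {
            "fair:population": "All private households",
            "fair:unitType": "Household",
            "type": "array",
            "items": {
                "properties": {
                    "persons": {"fair:unitType": "Person", "type": "array"}
                }
            },
        }
    },
}
persons_meta = effective_metadata(product, ["households", "persons"])
```

In this sketch the person table picks up the provider and license from the root but not the household population, because the population statement is about a different unit.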
3. Property Scope: Representation & Semantics
Finally, at the “leaves” of the data tree, we use Property scope keywords to define how the data is stored and what it maps to in the real world.
- Category Mapping: The `sex` variable uses the Hybrid Pattern, anchoring technical codes (1, 2) to global semantic URIs (`fair:conceptRef`) and an authority (`fair:classification`).
- Measurement Units: The `age` property is explicitly mapped to a unit for “Years” using `fair:unitRef`. The intended target is the QUDT unit; the full schema below uses an `example.org` placeholder URI.
"sex": {
  "type": "integer",
  "fair:classification": "SDMX Sex",                               // <--- Authority Name
  "fair:classificationRef": ["https://example.org/vocabs/sex"],   // <--- Authority Link
  "oneOf": [
    { "const": 1, "title": "Male", "fair:conceptRef": "https://www.wikidata.org/wiki/Q6581097" },
    { "const": 2, "title": "Female", "fair:conceptRef": "https://www.wikidata.org/wiki/Q6581072" }
  ]
}
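One practical consequence of the Hybrid Pattern is that a stored code can be decoded into both a human label and a concept URI straight from the schema. The helper below is a minimal sketch of that lookup; the name `code_list` is an assumption, and only the `oneOf`/`const` layout shown above is handled.

```python
def code_list(variable):
    """Map each coded value to its (label, concept URI) pair."""
    return {
        option["const"]: (option.get("title"), option.get("fair:conceptRef"))
        for option in variable.get("oneOf", [])
        if "const" in option
    }


sex = {
    "type": "integer",
    "fair:classification": "SDMX Sex",
    "fair:classificationRef": ["https://example.org/vocabs/sex"],
    "oneOf": [
        {"const": 1, "title": "Male", "fair:conceptRef": "https://www.wikidata.org/wiki/Q6581097"},
        {"const": 2, "title": "Female", "fair:conceptRef": "https://www.wikidata.org/wiki/Q6581072"},
    ],
}
codes = code_list(sex)
label, concept = codes[2]
```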
Why This Matters for FAIR Data
Using this three-level organization in a hierarchical schema supports the FAIR principles in concrete ways:
- Findability: A search engine can find the dataset by searching for the provider (Root level) or the specific unit type (Table level).
- Interoperability: A data integration tool can automatically join this “Household” table with a “Housing Quality” dataset from a different provider when both use the same NUTS `fair:classification` for regions.
- Reusability: Because the license and temporal coverage are explicitly attached at the root, a machine can automatically determine whether it is legally allowed to aggregate this data with other sources.
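The interoperability point can be illustrated with a sketch that looks for variables in two schemas that cite the same classification URI. Everything here is an assumption for illustration: the helper names, the dotted paths, and in particular the `housing` schema, which stands in for the hypothetical “Housing Quality” dataset. A shared classification only suggests a join key; it does not guarantee that the code lists match.

```python
def classification_index(schema, path="$"):
    """Map each fair:classificationRef URI to the variable paths that cite it."""
    index = {}
    for uri in schema.get("fair:classificationRef", []):
        index.setdefault(uri, []).append(path)
    # For array-typed tables, the named fields live under items.properties.
    holder = schema.get("items", schema)
    for name, child in holder.get("properties", {}).items():
        for uri, paths in classification_index(child, f"{path}.{name}").items():
            index.setdefault(uri, []).extend(paths)
    return index


def join_candidates(left, right):
    """Return variables from both schemas that share a classification URI."""
    left_index = classification_index(left)
    right_index = classification_index(right)
    return {
        uri: (left_index[uri], right_index[uri])
        for uri in left_index.keys() & right_index.keys()
    }


census = {
    "properties": {
        "households": {
            "type": "array",
            "items": {
                "properties": {
                    "region": {"fair:classificationRef": ["http://data.europa.eu/nuts"]}
                }
            },
        }
    }
}
# Hypothetical schema from another provider.
housing = {
    "properties": {
        "nuts_code": {"fair:classificationRef": ["http://data.europa.eu/nuts"]},
        "quality_score": {"type": "integer"},
    }
}
candidates = join_candidates(census, housing)
```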
Summary of Scopes in this Example

| Level | Scope | Key Metadata |
|---|---|---|
| Root | Dataset | License, Provider, Year |
| Households | Dataset | Population: Private Households, Unit: Household |
| Region | Property | Authority: NUTS, Concept: Eurostat Regions |
| Persons | Dataset | Unit: Person, Population: Residents |
| Sex | Property | Authority: SDMX, Concept: Wikidata Sex |
| Age | Property | Unit: Years (QUDT) |
Full Schema Implementation
{
  "$schema": "https://highvaluedata.net/fair-data-schema/dev",
  "$id": "https://highvaluedata.net/fair-data-schema/dev/examples/complex-data-product",
  "title": "National Census 2024 (Data Product)",
  "description": "A hierarchical data product containing related household and person tables.",
  "fair:resourceType": "data-product",
  "fair:description": "This dataset represents a simulated 2024 National Census. It demonstrates the use of Universal, Dataset, and Property scopes in a complex, nested format.",
  "fair:provider": "National Statistical Office",
  "fair:providerRef": "https://example.org/nso",
  "fair:license": "CC-BY-4.0",
  "fair:licenseRef": "https://spdx.org/licenses/CC-BY-4.0",
  "fair:temporalCoverage": {
    "description": "Census Year 2024",
    "start": "2024-01-01",
    "end": "2024-12-31"
  },
  "type": "object",
  "required": ["households"],
  "properties": {
    "households": {
      "title": "Household Table",
      "description": "A list of households in the census.",
      "fair:resourceType": "dataset",
      "fair:population": "All private households",
      "fair:unitType": "Household",
      "type": "array",
      "items": {
        "type": "object",
        "required": ["household_id", "region"],
        "properties": {
          "household_id": {
            "title": "Household Identifier",
            "type": "string",
            "fair:conceptRef": "https://example.org/concepts/household-id"
          },
          "region": {
            "title": "Geographic Region",
            "type": "string",
            "fair:classification": "Eurostat NUTS",
            "fair:classificationRef": ["http://data.europa.eu/nuts"],
            "fair:spatialCoverageRef": "http://data.europa.eu/nuts/code/BE",
            "oneOf": [
              { "const": "BE1", "title": "Brussels" },
              { "const": "BE2", "title": "Flanders" },
              { "const": "BE3", "title": "Wallonia" }
            ]
          },
          "persons": {
            "title": "Person Table (Nested)",
            "description": "A list of persons living in this household.",
            "fair:resourceType": "dataset",
            "fair:population": "All residents in the household",
            "fair:unitType": "Person",
            "type": "array",
            "items": {
              "type": "object",
              "required": ["person_id", "age", "sex"],
              "properties": {
                "person_id": {
                  "title": "Person Identifier",
                  "type": "string",
                  "fair:conceptRef": "https://example.org/concepts/person-id"
                },
                "age": {
                  "title": "Age",
                  "type": "integer",
                  "minimum": 0,
                  "fair:unit": "years",
                  "fair:unitRef": "https://example.org/vocabs/units/year"
                },
                "sex": {
                  "title": "Biological Sex",
                  "type": "integer",
                  "fair:classification": "SDMX Sex",
                  "fair:classificationRef": ["https://example.org/vocabs/sex"],
                  "oneOf": [
                    { "const": 1, "title": "Male", "fair:conceptRef": "https://www.wikidata.org/wiki/Q6581097" },
                    { "const": 2, "title": "Female", "fair:conceptRef": "https://www.wikidata.org/wiki/Q6581072" }
                  ]
                }
              }
            }
          }
        }
      }
    }
  }
}
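The structural keywords used in this schema (`type`, `required`, `properties`, `items`, `minimum`, `oneOf`, `const`) are standard JSON Schema, so the data structure can be checked independently of the `fair:` metadata. The sketch below is a deliberately small, standard-library-only checker that covers just those keywords and ignores everything else, including the `fair:` keywords. It is not a substitute for a real JSON Schema validator; the function name `check` and the abridged `PERSON` schema are assumptions for illustration.

```python
JSON_TYPES = {"object": dict, "array": list, "string": str, "integer": int}


def check(instance, schema, path="$"):
    """Return a list of error strings; an empty list means the instance passed."""
    errors = []
    expected = JSON_TYPES.get(schema.get("type"))
    if expected is not None:
        # bool is a subclass of int in Python, but not a JSON integer.
        is_bool_as_int = expected is int and isinstance(instance, bool)
        if not isinstance(instance, expected) or is_bool_as_int:
            return [f"{path}: expected {schema['type']}"]
    if "const" in schema and instance != schema["const"]:
        errors.append(f"{path}: expected constant {schema['const']!r}")
    if "minimum" in schema and isinstance(instance, int) and instance < schema["minimum"]:
        errors.append(f"{path}: {instance} is below minimum {schema['minimum']}")
    if "oneOf" in schema:
        matches = sum(1 for option in schema["oneOf"] if not check(instance, option, path))
        if matches != 1:
            errors.append(f"{path}: {instance!r} matches {matches} oneOf options, expected 1")
    if isinstance(instance, dict):
        for name in schema.get("required", []):
            if name not in instance:
                errors.append(f"{path}: missing required property '{name}'")
        for name, subschema in schema.get("properties", {}).items():
            if name in instance:
                errors.extend(check(instance[name], subschema, f"{path}.{name}"))
    if isinstance(instance, list) and "items" in schema:
        for position, item in enumerate(instance):
            errors.extend(check(item, schema["items"], f"{path}[{position}]"))
    return errors


# Abridged copy of the person item schema from the listing above.
PERSON = {
    "type": "object",
    "required": ["person_id", "age", "sex"],
    "properties": {
        "person_id": {"type": "string"},
        "age": {"type": "integer", "minimum": 0, "fair:unit": "years"},
        "sex": {"type": "integer", "oneOf": [{"const": 1}, {"const": 2}]},
    },
}
problems = check({"person_id": "P-2024-001-A", "age": -1, "sex": 3}, PERSON)
```

Returning a list of errors rather than raising on the first one makes it easier to report every problem in a large census file in a single pass.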
Example Data Instance
{
  "households": [
    {
      "household_id": "H-2024-001",
      "region": "BE1",
      "persons": [
        {
          "person_id": "P-2024-001-A",
          "age": 42,
          "sex": 1
        },
        {
          "person_id": "P-2024-001-B",
          "age": 38,
          "sex": 2
        }
      ]
    },
    {
      "household_id": "H-2024-002",
      "region": "BE2",
      "persons": [
        {
          "person_id": "P-2024-002-A",
          "age": 65,
          "sex": 2
        }
      ]
    }
  ]
}
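Because the instance nests persons inside households, analysis tools that expect flat tables need the hierarchy unpacked. The following sketch splits the instance into a household table and a person table, copying `household_id` onto each person row so the two can be joined again. The function name `flatten` is an assumption, and the instance is an abridged copy of the data above.

```python
def flatten(product):
    """Split the nested census instance into household rows and person rows."""
    households, persons = [], []
    for household in product["households"]:
        # Household row: every field except the nested person list.
        households.append({k: v for k, v in household.items() if k != "persons"})
        for person in household.get("persons", []):
            # Carry the parent key so the two tables can be joined again.
            persons.append({"household_id": household["household_id"], **person})
    return households, persons


census = {
    "households": [
        {
            "household_id": "H-2024-001",
            "region": "BE1",
            "persons": [
                {"person_id": "P-2024-001-A", "age": 42, "sex": 1},
                {"person_id": "P-2024-001-B", "age": 38, "sex": 2},
            ],
        },
        {
            "household_id": "H-2024-002",
            "region": "BE2",
            "persons": [{"person_id": "P-2024-002-A", "age": 65, "sex": 2}],
        },
    ]
}
household_rows, person_rows = flatten(census)
```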