Hierarchical Metadata: The Census Case Study

This example demonstrates how the FAIR Data JSON Schema can describe complex, multi-level data products (like a National Census) while maintaining semantic clarity through its three-level organization: Universal, Dataset, and Property scopes.

How-to: Describe a Complex Data Product

  1. Root Metadata: Add fair:license, fair:provider, and fair:provenance at the schema root.

  2. Define Tables: Use properties to represent different tables or files within the product.

  3. Table Metadata: Add table-specific metadata like fair:population or fair:temporalCoverage.

  4. Variable Reuse: Use $ref to pull in shared variable definitions from a central definitions.json or regional registry.

  5. Validation: Ensure the schema validates both the metadata structure and the actual data structure.
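The steps above can be sketched in Python as a plain dictionary plus a small check. This is a minimal illustration, not the dialect's own tooling: the `definitions.json#/$defs/region` pointer and the `missing_root_metadata` helper are hypothetical names, and the list of required root keywords is an assumption made for the example.

```python
# Steps 1-3: root metadata, tables as properties, table-level metadata.
# Step 4: "region" is pulled in through $ref; the definitions.json file and
# the "$defs/region" pointer are hypothetical and exist only for illustration.
schema = {
    "title": "National Census 2024 (Data Product)",
    "fair:resourceType": "data-product",
    "fair:provider": "National Statistical Office",       # step 1
    "fair:license": "CC-BY-4.0",                          # step 1
    "type": "object",
    "properties": {
        "households": {                                   # step 2
            "fair:resourceType": "dataset",
            "fair:population": "All private households",  # step 3
            "fair:unitType": "Household",                 # step 3
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "region": {"$ref": "definitions.json#/$defs/region"},  # step 4
                },
            },
        }
    },
}


def missing_root_metadata(s, required=("fair:license", "fair:provider")):
    """Return the root-level keywords from `required` that are absent (step 5)."""
    return [key for key in required if key not in s]


print(missing_root_metadata(schema))
```

A check like this covers only the metadata side of step 5; validating the data itself still requires evaluating the schema against an instance.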


The Story: One Product, Many Contexts

A National Census is not just a single list of numbers. It is a Data Product that contains multiple related entities (Households and Persons) nested within a single hierarchical structure.

In this example, we have a single JSON file that contains a households array. Each household record contains its own properties (ID, Region) and a nested persons array representing the residents of that household.

1. Universal Scope: The Semantic Identity

At every level of the hierarchy, we use Universal keywords to identify what we are looking at. These keywords are “Global” because they have the same meaning regardless of whether they describe a whole product or a single field.

  • Role Identification: We use fair:resourceType to explicitly mark the root as a data-product and the nested arrays as dataset.

  • Implicit Defaults: Note that the leaf fields (age, sex, etc.) do not need an explicit fair:resourceType, as they are implicitly treated as variable by the dialect.

{
  "title": "National Census 2024 (Data Product)",
  "fair:resourceType": "data-product",  // <--- Explicit role
  "properties": {
    "households": {
      "title": "Household Table",
      "fair:resourceType": "dataset",    // <--- Explicit role (nested)
      "items": {
        "properties": {
          "persons": {
            "title": "Person Table (Nested)",
            "fair:resourceType": "dataset",  // <--- Explicit role (nested)
            "items": {
              "properties": {
                "age": {
                  "title": "Age",        // <--- Implicitly a "variable"
                  "type": "integer"
                }
              }
            }
          }
        }
      }
    }
  }
}
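To make the implicit default concrete, the following sketch walks a schema and reports the role of each node, falling back to "variable" when fair:resourceType is absent. The `resource_roles` function and its `$.name` path notation are illustrative assumptions, not part of the dialect.

```python
def resource_roles(node, path="$"):
    """Yield (path, role) pairs; nodes without fair:resourceType count as "variable"."""
    yield path, node.get("fair:resourceType", "variable")
    # Object nodes list fields under "properties"; array nodes under items.properties.
    children = node.get("properties") or node.get("items", {}).get("properties", {})
    for name, child in children.items():
        yield from resource_roles(child, f"{path}.{name}")


schema = {
    "fair:resourceType": "data-product",
    "properties": {
        "households": {
            "fair:resourceType": "dataset",
            "items": {"properties": {
                "household_id": {"type": "string"},
                "persons": {
                    "fair:resourceType": "dataset",
                    "items": {"properties": {"age": {"type": "integer"}}},
                },
            }},
        }
    },
}

roles = dict(resource_roles(schema))
for node_path, role in roles.items():
    print(node_path, "->", role)
```

Only the root and the two arrays carry an explicit role here; every other node is reported as a variable.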

2. Dataset Scope: Provenance & Coverage

The Dataset scope keywords describe the “Container.” What makes this example complex is that we apply these keywords at different levels of the hierarchy:

  • At the Root (The Product): We define the fair:provider (National Statistical Office), the fair:license (CC-BY-4.0), and the fair:temporalCoverage (Year 2024). This applies to everything inside the file.

  • At the Table (The Resource): Each nested array can have its own fair:population and fair:unitType.

"households": {
  "fair:resourceType": "dataset",
  "fair:population": "All private households", // <--- Table-level metadata
  "fair:unitType": "Household",
  "type": "array"
}
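Because root-level keywords apply to everything inside the file, a consumer may want the combined view for a single table. The sketch below merges the two levels. The `effective_metadata` helper is hypothetical, its keyword list covers only the keywords named in this section, and letting table-level values override root-level ones is an assumption rather than documented dialect behaviour.

```python
DATASET_KEYWORDS = (
    "fair:provider", "fair:license", "fair:temporalCoverage",
    "fair:population", "fair:unitType",
)


def effective_metadata(schema, table):
    """Merge root-level Dataset keywords with those declared on one table."""
    root = {k: schema[k] for k in DATASET_KEYWORDS if k in schema}
    node = schema["properties"][table]
    local = {k: node[k] for k in DATASET_KEYWORDS if k in node}
    # Assumption: a value declared on the table wins over the root value.
    return {**root, **local}


census = {
    "fair:provider": "National Statistical Office",
    "fair:license": "CC-BY-4.0",
    "properties": {
        "households": {
            "fair:resourceType": "dataset",
            "fair:population": "All private households",
            "fair:unitType": "Household",
            "type": "array",
        }
    },
}

meta = effective_metadata(census, "households")
print(meta["fair:license"], "|", meta["fair:unitType"])
```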

3. Property Scope: Representation & Semantics

Finally, at the “leaves” of the data tree, we use Property scope keywords to define how the data is stored and what it maps to in the real world.

  • Category Mapping: The sex variable uses the Hybrid Pattern, anchoring technical codes (1, 2) to global semantic URIs (fair:conceptRef) and an authority (fair:classification).

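  • Measurement Units: The age property is mapped to a unit URI for “Years” using fair:unitRef. The full schema below uses a placeholder example.org URI where a production schema would reference the QUDT unit.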

"sex": {
  "type": "integer",
  "fair:classification": "SDMX Sex",            // <--- Authority Name
  "fair:classificationRef": ["https://example.org/vocabs/sex"], // <--- Authority Link
  "oneOf": [
    { "const": 1, "title": "Male", "fair:conceptRef": "https://www.wikidata.org/wiki/Q6581097" },
    { "const": 2, "title": "Female", "fair:conceptRef": "https://www.wikidata.org/wiki/Q6581072" }
  ]
}
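One practical use of this pattern is decoding stored codes back into labels and concept URIs. The following sketch builds a lookup table from the oneOf list; `code_table` is a hypothetical helper name, and it assumes every option is a simple const entry as in the sex property.

```python
def code_table(prop):
    """Map each coded value to its (label, concept URI) pair from a oneOf list."""
    return {
        option["const"]: (option.get("title"), option.get("fair:conceptRef"))
        for option in prop.get("oneOf", [])
        if "const" in option
    }


sex = {
    "type": "integer",
    "fair:classification": "SDMX Sex",
    "fair:classificationRef": ["https://example.org/vocabs/sex"],
    "oneOf": [
        {"const": 1, "title": "Male",
         "fair:conceptRef": "https://www.wikidata.org/wiki/Q6581097"},
        {"const": 2, "title": "Female",
         "fair:conceptRef": "https://www.wikidata.org/wiki/Q6581072"},
    ],
}

labels = code_table(sex)
label, concept = labels[2]
print(label, concept)
```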

Why This Matters for FAIR Data

Applying this three-level organization to a hierarchical schema supports three of the FAIR principles:

  1. Findability: A search engine can find the dataset by searching for the provider (Root level) or the specific unit type (Table level).

  2. Interoperability: A data integration tool can join this “Household” table with a “Housing Quality” dataset from a different provider on the region field, because both declare the same NUTS authority through fair:classification and fair:classificationRef.

  3. Reusability: Because the license and temporal coverage are explicitly attached at the root, a machine can automatically determine whether it is legally allowed to aggregate this data with other sources.
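The interoperability point can be illustrated with a small check that two properties reference the same authority. This is a sketch under stated assumptions: `shared_classifications` is a hypothetical helper, and the "Housing Quality" property is invented here to stand in for a second provider's schema.

```python
def shared_classifications(prop_a, prop_b):
    """Return the classification URIs that both properties reference."""
    refs_a = set(prop_a.get("fair:classificationRef", []))
    refs_b = set(prop_b.get("fair:classificationRef", []))
    return refs_a & refs_b


census_region = {
    "type": "string",
    "fair:classification": "Eurostat NUTS",
    "fair:classificationRef": ["http://data.europa.eu/nuts"],
}

# Hypothetical region property from another provider's "Housing Quality" schema.
housing_region = {
    "type": "string",
    "fair:classification": "Eurostat NUTS",
    "fair:classificationRef": ["http://data.europa.eu/nuts"],
}

common = shared_classifications(census_region, housing_region)
print("joinable on region:", bool(common))
```

A non-empty result only shows that the two columns use the same code list; whether a join is meaningful still depends on matching levels of the classification.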


Summary of Scopes in this Example

| Level | Scope | Key Metadata |
|---|---|---|
| Root | Dataset | License, Provider, Year |
| Households | Dataset | Population: Private Households, Unit: Household |
| Region | Property | Authority: NUTS, Concept: Eurostat Regions |
| Persons | Dataset | Unit: Person, Population: Residents |
| Sex | Property | Authority: SDMX, Concept: Wikidata Sex |
| Age | Property | Unit: Years (QUDT) |


Full Schema Implementation

The Data Product Schema
{
    "$schema": "https://highvaluedata.net/fair-data-schema/dev",
    "$id": "https://highvaluedata.net/fair-data-schema/dev/examples/complex-data-product",
    "title": "National Census 2024 (Data Product)",
    "description": "A hierarchical data product containing related household and person tables.",
    "fair:resourceType": "data-product",
    "fair:description": "This dataset represents a simulated 2024 National Census. It demonstrates the use of Universal, Dataset, and Property scopes in a complex, nested format.",
    "fair:provider": "National Statistical Office",
    "fair:providerRef": "https://example.org/nso",
    "fair:license": "CC-BY-4.0",
    "fair:licenseRef": "https://spdx.org/licenses/CC-BY-4.0",
    "fair:temporalCoverage": {
        "description": "Census Year 2024",
        "start": "2024-01-01",
        "end": "2024-12-31"
    },
    "type": "object",
    "required": ["households"],
    "properties": {
        "households": {
            "title": "Household Table",
            "description": "A list of households in the census.",
            "fair:resourceType": "dataset",
            "fair:population": "All private households",
            "fair:unitType": "Household",
            "type": "array",
            "items": {
                "type": "object",
                "required": ["household_id", "region"],
                "properties": {
                    "household_id": {
                        "title": "Household Identifier",
                        "type": "string",
                        "fair:conceptRef": "https://example.org/concepts/household-id"
                    },
                    "region": {
                        "title": "Geographic Region",
                        "type": "string",
                        "fair:classification": "Eurostat NUTS",
                        "fair:classificationRef": ["http://data.europa.eu/nuts"],
                        "fair:spatialCoverageRef": "http://data.europa.eu/nuts/code/BE",
                        "oneOf": [
                            { "const": "BE1", "title": "Brussels" },
                            { "const": "BE2", "title": "Flanders" },
                            { "const": "BE3", "title": "Wallonia" }
                        ]
                    },
                    "persons": {
                        "title": "Person Table (Nested)",
                        "description": "A list of persons living in this household.",
                        "fair:resourceType": "dataset",
                        "fair:population": "All residents in the household",
                        "fair:unitType": "Person",
                        "type": "array",
                        "items": {
                            "type": "object",
                            "required": ["person_id", "age", "sex"],
                            "properties": {
                                "person_id": {
                                    "title": "Person Identifier",
                                    "type": "string",
                                    "fair:conceptRef": "https://example.org/concepts/person-id"
                                },
                                "age": {
                                    "title": "Age",
                                    "type": "integer",
                                    "minimum": 0,
                                    "fair:unit": "years",
                                    "fair:unitRef": "https://example.org/vocabs/units/year"
                                },
                                "sex": {
                                    "title": "Biological Sex",
                                    "type": "integer",
                                    "fair:classification": "SDMX Sex",
                                    "fair:classificationRef": ["https://example.org/vocabs/sex"],
                                    "oneOf": [
                                        { "const": 1, "title": "Male", "fair:conceptRef": "https://www.wikidata.org/wiki/Q6581097" },
                                        { "const": 2, "title": "Female", "fair:conceptRef": "https://www.wikidata.org/wiki/Q6581072" }
                                    ]
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}

Example Data Instance

Valid data instance for this product
{
    "households": [
        {
            "household_id": "H-2024-001",
            "region": "BE1",
            "persons": [
                {
                    "person_id": "P-2024-001-A",
                    "age": 42,
                    "sex": 1
                },
                {
                    "person_id": "P-2024-001-B",
                    "age": 38,
                    "sex": 2
                }
            ]
        },
        {
            "household_id": "H-2024-002",
            "region": "BE2",
            "persons": [
                {
                    "person_id": "P-2024-002-A",
                    "age": 65,
                    "sex": 2
                }
            ]
        }
    ]
}
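To show what validating the actual data structure involves, here is a deliberately small validator covering only the keywords this example uses (type, required, properties, items, minimum, and oneOf made of const options). It is a sketch, not a JSON Schema implementation: the `validate` function is hypothetical, the schema is a trimmed copy of the person table, and a real deployment would rely on a full validator that understands the dialect.

```python
TYPES = {"object": dict, "array": list, "string": str, "integer": int}


def validate(value, schema, path="$"):
    """Return a list of error strings; an empty list means the value conforms."""
    expected = schema.get("type")
    # bool is a subclass of int in Python, so reject it explicitly.
    if expected and (isinstance(value, bool) or not isinstance(value, TYPES[expected])):
        return [f"{path}: expected {expected}"]
    errors = []
    if "minimum" in schema and isinstance(value, int) and value < schema["minimum"]:
        errors.append(f"{path}: below minimum {schema['minimum']}")
    if "oneOf" in schema:
        # Simplification: every oneOf option is assumed to be a const entry.
        allowed = [option["const"] for option in schema["oneOf"]]
        if value not in allowed:
            errors.append(f"{path}: {value!r} not in {allowed}")
    if isinstance(value, dict):
        for name in schema.get("required", []):
            if name not in value:
                errors.append(f"{path}: missing required '{name}'")
        for name, sub in schema.get("properties", {}).items():
            if name in value:
                errors.extend(validate(value[name], sub, f"{path}.{name}"))
    if isinstance(value, list) and "items" in schema:
        for index, item in enumerate(value):
            errors.extend(validate(item, schema["items"], f"{path}[{index}]"))
    return errors


person = {
    "type": "object",
    "required": ["person_id", "age", "sex"],
    "properties": {
        "person_id": {"type": "string"},
        "age": {"type": "integer", "minimum": 0},
        "sex": {"type": "integer", "oneOf": [{"const": 1}, {"const": 2}]},
    },
}
schema = {
    "type": "object",
    "required": ["households"],
    "properties": {"households": {"type": "array", "items": {
        "type": "object",
        "required": ["household_id", "region"],
        "properties": {
            "household_id": {"type": "string"},
            "region": {"type": "string"},
            "persons": {"type": "array", "items": person},
        },
    }}},
}

good = {"households": [{"household_id": "H-2024-001", "region": "BE1",
                        "persons": [{"person_id": "P-2024-001-A", "age": 42, "sex": 1}]}]}
bad = {"households": [{"household_id": "H-2024-001", "region": "BE1",
                       "persons": [{"person_id": "P-2024-001-A", "sex": 3}]}]}

print(validate(good, schema))
for problem in validate(bad, schema):
    print(problem)
```

The fair: keywords play no part in this check; they describe the data and do not constrain it.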