Roadmap, Governance & Access

Governance, access, and the resource roadmap.

HARK uses open licensing for code and documentation, open clinical and audiologic standards for data, and tiered controlled access via BDSP. This page documents how the consortium plans to govern the resource and how investigators apply to use it.

Code license: Apache 2.0
Standards: FAIR + BIDS-aligned
Access: Tiered DUA via BDSP

The architecture

Archival custody, separate from active compute.

HARK pairs dbGaP for long-term archival custody with BDSP for active, versioned compute. Each does what it does best.

BDSP × dbGaP
Complementary, by design

HARK is planned to occupy two complementary layers. The harmonized release is deposited in dbGaP as the archival record, where Dr. Dubno's MUSC longitudinal hearing data already resides; this permits harmonization and linkage without relocating that resource. The active analytic layer is hosted on the Brain Data Science Platform (BDSP), an NIH-approved data-sharing repository supported by the AWS Open Data Sponsorship Program for long-term, no-cost storage. BDSP provides Python and analytic tooling, GPU access, FAIR-aligned BIDS-style organization, tiered access (controlled, registered, open), and a standardized data-use-agreement framework. The platform already hosts the Harvard Electroencephalography Database (HEEDB), which contains data from more than 100,000 patients across four Boston hospitals.

This separation of archival custody from active compute follows current NIH data-platform practice rather than introducing a new siloed repository.

Access

Tiered controlled access via BDSP.

The consortium plans to use BDSP's standardized data-use-agreement framework, with tiered access (open summary statistics, registered de-identified record-level, and controlled linked clinical detail) reviewed by a HARK Data Access Committee. Approved investigators will work against the same versioned release the consortium uses for its own studies, inside the BDSP enclave. Detailed procedures, including the DUA template, IRB expectations, and review cadence, are being finalized with the contributing sites and BDSP and will be posted as they are ratified.
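As an illustration only, the three tiers described above can be modeled as an ordered scale. The sketch below is not BDSP's or HARK's implementation: the names `AccessTier`, `DATA_CLASS_TIER`, and `can_access` are hypothetical, and it assumes that approval at a higher tier also covers the lower tiers, which the finalized procedures may or may not adopt.

```python
from enum import IntEnum


class AccessTier(IntEnum):
    """Hypothetical ordering of the three tiers named in the text."""
    OPEN = 0        # open summary statistics
    REGISTERED = 1  # registered, de-identified record-level data
    CONTROLLED = 2  # controlled, linked clinical detail


# Hypothetical mapping from a class of data to the minimum tier required.
DATA_CLASS_TIER = {
    "summary_statistics": AccessTier.OPEN,
    "deidentified_records": AccessTier.REGISTERED,
    "linked_clinical_detail": AccessTier.CONTROLLED,
}


def can_access(investigator_tier: AccessTier, data_class: str) -> bool:
    """Return True if the investigator's approved tier covers the data class.

    Assumes tiers are cumulative: a higher tier includes every lower one.
    """
    return investigator_tier >= DATA_CLASS_TIER[data_class]


if __name__ == "__main__":
    # A registered investigator under the cumulative-tier assumption.
    for data_class in DATA_CLASS_TIER:
        print(data_class, can_access(AccessTier.REGISTERED, data_class))
```

In practice the decision would rest with the HARK Data Access Committee and the executed DUA, not with a lookup table; the sketch only shows how the tiers relate to one another.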

Governance

A four-committee structure for HARK governance.

The proposed governance comprises four committees with distinct scopes: a Steering Committee for strategic direction; a Data Access Committee for review of data-use applications; an Ontology and Standards Working Group for the data model and crosswalks; and an External Advisory Committee for independent review. Membership, meeting cadence, and conflict-of-interest and conflict-resolution procedures follow the consortium MPI Plan. The HearShare Consortium has met on the first Thursday of each month since March 2022, and that cadence continues.

Standards & code

Open licensing and open standards.

The HARK data model, ontology crosswalks, ETL templates, and reference notebooks will be released under Apache 2.0 for code and CC-BY 4.0 for the data dictionary and documentation. Audiologic and clinical concepts are aligned to LOINC, SNOMED CT, ICD-10-CM, FHIR R4, and OMOP; file organization follows BIDS conventions. Each HARK release is versioned and citable, with the long-term archival record deposited to dbGaP, where Dr. Dubno's MUSC longitudinal hearing data already resides.

Phases

A staged work plan with long-term continuity.

Year 1 focuses on the data model, ontology crosswalks, governance setup, and site-level pilot extracts. Year 2 takes the harmonization to consortium scale, with the first dbGaP submission package and early API surfaces. Year 3 validates models and analyses across sites and prepares the longer-term transition. Beyond the initial three years, the dbGaP archive persists indefinitely, BDSP hosting continues under the AWS Open Data Sponsorship Program, and governance is planned to move to a professional-society partner.

Sustainability

Three mechanisms for long-term durability.

Archive durability
The HARK release lives in dbGaP, NIH's long-standing controlled-access genotype-and-phenotype archive, designed for long-term durability.
Hosting durability
BDSP is hosted via the AWS Open Data Sponsorship Program, which provides storage at no cost with no fixed end date. BDSP itself is supported by a portfolio of NIH funding.
Governance durability
The Steering Committee is planned to transition to a professional-society home, with ASHA as a strategic partner. This pattern follows established public clinical-data resources.

Get involved

Three ways to participate.

Investigator interest
Investigators interested in using HARK once procedures are finalized are welcome to reach out and be added to the early-access mailing list.
Contribute data
Institutions interested in contributing audiometric or related clinical data to a future HARK release are welcome to reach out for a scoping conversation.
Advisory and governance
The External Advisory Committee and Ontology and Standards Working Group plan to seat additional members during Year 1.