Roadmap, Governance & Access
Governance, access, and the resource roadmap.
HARK uses open licensing for code and documentation, open clinical and audiologic standards for data, and tiered controlled access via BDSP. This page documents how the consortium plans to govern the resource and how investigators apply to use it.
The architecture
Archival custody, separate from active compute.
HARK pairs dbGaP for long-term archival custody with BDSP for active, versioned compute. Each does what it does best.
HARK is planned to occupy two complementary layers. The harmonized release is deposited in dbGaP as the archival record, where Dr. Dubno's MUSC longitudinal hearing data already resides; this permits harmonization and linkage without relocating that resource. The active analytic layer is hosted on the Brain Data Science Platform (BDSP), an NIH-approved data-sharing repository supported by the AWS Open Data Sponsorship Program for long-term, no-cost storage. BDSP provides Python and analytic tooling, GPU access, FAIR-aligned BIDS-style organization, tiered access (controlled, registered, open), and a standardized data-use-agreement framework. The platform was previously used to host the Harvard Electroencephalography Database (HEEDB), which contains records from more than 100,000 patients across four Boston hospitals.
This separation of archival custody from active compute follows current NIH data-platform practice rather than introducing a new siloed repository.
Access
Tiered controlled access via BDSP.
The consortium plans to use BDSP's standardized data-use-agreement framework, with tiered access (open summary statistics, registered de-identified record-level, and controlled linked clinical detail) reviewed by a HARK Data Access Committee. Approved investigators will work against the same versioned release the consortium uses for its own studies, inside the BDSP enclave. Detailed procedures, including the DUA template, IRB expectations, and review cadence, are being finalized with the contributing sites and BDSP and will be posted as they are ratified.
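As an illustration only, the three tiers named above can be modeled as an ordered enumeration. This is a minimal sketch, not the BDSP or HARK implementation: the product names, the `REQUIRED_TIER` mapping, the `can_access` helper, and the assumption that a higher tier also grants everything below it are all hypothetical, since the actual procedures are still being finalized.

```python
from enum import IntEnum


class AccessTier(IntEnum):
    """Tiers named in the text, ordered from least to most restricted."""
    OPEN = 1        # summary statistics
    REGISTERED = 2  # de-identified record-level data
    CONTROLLED = 3  # linked clinical detail


# Hypothetical mapping from a data product to the minimum tier needed to read it.
REQUIRED_TIER = {
    "summary_statistics": AccessTier.OPEN,
    "deidentified_records": AccessTier.REGISTERED,
    "linked_clinical_detail": AccessTier.CONTROLLED,
}


def can_access(granted: AccessTier, product: str) -> bool:
    """Return True if the granted tier meets or exceeds the product's requirement.

    Assumes tiers are cumulative, which the source does not state.
    """
    return granted >= REQUIRED_TIER[product]
```

Under these assumptions, an investigator approved at the registered tier could read summary statistics and de-identified records, but not linked clinical detail.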
Governance
A four-committee structure for HARK governance.
The proposed governance comprises four committees with distinct scopes: a Steering Committee for strategic direction; a Data Access Committee for review of data-use applications; an Ontology and Standards Working Group for the data model and crosswalks; and an External Advisory Committee for independent review. Membership, meeting cadence, and conflict-of-interest and conflict-resolution procedures follow the consortium MPI Plan. The HearShare Consortium has met on the first Thursday of each month since March 2022, and that cadence continues.
Standards & code
Open licensing and open standards.
The HARK data model, ontology crosswalks, ETL templates, and reference notebooks will be released under Apache 2.0 for code and CC-BY 4.0 for the data dictionary and documentation. Audiologic and clinical concepts are aligned to LOINC, SNOMED CT, ICD-10-CM, FHIR R4, and OMOP; file organization follows BIDS conventions. Each HARK release is versioned and citable, with the long-term archival record deposited in dbGaP.
Phases
A staged work plan with long-term continuity.
Year 1 focuses on the data model, ontology crosswalks, governance setup, and site-level pilot extracts. Year 2 takes the harmonization to consortium scale, with the first dbGaP submission package and early API surfaces. Year 3 validates models and analyses across sites and prepares the longer-term transition. Beyond the initial three years, the dbGaP archive persists indefinitely, BDSP hosting continues under the AWS Open Data Sponsorship Program, and governance is planned to be anchored with a professional-society partner.
Sustainability
Three mechanisms for long-term durability.
Get involved