N3C Data Enclave: the Data Enclave is a secure platform where clinical data for participating sites is stored. The Data Enclave’s technology partner is Palantir.
Public N3C PPRL Dashboard
About
Thank you for joining the Privacy Preserving Record Linkage (PPRL) hashing community for COVID data linkage. PPRL uses secure pseudonymization processes to connect records that refer to the same individual across different data sources while maintaining that individual’s privacy. Linkage is defined here as any operation involving two or more datasets that uses de-identified cryptographic hashes (tokens) to match records associated with the same individuals anonymously, without ever using the individuals’ true identifiers.
There are three main reasons why privacy preserving record linkage is key to this effort:
- PPRL enables de-identified deduplication of patients across institutions to account for care fragmentation.
- PPRL enables de-identified linking to multimodal data, such as image data from various health system PACS.
- PPRL enables de-identified cohort overlap discovery from other research studies. For example, we can understand the extent of overlap between the NIH All of Us cohort and the N3C cohort.
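The deduplication and overlap use cases above both reduce to comparing tokens rather than identifiers. The sketch below is a minimal illustration only: it assumes the tokens have already been brought into a common, comparable form, and the function names, site labels, and token values are hypothetical, not part of the LHB platform.

```python
from collections import defaultdict


def cohort_overlap(tokens_a, tokens_b):
    """Return the tokens that appear in both cohorts."""
    return set(tokens_a) & set(tokens_b)


def candidate_duplicates(records):
    """Group (site, local_patient_id, token) records by token.

    Only tokens seen on more than one record are returned; these are
    candidate duplicates of the same individual.
    """
    groups = defaultdict(list)
    for site, patient_id, token in records:
        groups[token].append((site, patient_id))
    return {token: ids for token, ids in groups.items() if len(ids) > 1}


# Made-up tokens for illustration; real tokens are opaque strings.
shared = cohort_overlap(["t1", "t2"], ["t2", "t3"])
print(len(shared))
```

No true identifier is needed at any point: the comparison works on tokens alone.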
Regenstrief Institute is the partnered Linkage Honest Broker (LHB). Regenstrief Institute is a dynamic, people-centered research organization driven by a mission to connect and innovate for better health. All people deserve the best quality care. That is why Regenstrief Institute conducts research and development at the intersection of clinical medicine, technology, academia, and industry. Regenstrief is contracted by NCATS and is a neutral entity located outside of the N3C enclave that serves as an escrow for the de-identified tokens (“hashes”) and operates the technology platform that facilitates PPRL using these tokens. The LHB does NOT receive, store, or process PHI/PII; PHI/PII is held ONLY by the data contributing sites. The LHB will hold certain metadata, such as the originating contributor/data source and the nature of the data associated with the received tokens, e.g., EHR data, chest x-ray, viral variant data. Datavant is a partner of Regenstrief Institute and provides the software that generates the de-identified tokens (hashes).
The N3C Data Enclave is a secure platform in which the harmonized clinical data provided by our contributing members is stored. The data itself can only be accessed through a secure cloud portal hosted by NCATS and cannot be downloaded or removed.
In addition to sending data to the N3C Data Enclave, sites participating in the hashing community will prepare an additional set of files that will be submitted directly to the LHB service at Regenstrief Institute. These additional files include hashed identifiers (tokens), which correspond to a unique patient ID, as well as a Manifest file that includes metadata describing site-specific information.
Participating Entities
High Level Overview of Data Flow
LHB Onboarding Process
Prerequisite: Signed Linkage Honest Broker Agreement (LHBA)
Onboarding with the Linkage Honest Broker (LHB) can be completed in 3 easy steps:
1) Site Registration – Provision Firewall
2) Individual Registration – Create SFTP Account
3) Setup and Connect to LHB SFTP
- Complete the Site Registration Form. In the form, list the site personnel who require access to the LHB SFTP. The form asks for:
– Formal site name (full name of your institution)
– Formal site abbreviation
– Principal investigator’s name
– Public static IP or CIDR block
– List of names (first and last) and e-mail addresses for users who should have access to the LHB SFTP
– Primary technical contact name and e-mail address for your site
- After the firewall has been set up and your IP address whitelisted, an e-mail noting completion will be sent.
- Concurrently, once the Site Registration Form is submitted, the users who require an LHB SFTP account will receive an e-mail from RILHB@regenstrief.org with a link to the Individual User Access Form.
- To complete the Individual User Access Form, you will need your public SSH key. Instructions for generating the key are in the e-mail with the form link. You may also download the instructions in the SSH section below.
- Once your account has been set up, an e-mail will be sent with your username and instructions on connecting to the LHB SFTP.
Tokenization
Token generation with the Datavant tool occurs in two steps. First, tokens are created from your data source; then the tokens are encrypted for transmission to the LHB. For more information on using the Datavant software, visit https://datavant.com/n3c-info-center-access/.
Irreversible hashing: one-way cryptographic SHA-256 hash using Datavant Master Salt makes tokens irreversible
Site-specific encryption: AES-128 encryption makes tokens site-specific, protecting you from potential security breaches
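To make the idea of an irreversible, salted hash concrete, here is a minimal sketch using Python's standard library. It is not the Datavant algorithm: the input fields, the normalization rule, the way the salt is combined with the data, and the function name `make_token` are all assumptions chosen for illustration. The site-specific AES-128 encryption step is omitted because the Python standard library has no AES implementation.

```python
import hashlib


def make_token(first_name, last_name, date_of_birth, salt):
    """Derive a one-way token from identifying fields and a secret salt.

    `salt` is a bytes value. The normalization below (trim, upper-case,
    join with "|") is an illustrative assumption, not Datavant's rule.
    """
    parts = (first_name, last_name, date_of_birth)
    normalized = "|".join(part.strip().upper() for part in parts)
    return hashlib.sha256(salt + normalized.encode("utf-8")).hexdigest()


# Hypothetical salt and patient values, for demonstration only.
demo_salt = b"not-a-real-master-salt"
token = make_token("Jane", "Doe", "1980-01-31", demo_salt)
print(len(token))
```

Two records that normalize to the same values under the same salt yield the same token, which is what allows matching. Without the salt, the token cannot be recomputed from the identifiers, and the hash itself cannot be inverted to recover them.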
Manifest
- Manifest file contains metadata about submission and should be created for each submission to LHB
- You can script the creation of the file – most fields will be the same across submissions
- Manifest file naming convention: SiteAbbreviation_N3C_Date_MANIFEST.csv (Example: ABC_N3C_20220923_MANIFEST.csv)
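Because most manifest fields repeat across submissions, the file can be scripted along the following lines. This is a sketch under stated assumptions: the one-header-row, one-value-row layout and the field names in the usage example are placeholders, so take the actual fields and layout from the Site Engagement Packet. Only the file naming convention comes from this page.

```python
import csv
import io


def manifest_filename(site_abbreviation, date_yyyymmdd):
    """Build a name such as ABC_N3C_20220923_MANIFEST.csv."""
    return f"{site_abbreviation.upper()}_N3C_{date_yyyymmdd}_MANIFEST.csv"


def render_manifest(fields):
    """Render a dict of manifest fields as CSV text.

    Assumed layout: one header row of field names, one row of values.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(fields.keys())
    writer.writerow(fields.values())
    return buffer.getvalue()


# Placeholder field names; replace with the fields your site must report.
fields = {"SITE_ABBREV": "ABC", "DATA_TYPE": "EHR"}
name = manifest_filename("abc", "20220923")
text = render_manifest(fields)
print(name)
```

Writing `text` to a file called `name` produces the manifest; only the fields that change between submissions, such as the date, need to be updated each time.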
LHB Data Package
The data package submitted to the Linkage Honest Broker (LHB) consists of 2 files zipped together:
- File 1 – Transit tokens
- File 2 – Manifest
File naming conventions:
Data Package: SiteAbbreviation_N3C_Date.zip
– Example: ABC_N3C_20220923.zip
Token file: SiteAbbreviation_N3C_Date_TOKENS.csv
– Example: ABC_N3C_20220923_TOKENS.csv
Manifest file: SiteAbbreviation_N3C_Date_MANIFEST.csv
– Example: ABC_N3C_20220923_MANIFEST.csv
NOTE: File names should be in ALL CAPITAL LETTERS.
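The naming conventions above can be checked automatically before the package is zipped. The following sketch assumes that site abbreviations contain only capital letters and digits and that the date is eight digits, as in the examples; the pattern and the function name `build_package` are illustrative, not an official validator.

```python
import re
import zipfile
from pathlib import Path

# Assumed pattern: SITE_N3C_YYYYMMDD_TOKENS.csv or ..._MANIFEST.csv
NAME_PATTERN = re.compile(
    r"^(?P<stem>[A-Z0-9]+_N3C_\d{8})_(?P<kind>TOKENS|MANIFEST)\.csv$"
)


def build_package(tokens_path, manifest_path):
    """Validate both file names, then zip them as <stem>.zip alongside."""
    tokens_path, manifest_path = Path(tokens_path), Path(manifest_path)
    tokens = NAME_PATTERN.match(tokens_path.name)
    manifest = NAME_PATTERN.match(manifest_path.name)
    if not tokens or tokens["kind"] != "TOKENS":
        raise ValueError(f"bad token file name: {tokens_path.name}")
    if not manifest or manifest["kind"] != "MANIFEST":
        raise ValueError(f"bad manifest file name: {manifest_path.name}")
    if tokens["stem"] != manifest["stem"]:
        raise ValueError("token and manifest names do not share a prefix")

    package_path = tokens_path.with_name(f"{tokens['stem']}.zip")
    with zipfile.ZipFile(package_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # arcname keeps only the file name, not the local directory path.
        archive.write(tokens_path, arcname=tokens_path.name)
        archive.write(manifest_path, arcname=manifest_path.name)
    return package_path
```

For example, calling `build_package("ABC_N3C_20220923_TOKENS.csv", "ABC_N3C_20220923_MANIFEST.csv")` writes `ABC_N3C_20220923.zip` next to the two input files, and a misnamed file raises `ValueError` before anything is written.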
Information regarding the data package components and naming convention can be found in part 2 of the Site Engagement Packet
Issues with the data package or submission? Check out the FAQs.
Transfer Data Package .zip File to the LHB SFTP
- Transfer the .zip file to the remote site from your local site
- Ensure all files are named according to the file naming conventions.
- Refer to the Resources and Documents section or Site Engagement Packet, Part 2 for more detailed instructions
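For sites that script the transfer with the OpenSSH `sftp` command-line client rather than a graphical client, the connection settings listed in the FAQs (host, port 2222, key-file logon, e-mail address as username) can be assembled as sketched below. The helper name, key path, and example username are hypothetical. The username is passed with `-o User=...` because it contains an `@` character.

```python
import shlex

LHB_HOST = "lhbsftp.regenstrief.org"
LHB_PORT = 2222


def sftp_command(username, private_key_path, batch_file=None):
    """Build the argument list for an OpenSSH sftp session to the LHB."""
    command = [
        "sftp",
        "-P", str(LHB_PORT),        # sftp uses capital -P for the port
        "-i", private_key_path,     # private key in OpenSSH format
        "-o", f"User={username}",   # e-mail address registered with the LHB
    ]
    if batch_file is not None:
        command += ["-b", batch_file]  # file of sftp commands, e.g. a put
    command.append(LHB_HOST)
    return command


# Hypothetical user and key path, for illustration only.
args = sftp_command("jane.doe@example.edu", "/home/jane/.ssh/lhb_key")
print(shlex.join(args))
```

The resulting list can be handed to `subprocess.run`, or the printed line can be pasted into a terminal. Once connected, a `put` of the data package .zip file uploads it.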
Resources and Documents
Private and Public SSH Key Generation
SFTP Setup Instructions
FAQs & Troubleshooting
Do I have to use FileZilla as SFTP client?
No. The Linkage Honest Broker supports any SFTP setup.
What port do I use when connecting to LHB SFTP?
Port 2222
What is the host?
lhbsftp.regenstrief.org
Do I need a password?
No. You will use the private SSH key that pairs with the public key you submitted when requesting your SFTP account.
What is the logon type?
Key File
What is my username?
E-mail address used when submitting the Individual User Access Form (usually your institution email address)
I am having trouble connecting to the LHB SFTP. What do I do?
- Confirm the IP address / range registered in the firewall
– https://whatismyipaddress.com/ (to get IPv4 and IPv6)
- Confirm you are using OpenSSH format
- Verify private key format is .pem vs. .ppk
– .pem is typically used by Linux/Mac
– .ppk is used by the FileZilla client on Windows
- Confirm username
– Username is the e-mail address registered with the Linkage Honest Broker
My file contains all error tokens.
- Confirm that the column headers are in the correct order.
- Verify that you are submitting the correct file to the LHB.
If unable to resolve:
Please contact the Linkage Honest Broker by submitting a service desk ticket.