Dos and Don'ts
How to Provide a Dataset Before Submission to Avoid Unnecessary Rejections?
Purpose of This Guide
Many dataset rejections are caused not by bad science but by a missing ISA structure, incomplete metadata, broken ontology references, formatting inconsistencies, and incorrect links between the ISA (Investigation-Study-Assay) files.
1. Recommended Workflow (Before Submission)
Follow this checklist in order.
Step 1 — Create Proper ISA Structure
Your dataset must follow the ISA model:
Investigation
├── Study
├── Assay
├── workflows (if applicable)
└── runs (if applicable)
Minimum required: investigation.xlsx, isa.study.xlsx, and isa.assay.xlsx. A common reason for rejection is uploading only the Excel files without the ISA structure.
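The minimum-file requirement lends itself to a quick automated check before upload. The following is a minimal sketch, not part of any official tooling: the function name `missing_isa_files` is hypothetical, and it assumes the three files may sit at any depth under the dataset folder.

```python
from pathlib import Path

# File names exactly as listed in this guide.
REQUIRED_FILES = ("investigation.xlsx", "isa.study.xlsx", "isa.assay.xlsx")


def missing_isa_files(dataset_dir, required=REQUIRED_FILES):
    """Return the required file names that are not found under dataset_dir.

    Assumption: a file counts as present if it exists at any depth, because
    the Study and Assay files may live in their own subfolders.
    """
    root = Path(dataset_dir)
    present = {path.name for path in root.rglob("*") if path.is_file()}
    return [name for name in required if name not in present]
```

Calling `missing_isa_files("path/to/dataset")` returns an empty list when all three files are present. The check only looks at file names; it does not verify the folder layout or the file contents.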
Step 2 — Define Study Naming Convention
Study names must follow a defined convention. For example:
genebankpartner_yearstart-yearend_sitename
Correct:
- NPPC_1971-1974_Piestany
- CREA_2006_SAL
Incorrect:
- trial1
- experiment_final
- data_new_version
If the site name is missing, use coordinates temporarily:
IPGR_1985_lon-24.93388-lat-42.12002
Step 3 — Ensure All Identifiers Are Stable
Identifiers are the backbone of the dataset. Every sample must have a Sample Name (mandatory in the Study file), an Accession ID, and, if replicated, a replicate ID.
| Sample Name | Accession | Replicate |
|---|---|---|
| plant_1 | ACC1234 | 1 |
| plant_2 | ACC1234 | 2 |
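The naming convention from Step 2 and the identifier rules above can both be checked mechanically. The sketch below is one possible reading, not an official validator: the regular expression is an assumption derived from the examples in this guide, the function names are hypothetical, and rows are assumed to be dictionaries keyed by the column headers shown in the table.

```python
import re
from collections import Counter

# Assumed reading of the convention: partner_yearstart[-yearend]_sitename
STUDY_NAME = re.compile(
    r"^(?P<partner>[A-Za-z0-9]+)_(?P<start>\d{4})(?:-(?P<end>\d{4}))?_(?P<site>\S+)$"
)


def is_valid_study_name(name):
    """True if name fits the pattern and the end year is not before the start year."""
    match = STUDY_NAME.match(name)
    if match is None:
        return False
    end = match.group("end")
    return end is None or int(end) >= int(match.group("start"))


def _text(cell):
    """Normalise a spreadsheet cell to a stripped string (None becomes '')."""
    return "" if cell is None else str(cell).strip()


def identifier_problems(rows):
    """Report missing or repeated identifiers in rows given as dicts."""
    problems = []
    names = Counter(_text(row.get("Sample Name")) for row in rows)
    for name, count in names.items():
        if not name:
            problems.append(f"{count} row(s) without a Sample Name")
        elif count > 1:
            problems.append(f"Sample Name {name!r} appears {count} times")
    pairs = Counter(
        (_text(row.get("Accession")), _text(row.get("Replicate"))) for row in rows
    )
    for (accession, replicate), count in pairs.items():
        if not accession:
            problems.append(f"{count} row(s) without an Accession")
        elif count > 1:
            problems.append(
                f"Accession {accession!r} replicate {replicate!r} appears {count} times"
            )
    return problems
```

With this pattern, the three "Correct" names and the coordinate fallback pass, while `trial1`, `experiment_final`, and `data_new_version` are rejected. The two rows in the table above produce no problems, because the shared accession is distinguished by the replicate number.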
Step 4 — Complete the Investigation File
Most rejection problems occur here. Ensure that:
- The Study is referenced
- The Assay is referenced
- All ontologies are listed
- Contacts are complete
- The description is filled in
- All protocols are described
2. Structure of the Dataset (What Must Exist)
A. Investigation File
Must Include
1. Investigation Metadata
- Title
- Description
- Submission date
- Project name
2. Investigation Contacts (CRITICAL)
- First Name
- Last Name
- Email
- ORCID
- Affiliation (complete, not abbreviated)
| First Name | Last Name | ORCID | Email | Affiliation |
|---|---|---|---|---|
| Jagadeeshwar | Etukala | 0000-0002-XXXX | jay@ipk.de | Leibniz Institute of Plant Genetics and Crop Plant Research, Germany |
Wrong:
- Missing ORCID
- Only "IPK" as affiliation
3. Ontologies Used (VERY IMPORTANT)
Elena's comment: Not all ontologies used in the Study and Assay files are listed in the Investigation file.
You must list ALL ontologies used.
| Term Source Name | Term Source File | Term Source Version |
|---|---|---|
| CO | https://cropontology.org | 2024 |
| UO | http://purl.obolibrary.org/obo/uo.owl | 2023 |
| PO | http://purl.obolibrary.org/obo/po.owl | 2023 |
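One way to catch unlisted ontologies is to compare the Term Source Names declared in the Investigation file with every value that appears in a Term Source REF column of the Study and Assay files. The sketch below assumes each table has already been loaded as a header list plus a list of row lists (for example from an Excel export); the function names are hypothetical.

```python
def _text(cell):
    """Normalise a spreadsheet cell to a stripped string (None becomes '')."""
    return "" if cell is None else str(cell).strip()


def used_term_sources(header, rows):
    """Collect every non-empty value found in any 'Term Source REF' column."""
    ref_columns = [
        index for index, name in enumerate(header) if _text(name) == "Term Source REF"
    ]
    return {
        _text(row[index])
        for row in rows
        for index in ref_columns
        if index < len(row) and _text(row[index])
    }


def unlisted_term_sources(declared, *tables):
    """Return the ontologies used in the tables but missing from declared.

    declared: Term Source Names from the Investigation file, e.g. ["CO", "UO", "PO"].
    tables:   (header, rows) pairs for the Study and Assay files.
    """
    used = set()
    for header, rows in tables:
        used |= used_term_sources(header, rows)
    return sorted(used - set(declared))
```

The tables are handled as lists rather than dictionaries because an ISA table can contain several Term Source REF columns, which a dictionary keyed by header would collapse into one.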
B. Study File (isa.study.xlsx)
Minimum Required Columns:
- Source Name
- Sample Name
- Organism
- Characteristics
- Geographic co-ordinate (latitude)
- Geographic co-ordinate (longitude)
Proper Coordinate Format
Correct format:
- Latitude: 52.210000
- Longitude: 20.640000
Incorrect format:
- 52°12’N
- 20.64E
- Germany (approx.)
Basic Passport Data (Domain Specific)
Passport data must include the accession number, material source information, country of origin, and source coordinates. Without passport data the dataset is incomplete and not reusable.
C. Assay File (isa.assay.xlsx)
Contains:
- Sample Name (must match Study file exactly)
- Trait
- Value
- Unit
- Protocol reference
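Because Sample Names in the Assay file must match the Study file exactly, even a stray space or a different letter case breaks the link. The sketch below compares the two name lists and, for each unmatched Assay name, suggests a Study name that differs only by case or surrounding whitespace. The function name is hypothetical and the near-match rule is an assumption made for illustration.

```python
def check_assay_sample_names(study_names, assay_names):
    """Map each unmatched Assay Sample Name to a likely intended Study name.

    The value is None when no Study name matches even after ignoring case
    and surrounding whitespace.
    """
    study = set(study_names)
    normalised = {name.strip().lower(): name for name in study_names}
    report = {}
    for name in assay_names:
        if name in study:
            continue  # exact match, nothing to report
        report[name] = normalised.get(name.strip().lower())
    return report
```

For example, with Study names `plant_1` and `plant_2`, an Assay entry `Plant_2 ` (capitalised, trailing space) is reported with the suggestion `plant_2`, and an entry `plant_3` is reported with `None`.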
3. Common Formatting Issues (With Examples)
1. Term Accession Number & Term Source REF Missing
Elena's comment: Term Accession Number and Term Source REF are usually left empty.
| Trait | Term Source REF | Term Accession Number |
|---|---|---|
| Days to heading | CO | CO_321:0000025 |
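Empty annotation cells are easy to find programmatically. The following sketch assumes the rows have been loaded as dictionaries keyed by the column headers in the table above; the function name `missing_annotations` is hypothetical.

```python
REQUIRED_ANNOTATIONS = ("Term Source REF", "Term Accession Number")


def _text(cell):
    """Normalise a spreadsheet cell to a stripped string (None becomes '')."""
    return "" if cell is None else str(cell).strip()


def missing_annotations(rows, label_column="Trait"):
    """Return (label, [empty annotation columns]) for every incomplete row."""
    incomplete = []
    for row in rows:
        empty = [col for col in REQUIRED_ANNOTATIONS if not _text(row.get(col))]
        if empty:
            incomplete.append((_text(row.get(label_column)), empty))
    return incomplete
```

A fully annotated row such as the "Days to heading" example yields no entry; a row with both cells blank is returned with both column names listed.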
2. Confusing Trait Ontology vs Unit Ontology
A common mistake is to mix up the ontology for the trait with the ontology for its unit. Each needs its own annotation:
Trait = Plant height
Unit = cm
Trait ontology → Crop Ontology (CO)
Unit ontology → Units Ontology (UO)
Correct example:
| Trait | Term Source REF | Term Accession Number |
|---|---|---|
| Plant height | CO | CO_321:0000012 |
| Unit | Term Source REF | Term Accession Number |
|---|---|---|
| centimeter | UO | UO:0000015 |
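The trait/unit distinction can be expressed as a simple consistency rule. The sketch below encodes only the mapping stated in this guide (traits from CO, units from UO); the function name and the tuple layout are hypothetical, and other ontologies would need their own entries.

```python
# Mapping stated in this guide: traits come from CO, units from UO.
EXPECTED_SOURCE = {"Trait": "CO", "Unit": "UO"}


def annotation_mismatches(annotations):
    """Check (kind, term, source_ref, accession) tuples against EXPECTED_SOURCE."""
    problems = []
    for kind, term, source_ref, accession in annotations:
        expected = EXPECTED_SOURCE.get(kind)
        if expected is None:
            continue  # kinds other than Trait and Unit are not covered here
        if source_ref != expected:
            problems.append(
                f"{kind} {term!r}: Term Source REF is {source_ref!r}, expected {expected!r}"
            )
        elif not accession.startswith(expected):
            problems.append(
                f"{kind} {term!r}: accession {accession!r} does not start with {expected!r}"
            )
    return problems
```

The two rows from the correct example pass. A unit annotated with `CO` as its Term Source REF is reported, as is a unit whose source is `UO` but whose accession number starts with `CO`.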
3. Duplicated columns
Wrong:
| Plant Height | Plant Height |
|---|---|
| 120 | |
| 130 | |
If one of the duplicated columns is empty, delete it.
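Duplicated headers, and which of the duplicates is empty, can be detected with a short script. This is a minimal sketch: the function name is hypothetical, the table is assumed to be a header list plus a list of row lists, and the `REPEATABLE` set reflects the assumption that annotation columns are allowed to repeat in an ISA table.

```python
from collections import defaultdict

# Headers that ISA tables repeat by design; assumed safe to skip.
REPEATABLE = frozenset({"Term Source REF", "Term Accession Number"})


def duplicated_columns(header, rows, ignore=REPEATABLE):
    """Return {header name: {"columns": [...], "empty": [...]}} for duplicates.

    "columns" lists every position where the name occurs; "empty" lists the
    positions that hold no value in any row and are candidates for deletion.
    """
    positions = defaultdict(list)
    for index, name in enumerate(header):
        positions[name.strip()].append(index)

    def is_empty(index):
        return all(
            index >= len(row) or row[index] is None or not str(row[index]).strip()
            for row in rows
        )

    return {
        name: {"columns": indexes, "empty": [i for i in indexes if is_empty(i)]}
        for name, indexes in positions.items()
        if len(indexes) > 1 and name not in ignore
    }
```

For the "Wrong" table above, the result flags `Plant Height` at positions 0 and 1 and marks position 1 as empty. If both duplicates contain data, the `empty` list is blank and the columns need manual review.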
4. Missing Link Between Files
Common problems:
- Study not referenced in Investigation
- Assay not linked to Study
- Misspelled file names
5. Sample Name Missing
Sample Name is mandatory because:
Study → defines samples
Assay → links measurements to Sample Name
Without it → ISA structure breaks.
4. Domain-Specific Issues
1. Site Location Must Be Geo Coordinates
Not just:
- Versailles
- Gatersleben
Must include latitude and longitude.
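Coordinate cells can be validated against the decimal-degree format before submission. The sketch below accepts only plain decimal numbers within the valid latitude and longitude ranges; the function names are hypothetical, and the exact pattern (optional sign, optional fractional part) is an assumption about what counts as acceptable.

```python
import re

# Plain decimal degrees: optional sign, up to three integer digits, optional fraction.
DECIMAL_DEGREES = re.compile(r"^[+-]?\d{1,3}(?:\.\d+)?$")


def parse_decimal_degrees(text, limit):
    """Return the value as a float, or None if it is not decimal degrees within ±limit."""
    text = str(text).strip()
    if not DECIMAL_DEGREES.match(text):
        return None
    value = float(text)
    return value if -limit <= value <= limit else None


def is_valid_coordinate_pair(latitude, longitude):
    """True if latitude is within ±90 and longitude within ±180 decimal degrees."""
    return (
        parse_decimal_degrees(latitude, 90) is not None
        and parse_decimal_degrees(longitude, 180) is not None
    )
```

A pair such as `52.210000` and `20.640000` passes. Degree-minute notation like `52°12’N`, suffixed values like `20.64E`, and place names are all rejected, as is a latitude outside the range of -90 to 90.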
2. Traits Must Have Full Names Linked to Ontologies
Not acceptable:
- PH
- TKW
- DTH
Must be expanded:
- Plant height
- Thousand kernel weight
- Days to heading
3. Basic Passport Data Required
Missing passport data causes:
- Broken linkage to EURISCO
- Incomplete material description
- Reduced reusability
4. Data Quality Checks Before Submission
Perform these checks:
- Is any value biologically impossible (e.g., TKW = 0)?
- Is any experiment listed without observed data?
- Are missing values left empty rather than recorded as zero?
- Are all Sample Names unique?
- Are all ontology fields filled?
- Are all studies referenced?
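The value-related checks in this list can be partly automated. The sketch below flags non-numeric entries and zero or negative values for traits that must be strictly positive. It is an illustration under stated assumptions: the function name is hypothetical, rows are dictionaries with the Trait and Value columns of the Assay file, and the default list of strictly positive traits contains only the TKW example from this guide.

```python
def _text(cell):
    """Normalise a spreadsheet cell to a stripped string (None becomes '')."""
    return "" if cell is None else str(cell).strip()


def suspicious_values(rows, strictly_positive=("Thousand kernel weight",)):
    """Return (row number, reason) pairs for values that need a second look."""
    findings = []
    for number, row in enumerate(rows, start=1):
        raw = _text(row.get("Value"))
        if raw == "":
            continue  # an empty cell is treated as a legitimate missing value
        try:
            value = float(raw)
        except ValueError:
            findings.append((number, f"non-numeric value {raw!r}"))
            continue
        trait = _text(row.get("Trait"))
        if trait in strictly_positive and value <= 0:
            findings.append((number, f"{trait} = {raw} is not plausible"))
    return findings
```

Row numbers count from 1 for the first data row. A zero for a strictly positive trait is reported because it most likely stands for a missing measurement, which should be left empty instead.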
5. Final Pre-Submission Checklist
Before uploading:
☐ ISA structure complete
☐ Investigation file fully filled
☐ All ontologies listed
☐ Study names consistent
☐ Coordinates in decimal format
☐ Sample Name present
☐ Replicates uniquely identified
☐ Passport data included
☐ ORCID and emails included
☐ No duplicated columns
☐ No empty referenced studies
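The "ORCID and emails included" item can go beyond a presence check. An ORCID iD has 16 characters in four hyphen-separated groups, and its last character is an ISO 7064 MOD 11-2 check digit, so truncated or mistyped iDs can be detected. The sketch below is illustrative: the function names, the dictionary keys, the simple email pattern, and the one-word affiliation heuristic are all assumptions, not rules from the submission system.

```python
import re

ORCID_SHAPE = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")
EMAIL_SHAPE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # deliberately loose


def is_valid_orcid(orcid):
    """Check the 16-character shape and the ISO 7064 MOD 11-2 check digit."""
    if not ORCID_SHAPE.match(orcid):
        return False
    digits = orcid.replace("-", "")
    total = 0
    for char in digits[:-1]:
        total = (total + int(char)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[-1] == expected


def contact_problems(contact):
    """Return a list of problems for one contact given as a dict."""
    problems = []
    if not is_valid_orcid(str(contact.get("ORCID") or "").strip()):
        problems.append("ORCID missing or invalid")
    if not EMAIL_SHAPE.match(str(contact.get("Email") or "").strip()):
        problems.append("Email missing or malformed")
    affiliation = str(contact.get("Affiliation") or "").strip()
    if len(affiliation.split()) < 2:
        # Heuristic: a single word such as "IPK" is probably an abbreviation.
        problems.append("Affiliation missing or abbreviated")
    return problems
```

ORCID's documented sample iD `0000-0002-1825-0097` passes the check, while a shortened value such as `0000-0002-XXXX` fails on shape alone. The check confirms only that an iD is well formed, not that it belongs to the named person.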