Dos and Don'ts

Purpose of This Guide

Many dataset rejections are caused not by bad science but by missing ISA structure, incomplete metadata, broken ontology references, formatting inconsistencies, and incorrect linkage between the ISA (Investigation-Study-Assay) files.

Follow this checklist in order.

Your dataset must follow the ISA model:

Investigation
├── Study
├── Assay
├── workflows (if applicable)
└── runs (if applicable)

Minimum required: investigation.xlsx, isa.study.xlsx and isa.assay.xlsx. A common reason for rejection is uploading only the Excel data files without the ISA structure.
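A minimal sketch of how the minimum file set could be checked before upload. The function name missing_isa_files is hypothetical, and the sketch assumes the three files may sit anywhere below the dataset folder; adjust it if your repository expects fixed locations.

```python
from pathlib import Path

# File names as listed in this guide.
REQUIRED_FILES = ("investigation.xlsx", "isa.study.xlsx", "isa.assay.xlsx")


def missing_isa_files(dataset_dir):
    """Return the required ISA file names not found anywhere under dataset_dir."""
    root = Path(dataset_dir)
    # Collect the names of all Excel files in the folder and its subfolders.
    present = {path.name for path in root.rglob("*.xlsx")}
    return [name for name in REQUIRED_FILES if name not in present]
```

An empty list means all three files were found; any names returned are the ones still missing.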

Study names must follow a defined convention. For example:

genebankpartner_yearstart-yearend_sitename

Correct:

NPPC_1971-1974_Piestany
CREA_2006_SAL

Incorrect:

trial1
experiment_final
data_new_version

If the site name is missing, use coordinates temporarily:

IPGR_1985_lon-24.93388-lat-42.12002
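The naming convention can be checked with a regular expression. The sketch below is one possible reading of the pattern, inferred from the examples above: a partner code, a start year with an optional end year, and a site part that is either a name or the temporary coordinate form. The pattern and the function name is_valid_study_name are assumptions, not an official validator.

```python
import re

# Assumed pattern: partner_startyear[-endyear]_site, where site is a name
# or the temporary "lon-<x>-lat-<y>" coordinate form.
STUDY_NAME = re.compile(
    r"^(?P<partner>[A-Za-z0-9]+)_"
    r"(?P<start>\d{4})(?:-(?P<end>\d{4}))?_"
    r"(?P<site>[A-Za-z0-9.\-]+)$"
)


def is_valid_study_name(name):
    """True if name matches the assumed convention and the years are in order."""
    match = STUDY_NAME.match(name)
    if match is None:
        return False
    end = match.group("end")
    # A single year is allowed; a range must not run backwards.
    return end is None or int(match.group("start")) <= int(end)
```

With this pattern, NPPC_1971-1974_Piestany and CREA_2006_SAL are accepted, while trial1, experiment_final and data_new_version are rejected because they contain no year.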

Identifiers are the backbone of the dataset. Every sample must have a Sample Name (mandatory in the Study file), an Accession ID, and, if replicated, a replicate ID.

Sample Name   Accession   Replicate
plant_1       ACC1234     1
plant_2       ACC1234     2
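The identifier rules can be sketched as a small check over the sample rows. The function name identifier_problems is hypothetical, and the sketch assumes that several rows sharing one accession are replicates of each other, so each of them needs its own replicate ID.

```python
from collections import Counter


def _text(value):
    """Normalise a cell to a stripped string; None becomes an empty string."""
    return "" if value is None else str(value).strip()


def identifier_problems(rows):
    """Return human-readable problems found in the sample identifiers.

    Each row is a dict with the keys "Sample Name", "Accession" and "Replicate".
    """
    problems = []
    for index, row in enumerate(rows, start=1):
        for key in ("Sample Name", "Accession"):
            if not _text(row.get(key)):
                problems.append(f"row {index}: missing {key}")

    # A Sample Name must identify exactly one row.
    names = Counter(_text(row.get("Sample Name")) for row in rows)
    for name, count in names.items():
        if name and count > 1:
            problems.append(f"Sample Name '{name}' is used {count} times")

    # Assumption: rows sharing an accession are replicates and need distinct IDs.
    by_accession = {}
    for row in rows:
        by_accession.setdefault(_text(row.get("Accession")), []).append(row)
    for accession, group in by_accession.items():
        if not accession or len(group) < 2:
            continue
        ids = [_text(row.get("Replicate")) for row in group]
        if "" in ids or len(set(ids)) < len(ids):
            problems.append(f"accession '{accession}': replicate IDs missing or repeated")
    return problems
```

The two example rows above pass this check; giving both rows the same replicate ID would be reported.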

Most rejection problems occur in the Investigation file. Ensure that every Study and Assay is referenced, all ontologies are listed, contacts are complete, the description is filled in, and all protocols are described.

Must Include

  • Title
  • Description
  • Submission date
  • Project name
  • First Name
  • Last Name
  • Email
  • ORCID
  • Affiliation (complete, not abbreviated)

    First Name     Last Name   ORCID            Email        Affiliation
    Jagadeeshwar   Etukala     0000-0002-XXXX   jay@ipk.de   Leibniz Institute of Plant Genetics and Crop Plant Research, Germany

Wrong:

Missing ORCID
Only "IPK" as affiliation
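Contact entries can be screened with a few simple rules. The sketch below is illustrative only: the function name contact_problems is hypothetical, the e-mail pattern is deliberately loose, and treating a single all-capitals word as an abbreviated affiliation is a heuristic assumption. The ORCID pattern reflects the standard form of four hyphen-separated groups of four characters, where the last character may be X.

```python
import re

# ORCID iDs are written as four hyphen-separated groups of four characters;
# the final character may be the check character X.
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")
# Deliberately loose e-mail shape check (assumption, not full validation).
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
REQUIRED_FIELDS = ("First Name", "Last Name", "Email", "ORCID", "Affiliation")


def contact_problems(contact):
    """Return a list of problems for one contact given as a dict of strings."""
    value = {field: (contact.get(field) or "").strip() for field in REQUIRED_FIELDS}
    problems = [f"missing {field}" for field in REQUIRED_FIELDS if not value[field]]
    if value["ORCID"] and not ORCID_PATTERN.match(value["ORCID"]):
        problems.append("ORCID is not in the 0000-0000-0000-0000 form")
    if value["Email"] and not EMAIL_PATTERN.match(value["Email"]):
        problems.append("Email does not look like an address")
    # Heuristic (assumption): one all-capitals word, such as "IPK", is an abbreviation.
    affiliation = value["Affiliation"]
    if affiliation and " " not in affiliation and affiliation.isupper():
        problems.append("Affiliation looks abbreviated")
    return problems
```

A contact with no ORCID and "IPK" as affiliation is reported on both counts; a fully filled contact returns an empty list.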

Elena's comment: Not all ontologies used in the Study and Assay files are listed in the Investigation file.

You must list ALL ontologies used.

Term Source Name   Term Source File                        Term Source Version
CO                 https://cropontology.org                2024
UO                 http://purl.obolibrary.org/obo/uo.owl   2023
PO                 http://purl.obolibrary.org/obo/po.owl   2023
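Whether every ontology is listed comes down to a set comparison between the Term Source Names declared in the Investigation file and the Term Source REF values used in the Study and Assay files. A minimal sketch, assuming both have already been read into lists of strings (the function name undeclared_ontologies is hypothetical):

```python
def undeclared_ontologies(declared, used):
    """Return Term Source REFs used in Study/Assay files but not declared.

    declared: Term Source Names listed in the Investigation file.
    used: every Term Source REF value found in the Study and Assay files.
    """
    declared_set = {name.strip() for name in declared}
    # Empty cells are ignored here; they are a separate problem.
    used_set = {ref.strip() for ref in used if ref.strip()}
    return sorted(used_set - declared_set)
```

If the Investigation file declares only CO and UO while an Assay file also uses PO, the function returns PO as the missing entry.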

Minimum Required Columns:

  • Source Name
  • Sample Name
  • Organism
  • Characteristics
  • Geographic co-ordinate (latitude)
  • Geographic co-ordinate (longitude)

Correct format:

Latitude: 52.210000
Longitude: 20.640000

Incorrect format:

52°12’N
20.64E
Germany (approx.)
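The decimal format can be checked mechanically. The sketch below assumes coordinates arrive as text and must be plain decimal degrees with a decimal point, within the usual limits of 90 degrees for latitude and 180 degrees for longitude. Requiring a decimal point is an assumption of this sketch, and the function names are hypothetical.

```python
import re

# Optional minus sign, up to three digits, a decimal point, at least one decimal.
DECIMAL_DEGREES = re.compile(r"^-?\d{1,3}\.\d+$")


def is_decimal_coordinate(value, limit):
    """True if value is a decimal-degree string whose magnitude is at most limit."""
    text = value.strip()
    return bool(DECIMAL_DEGREES.match(text)) and abs(float(text)) <= limit


def is_valid_latitude(value):
    return is_decimal_coordinate(value, 90.0)


def is_valid_longitude(value):
    return is_decimal_coordinate(value, 180.0)
```

The correct examples above pass, while values with degree signs, hemisphere letters or free text are rejected.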

Passport data must include the accession number, material source information, country of origin, and source coordinates. Without passport data, the dataset is incomplete and not reusable.

Contains:

  1. Sample Name (must match Study file exactly)
  2. Trait
  3. Value
  4. Unit
  5. Protocol reference

Elena's comment: Term Accession Number and Term Source REF are usually left empty.

Trait             Term Source REF   Term Accession Number
Days to heading   CO                CO_321:0000025

Common mistake (free text without ontology terms):

Trait = Plant height
Unit = cm

Annotate both the trait and the unit:

Trait ontology → Crop Ontology (CO)
Unit ontology → Units Ontology (UO)

Correct example:

Trait          Term Source REF   Term Accession Number
Plant height   CO                CO_321:0000012

Unit           Term Source REF   Term Accession Number
centimeter     UO                UO:0000015
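Empty annotation fields are easy to detect once the rows are loaded. A minimal sketch, assuming each trait or unit row is a dict keyed by column header (the function name unannotated_terms and its label_column parameter are hypothetical):

```python
def unannotated_terms(rows, label_column):
    """Return the labels of rows missing a Term Source REF or Term Accession Number.

    label_column names the column holding the term itself, e.g. "Trait" or "Unit".
    """
    missing = []
    for row in rows:
        source_ref = (row.get("Term Source REF") or "").strip()
        accession = (row.get("Term Accession Number") or "").strip()
        # Both fields are needed to resolve the term in its ontology.
        if not source_ref or not accession:
            missing.append(row.get(label_column, ""))
    return missing
```

Run it once with label_column set to "Trait" and once with "Unit" to cover both kinds of annotation.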

Wrong:

Plant Height   Plant Height
120
130

One column is empty → delete it.
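Duplicated and empty columns can be found from the header row and the data rows alone. The sketch below is illustrative: the function name column_problems is hypothetical, and it assumes that annotation headers such as Term Source REF and Term Accession Number repeat on purpose and should not be reported.

```python
from collections import Counter

# Annotation headers repeat by design, so they are exempt from the duplicate
# check (assumption: extend this tuple for other intentionally repeated headers).
REPEATABLE = ("Term Source REF", "Term Accession Number")


def column_problems(header, rows, repeatable=REPEATABLE):
    """Report duplicated header names and columns that contain no values."""
    problems = []
    for name, count in Counter(header).items():
        if count > 1 and name not in repeatable:
            problems.append(f"column '{name}' appears {count} times")
    for position, name in enumerate(header):
        # Short rows are treated as having empty cells at the end.
        cells = [row[position] if position < len(row) else "" for row in rows]
        if all(cell is None or str(cell).strip() == "" for cell in cells):
            problems.append(f"column {position + 1} ('{name}') has no values")
    return problems
```

For the wrong example above, the check reports both the repeated Plant Height header and the empty second column.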

Common problems:

  1. Study not referenced in Investigation
  2. Assay not linked to Study
  3. Misspelled file names
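Broken references and misspelled file names can be caught by comparing the file names referenced in the Investigation file with the files actually present. A minimal sketch using difflib from the Python standard library to suggest the most likely intended name; the function name broken_references and the similarity cutoff of 0.8 are assumptions.

```python
import difflib


def broken_references(referenced, existing):
    """Map each referenced file name that does not exist to its closest match.

    referenced: file names mentioned in the Investigation file.
    existing: file names actually present in the dataset.
    """
    existing = list(existing)
    report = {}
    for name in referenced:
        if name not in existing:
            # An empty suggestion list means no similar file name was found.
            report[name] = difflib.get_close_matches(name, existing, n=1, cutoff=0.8)
    return report
```

An empty dictionary means every reference resolves; otherwise each key is a broken reference and its value holds at most one suggested correction.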

Sample Name is mandatory because:

Study → defines samples
Assay → links measurements to Sample Name

Without it → ISA structure breaks.
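The Study-to-Assay link can be verified by checking that every Sample Name used in the Assay file also appears in the Study file. A minimal sketch, assuming both columns have been read into lists; the comparison is exact, so differences in case or spacing count as mismatches (the function name unlinked_samples is hypothetical).

```python
def unlinked_samples(study_samples, assay_samples):
    """Return Sample Names used in the Assay file but not defined in the Study file."""
    defined = set(study_samples)
    # Exact comparison: "Plant_2" does not match "plant_2".
    return sorted(set(assay_samples) - defined)
```

Any name returned is a measurement that cannot be traced back to a sample.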

A site name alone is not enough:

Versailles
Gatersleben

Each site must also include latitude and longitude.

Abbreviated trait names are not acceptable:

PH
TKW
DTH

They must be expanded:

Plant height
Thousand kernel weight
Days to heading
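Expanding abbreviations can be automated with a lookup table. The sketch below contains only the three expansions given above; the table must be extended by hand for other traits, and the function name expand_trait_name is hypothetical.

```python
# Expansions taken from this guide; extend the table for other traits.
TRAIT_NAMES = {
    "PH": "Plant height",
    "TKW": "Thousand kernel weight",
    "DTH": "Days to heading",
}


def expand_trait_name(name):
    """Return the full trait name for a known abbreviation, else the name unchanged."""
    cleaned = name.strip()
    # The lookup ignores case, so "tkw" and "TKW" are treated alike.
    return TRAIT_NAMES.get(cleaned.upper(), cleaned)
```

Names that are already written out, or abbreviations missing from the table, are returned unchanged, so unknown abbreviations still need a manual review.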

Missing passport data causes:

  1. Broken linkage to EURISCO
  2. Incomplete material description
  3. Reduced reusability

Perform these checks:

  1. Is any value biologically impossible (e.g., TKW = 0)?
  2. Is any experiment listed without observed data?
  3. Are missing values left empty rather than entered as zero?
  4. Are all Sample Names unique?
  5. Are all ontology fields filled?
  6. Are all studies referenced?
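The value checks can be partly automated. The sketch below assumes that missing values are meant to be left empty rather than coded as zero, and that a zero or negative value is implausible for the traits named in strictly_positive. Both the default trait list and the function name value_problems are illustrative assumptions.

```python
def value_problems(records, strictly_positive=("Thousand kernel weight",)):
    """Return problems found in (sample_name, trait, value) records.

    Values are given as text, exactly as they appear in the Assay file.
    """
    problems = []
    for sample, trait, value in records:
        text = value.strip()
        if text == "":
            continue  # an empty cell is treated as a missing value
        try:
            number = float(text)
        except ValueError:
            problems.append(f"{sample} / {trait}: '{text}' is not numeric")
            continue
        # A zero here usually means a missing value was coded as 0.
        if trait in strictly_positive and number <= 0:
            problems.append(f"{sample} / {trait}: {text} is not biologically plausible")
    return problems
```

Empty cells pass silently, so this check complements rather than replaces a review of which measurements are missing.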

Before uploading:

☐ ISA structure complete
☐ Investigation file fully filled
☐ All ontologies listed
☐ Study names consistent
☐ Coordinates in decimal format
☐ Sample Name present
☐ Replicates uniquely identified
☐ Passport data included
☐ ORCID and emails included
☐ No duplicated columns
☐ No empty referenced studies