Commands 


Here are the scripts called by the make initialize and make data commands during data import. For more information, see the “The Data Import Process” and “Loading variants into database” sections of the documentation.

src/data/add_breed.py 

Add or update a breed in the SMARTER database

src/data/add_breed.py [OPTIONS]

Options

--species_class <species_class>

Required The generic species of this breed (Sheep or Goat)

Options: Sheep | Goat

--name <name>: Required The breed name

--code <code>: Required The breed code

--alias <alias>: The FID used as a breed code in genotype file

--dataset <dataset>: Required The raw dataset file name (zip archive)

src/data/import_affymetrix.py 

Load SNP data from an Affymetrix manifest file into SMARTER-database

src/data/import_affymetrix.py [OPTIONS]

Options

--species_class <species_class>: Required

--manifest <manifest>: Required

--chip_name <chip_name>: Required

--version <version>: Required

src/data/import_breeds.py 

Import breeds from a metadata file into SMARTER-database

src/data/import_breeds.py [OPTIONS]

Options

--species_class <species_class>

Required The generic species of this breed (Sheep or Goat)

Options: Sheep | Goat

--src_dataset <src_dataset>: Required The raw dataset file name (zip archive) in which search datafile

--dst_dataset <dst_dataset>: The raw dataset file name (zip archive) in which define breeds (def. the ‘src_dataset’)

--datafile <datafile>: Required The metadata file in which search for information

--code_column <code_column>: The name of the breed code column in metadata table

--breed_column <breed_column>: The name of the breed column in metadata table

--fid_column <fid_column>: The name of the FID column used in genotype file

--country_column <country_column>: The name of the country column in metadata table

src/data/import_datasets.py 

Import a dataset stored in the data/raw folder into the SMARTER database and unpack its contents into a data/interim subfolder

INPUT_FILEPATH: The CSV dataset description file

src/data/import_datasets.py [OPTIONS] INPUT_FILEPATH

Options

--types <types>: Required 2 argument types (ex. genotypes background, phenotypes foreground, etc)

Arguments

INPUT_FILEPATH: Required argument

src/data/import_dbsnp.py 

src/data/import_dbsnp.py [OPTIONS]

Options

--species_class <species_class>: Required The generic species of dbSNP data (Sheep or Goat)

--input_dir <input_dir>: Required The directory with dbSNP input (XML) files

--pattern <pattern>

The file name pattern used to select dbSNP input files in input_dir

Default: *.gz

--sender <sender>: Required The SNP sender (ex. AGR_BS, IGGC)

--version <version>: Required The assembly version

--imported_from <imported_from>: The source of this data

src/data/import_from_affymetrix.py 

Read genotype data from Affymetrix files and convert it to the desired assembly version using Illumina TOP coding

src/data/import_from_affymetrix.py [OPTIONS]

Options

--prefix <prefix>: File prefix for map and ped files (like plink does)

--report <report>: Affymetrix report path

--dataset <dataset>: Required The raw dataset file name (zip archive)

--coding <coding>

Affymetrix coding format

Default: affymetrix
Options: ab | affymetrix

--breed_code <breed_code>: A breed code to be assigned on all samples while creating samples

--chip_name <chip_name>: Required The SMARTER SupportedChip name

--assembly <assembly>: Required Destination assembly of the converted genotypes

--create_samples: Create a new SampleSheep or SampleGoat object if doesn’t exist

--sample_field <sample_field>: Search samples using this attribute

--search_field <search_field>

Search variants using this field

Default: probeset_id

--src_version <src_version>: Required Source assembly version

--src_imported_from <src_imported_from>: Required Source assembly imported_from

--max_samples <max_samples>: Limit the import to the first max_samples samples (only valid for Affymetrix reports)

--skip_coordinate_check: Skip coordinate check (only valid for Affymetrix reports)

src/data/import_from_illumina.py 

Read genotype data from an Illumina report file and convert it to the desired assembly version using Illumina TOP coding

src/data/import_from_illumina.py [OPTIONS]

Options

--dataset <dataset>: Required The raw dataset file name (zip archive)

--snpfile <snpfile>: Required The illumina SNPlist file

--report <report>: Required The illumina report file

--coding <coding>

Illumina coding format

Default: ab
Options: ab

--breed_code <breed_code>: Assign this FID to every sample in illumina report

--chip_name <chip_name>: Required The SMARTER SupportedChip name

--assembly <assembly>: Required Destination assembly of the converted genotypes

--create_samples: Create a new SampleSheep or SampleGoat object if doesn’t exist

src/data/import_from_plink.py 

Read genotype data from a PLINK file (text or binary) and convert it to the desired assembly version using Illumina TOP coding

src/data/import_from_plink.py [OPTIONS]

Options

--file <file_>: PLINK text file prefix

--bfile <bfile>: PLINK binary file prefix

--dataset <dataset>: Required The raw dataset file name (zip archive)

--coding <coding>

Genotype coding format

Default: top
Options: top | forward | ab | affymetrix | illumina

--chip_name <chip_name>: Required The SMARTER SupportedChip name

--assembly <assembly>: Required Destination assembly of the converted genotypes

--create_samples: Create a new SampleSheep or SampleGoat object if doesn’t exist

--sample_field <sample_field>: Search samples using this attribute

--search_field <search_field>

Search variants using this field

Default: name

--search_by_positions: search variants using their positions

--src_version <src_version>: Source assembly version

--src_imported_from <src_imported_from>: Source assembly imported_from

--ignore_coding_errors: set SNP as missing when there are coding errors (no more CodingException)

src/data/import_iggc.py 

Read data from the Goat genome project and add a new location type for variants

src/data/import_iggc.py [OPTIONS]

Options

--datafile <datafile>: Required

--version <version>: Required

--force_update: Force location update

--date <date>: A date string

--entry_column <entry_column>

Entry name column in datafile (the SNP name)

Default: locus_name

--chrom_column <chrom_column>: Required Chromosome column in datafile

--pos_column <pos_column>: Required Position column in datafile

--strand_column <strand_column>: Required Strand column in datafile

--sequence_column <sequence_column>

Sequence column in datafile

Default: sequence

--rs_column <rs_column>

rsID column in datafile

Default: rs_

src/data/import_isgc.py 

Read data from the Sheep genome project and add a new location type for variants

src/data/import_isgc.py [OPTIONS]

Options

--datafile <datafile>: Required

--version <version>: Required

--force_update: Force location update

--date <date>: A date string

--entry_column <entry_column>

Entry name column in datafile (the SNP name)

Default: entry

--chrom_column <chrom_column>

Chromosome column in datafile

Default: chrom

--pos_column <pos_column>

Position column in datafile

Default: pos

--alleles_column <alleles_column>

Alleles column in datafile

Default: alleles
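The *_column options tell the script which columns of the datafile hold each piece of location information. A minimal sketch of that mapping with the standard csv module, using the documented import_isgc.py defaults; the comma delimiter and the tuple output are assumptions, since the datafile layout is not described here.

```python
import csv
import io
from typing import Iterator, TextIO, Tuple


def read_locations(handle: TextIO,
                   entry_column: str = "entry",
                   chrom_column: str = "chrom",
                   pos_column: str = "pos",
                   alleles_column: str = "alleles",
                   delimiter: str = ",") -> Iterator[Tuple[str, str, int, str]]:
    """Yield (snp_name, chromosome, position, alleles) from a delimited datafile.

    The column defaults mirror the documented options; the delimiter is an
    assumption.
    """
    for row in csv.DictReader(handle, delimiter=delimiter):
        yield (row[entry_column], row[chrom_column],
               int(row[pos_column]), row[alleles_column])


if __name__ == "__main__":
    sample = "entry,chrom,pos,alleles\nsnp1,1,12345,A/G\n"
    print(list(read_locations(io.StringIO(sample))))  # [('snp1', '1', 12345, 'A/G')]
```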

src/data/import_manifest.py 

Load SNP data from an Illumina manifest file into SMARTER-database

src/data/import_manifest.py [OPTIONS]

Options

--species_class <species_class>: Required

--manifest <manifest>: Required

--chip_name <chip_name>: Required

--version <version>: Required

--sender <sender>: Required

src/data/import_metadata.py 

Read data from a metadata file and add it to SMARTER-database samples

src/data/import_metadata.py [OPTIONS]

Options

--src_dataset <src_dataset>: Required The raw dataset file name (zip archive) in which to search for the datafile

--dst_dataset <dst_dataset>: The raw dataset file name (zip archive) in which to add metadata (default: the ‘src_dataset’)

--datafile <datafile>: Required

--sheet_name <sheet_name>: pandas ‘sheet_name’ option

--breed_column <breed_column>: The breed column

--id_column <id_column>: The original_id column

--alias_column <alias_column>: The alias column

--latitude_column <latitude_column>

--longitude_column <longitude_column>

--sex_column <sex_column>: Sex column in src datafile

--notes_column <notes_column>: The notes field in metadata

--metadata_column <metadata_column>: Metadata column to track. Can be specified multiple times

--species_column <species_column>: Species column in src datafile

--na_values <na_values>: pandas NA values

src/data/import_multiple_phenotypes.py 

Read multiple phenotype records for the same sample from a phenotype file and add them to SMARTER-database samples

src/data/import_multiple_phenotypes.py [OPTIONS]

Options

--src_dataset <src_dataset>: Required The raw dataset file name (zip archive) in which to search for the datafile

--dst_dataset <dst_dataset>: The raw dataset file name (zip archive) in which to add metadata (default: the ‘src_dataset’)

--datafile <datafile>: Required

--sheet_name <sheet_name>: pandas ‘sheet_name’ option

--breed_column <breed_column>: The breed column

--id_column <id_column>: The original_id column

--alias_column <alias_column>: An alias for original_id

--column <columns>: Required Column to track. Could be specified multiple times

--na_values <na_values>: pandas NA values

src/data/import_phenotypes.py 

Read data from a phenotype file and add it to SMARTER-database samples

src/data/import_phenotypes.py [OPTIONS]

Options

--src_dataset <src_dataset>: Required The raw dataset file name (zip archive) in which to search for the datafile

--dst_dataset <dst_dataset>: The raw dataset file name (zip archive) in which to add metadata (default: the ‘src_dataset’)

--datafile <datafile>: Required

--sheet_name <sheet_name>: pandas ‘sheet_name’ option

--breed_column <breed_column>: The breed column

--id_column <id_column>: The original_id column

--alias_column <alias_column>: An alias for original_id

--purpose_column <purpose_column>

--chest_girth_column <chest_girth_column>

--height_column <height_column>

--length_column <length_column>

--additional_column <additional_column>: Additional column to track. Can be specified multiple times

--na_values <na_values>: pandas NA values

src/data/import_samples.py 

Generate samples from a metadata file

src/data/import_samples.py [OPTIONS]

Options

--src_dataset <src_dataset>: Required The raw dataset file name (zip archive) in which to search for the datafile

--dst_dataset <dst_dataset>: The raw dataset file name (zip archive) in which to define samples (default: the ‘src_dataset’)

--datafile <datafile>: Required The metadata file in which to search for information

--code_column <code_column>: Code column in src datafile (ie FID)

--code_all <code_all>: Code applied to all items in datafile

--country_column <country_column>: Country column in src datafile

--country_all <country_all>: Country applied to all items in datafile

--species_column <species_column>: Species column in src datafile

--species_all <species_all>: Species applied to all items in datafile

--id_column <id_column>: Required The ‘original_id’ column to place in smarter database

--sex_column <sex_column>: Sex column in src datafile

--chip_name <chip_name>: Required The SMARTER SupportedChip name

--alias_column <alias_column>: An alias for original_id

--skip_missing_alias: Don’t import samples with no alias

src/data/import_snpchimp.py 

Import data from SNPchiMp dump tables

src/data/import_snpchimp.py [OPTIONS]

Options

--species_class <species_class>: Required

--snpchimp <snpchimp>: Required

--version <version>: Required

src/data/import_snpchips.py 

Upload chips into src.features.smarterdb.SupportedChip objects

src/data/import_snpchips.py [OPTIONS]

Options

--chip_file <chip_file>: Required The chip description JSON file

src/data/merge_datasets.py 

Search the data/processed folder for processed genotype files of a given species and assembly, then call PLINK to merge all genotypes into a single dataset

src/data/merge_datasets.py [OPTIONS]

Options

--species_class <species_class>: Required Search processed genotypes belonging to this species (‘Sheep’or ‘Goat’)

--assembly <assembly>: Required Search processed genotypes belonging to this assembly

src/data/SNPconvert.py 

Convert a PLINK/Illumina report file into a SMARTER-like output file, without inserting data into SMARTER-database. This is useful for converting private datasets (data that cannot be included in SMARTER-database) while relying on SMARTER-database for the conversion.

src/data/SNPconvert.py [OPTIONS]

Options

--file <file_>: PLINK text file prefix

--bfile <bfile>: PLINK binary file prefix

--report <report>: The illumina report file

--snpfile <snpfile>: The illumina SNPlist file

--coding <coding>

Illumina coding format

Default: top
Options: top | forward | ab

--assembly <assembly>: Required Destination assembly of the converted genotypes

--species <species>: Required The SMARTER assembly species (Goat or Sheep)

--results_dir <results_dir>: Required Where results will be saved

--chip_name <chip_name>: The SMARTER SupportedChip name

--search_field <search_field>

Search variants using this field

Default: name

--search_by_positions: Search variants using their positions

--src_version <src_version>: Source assembly version

--src_imported_from <src_imported_from>: Source assembly imported_from

--ignore_coding_errors: Set SNPs as missing when there are coding errors (instead of raising a CodingException)
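--search_field and --search_by_positions choose how input variants are matched against known variants: by a single attribute (name by default) or by chromosome and position. A minimal sketch of the two lookup strategies, using an in-memory dictionary and a hypothetical, simplified Variant record in place of the SMARTER database models:

```python
from dataclasses import dataclass
from typing import Dict, Hashable, Iterable


@dataclass(frozen=True)
class Variant:
    """Hypothetical, simplified variant record (not the SMARTER model)."""
    name: str
    chrom: str
    position: int


def build_index(variants: Iterable[Variant], search_field: str = "name",
                search_by_positions: bool = False) -> Dict[Hashable, Variant]:
    """Index variants by one attribute, or by (chrom, position) pairs."""
    index: Dict[Hashable, Variant] = {}
    for variant in variants:
        if search_by_positions:
            key = (variant.chrom, variant.position)
        else:
            key = getattr(variant, search_field)
        index[key] = variant
    return index
```

Matching by position is useful when variant names differ between chips but coordinates agree, at the cost of depending on the source assembly (which is why --src_version and --src_imported_from exist).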

src/data/update_db_status.py 

Update SMARTER database statuses

src/data/update_db_status.py [OPTIONS]