Commands
Here are the scripts called during data import by the make initialize and make data commands. For more information, see the “The Data Import Process” and “Loading variants into database” documentation sections.
src/data/add_breed.py
Add or update a breed in the SMARTER database
src/data/add_breed.py [OPTIONS]
Options
- --species_class <species_class>
Required The generic species of this breed (Sheep or Goat)
- Options
Sheep | Goat
- --name <name>
Required The breed name
- --code <code>
Required The breed code
- --alias <alias>
The FID used as a breed code in the genotype file
- --dataset <dataset>
Required The raw dataset file name (zip archive)
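As an illustration, a call to add_breed.py can be assembled from Python as sketched below. The option names come from the list above; the breed name, code, alias, and dataset file name are hypothetical placeholders, not real SMARTER values.

```python
import shlex

# Option names are taken from the list above; every value is a hypothetical
# placeholder, not a real SMARTER breed or dataset.
argv = [
    "python", "src/data/add_breed.py",
    "--species_class", "Sheep",
    "--name", "Example Breed",
    "--code", "EXB",
    "--alias", "EXB_FID",
    "--dataset", "example_dataset.zip",
]

# shlex.join quotes arguments containing spaces, so the string can be pasted
# into a shell; pass `argv` itself to subprocess.run to execute the script.
command = shlex.join(argv)
print(command)
```

Keeping the arguments as a list avoids quoting mistakes when a breed name contains spaces.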
src/data/import_affymetrix.py
Load SNP data from Affymetrix manifest file into SMARTER-database
src/data/import_affymetrix.py [OPTIONS]
Options
- --species_class <species_class>
Required
- --manifest <manifest>
Required
- --chip_name <chip_name>
Required
- --version <version>
Required
src/data/import_breeds.py
Import breeds from metadata file into SMARTER-database
src/data/import_breeds.py [OPTIONS]
Options
- --species_class <species_class>
Required The generic species of this breed (Sheep or Goat)
- Options
Sheep | Goat
- --src_dataset <src_dataset>
Required The raw dataset file name (zip archive) in which to search for the datafile
- --dst_dataset <dst_dataset>
The raw dataset file name (zip archive) in which to define breeds (default: the ‘src_dataset’)
- --datafile <datafile>
Required The metadata file in which to search for information
- --code_column <code_column>
The name of the breed code column in metadata table
- --breed_column <breed_column>
The name of the breed column in metadata table
- --fid_column <fid_column>
The name of the FID column used in genotype file
- --country_column <country_column>
The name of the country column in metadata table
src/data/import_datasets.py
Import a dataset stored in the data/raw folder into the SMARTER database and unpack the file contents into a data/interim subfolder
INPUT_FILEPATH: The CSV dataset description file
src/data/import_datasets.py [OPTIONS] INPUT_FILEPATH
Options
- --types <types>
Required Two argument types (e.g. genotypes background, phenotypes foreground, etc.)
Arguments
- INPUT_FILEPATH
Required argument
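The --types option takes two values in a row, followed by the positional INPUT_FILEPATH. The sketch below reproduces that command-line shape with argparse from the standard library. It is a stand-in, not the script's actual implementation; the metavar names and the file name are assumptions made for illustration.

```python
import argparse

# Stand-in parser mirroring the documented interface; the real script may be
# built with a different library.
parser = argparse.ArgumentParser(prog="import_datasets.py")
parser.add_argument("--types", nargs=2, required=True,
                    metavar=("TYPE", "SUBTYPE"),  # hypothetical names
                    help="two values, e.g. genotypes background")
parser.add_argument("input_filepath", help="the CSV dataset description file")

# "datasets.csv" is a hypothetical file name.
args = parser.parse_args(["--types", "genotypes", "background", "datasets.csv"])
print(args.types, args.input_filepath)
```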
src/data/import_dbsnp.py
src/data/import_dbsnp.py [OPTIONS]
Options
- --species_class <species_class>
Required The generic species of dbSNP data (Sheep or Goat)
- --input_dir <input_dir>
Required The directory with dbSNP input (XML) files
- --sender <sender>
Required The SNP sender (ex. AGR_BS, IGGC)
- --version <version>
Required The assembly version
- --imported_from <imported_from>
The source of this data
src/data/import_from_affymetrix.py
Read genotype data from Affymetrix files and convert it to the desired assembly version using Illumina TOP coding
src/data/import_from_affymetrix.py [OPTIONS]
Options
- --prefix <prefix>
File prefix for map and ped files (like plink does)
- --report <report>
Affymetrix report path
- --dataset <dataset>
Required The raw dataset file name (zip archive)
- --coding <coding>
Affymetrix coding format
- Default
affymetrix
- Options
ab | affymetrix
- --breed_code <breed_code>
A breed code to assign to all samples when creating them
- --chip_name <chip_name>
Required The SMARTER SupportedChip name
- --assembly <assembly>
Required Destination assembly of the converted genotypes
- --create_samples
Create a new SampleSheep or SampleGoat object if it doesn’t exist
- --sample_field <sample_field>
Search samples using this attribute
- --search_field <search_field>
Search variants using this field
- Default
probeset_id
- --src_version <src_version>
Required Source assembly version
- --src_imported_from <src_imported_from>
Required Source assembly imported_from
- --max_samples <max_samples>
Limit import to the first <max_samples> samples (only valid for affymetrix report)
- --skip_coordinate_check
Skip coordinate check (only valid for affymetrix report)
src/data/import_from_illumina.py
Read genotype data from an Illumina report file and convert it to the desired assembly version using Illumina TOP coding
src/data/import_from_illumina.py [OPTIONS]
Options
- --dataset <dataset>
Required The raw dataset file name (zip archive)
- --snpfile <snpfile>
Required The Illumina SNPlist file
- --report <report>
Required The Illumina report file
- --coding <coding>
Illumina coding format
- Default
ab
- Options
ab
- --breed_code <breed_code>
Assign this FID to every sample in the Illumina report
- --chip_name <chip_name>
Required The SMARTER SupportedChip name
- --assembly <assembly>
Required Destination assembly of the converted genotypes
- --create_samples
Create a new SampleSheep or SampleGoat object if it doesn’t exist
src/data/import_from_plink.py
Read genotype data from a PLINK file (text or binary) and convert it to the desired assembly version using Illumina TOP coding
src/data/import_from_plink.py [OPTIONS]
Options
- --file <file_>
PLINK text file prefix
- --bfile <bfile>
PLINK binary file prefix
- --dataset <dataset>
Required The raw dataset file name (zip archive)
- --coding <coding>
Genotype coding format
- Default
top
- Options
top | forward | ab | affymetrix | illumina
- --chip_name <chip_name>
Required The SMARTER SupportedChip name
- --assembly <assembly>
Required Destination assembly of the converted genotypes
- --create_samples
Create a new SampleSheep or SampleGoat object if it doesn’t exist
- --sample_field <sample_field>
Search samples using this attribute
- --search_field <search_field>
Search variants using this field
- Default
name
- --search_by_positions
Search variants using their positions
- --src_version <src_version>
Source assembly version
- --src_imported_from <src_imported_from>
Source assembly imported_from
- --ignore_coding_errors
Set the SNP as missing when there are coding errors (a CodingException is no longer raised)
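The --file and --bfile options both take a PLINK prefix: a text fileset is made of .ped and .map files, a binary one of .bed, .bim and .fam files. The helper below is a hypothetical sketch for choosing between the two options by checking which files exist. It assumes the files have already been extracted to a local directory, which may not match how the script locates them inside the dataset archive.

```python
from pathlib import Path


def plink_input_option(prefix):
    """Return ["--bfile", prefix] or ["--file", prefix] based on files on disk.

    Hypothetical helper, not part of the SMARTER scripts.
    """
    prefix = str(prefix)
    # PLINK binary filesets consist of .bed, .bim and .fam files
    if all(Path(prefix + ext).exists() for ext in (".bed", ".bim", ".fam")):
        return ["--bfile", prefix]
    # PLINK text filesets consist of .ped and .map files
    if all(Path(prefix + ext).exists() for ext in (".ped", ".map")):
        return ["--file", prefix]
    raise FileNotFoundError(f"no PLINK fileset found for prefix {prefix!r}")
```

The returned pair can be appended to the rest of the argument list for import_from_plink.py.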
src/data/import_iggc.py
Read data from the Goat genome project and add a new location type for variants
src/data/import_iggc.py [OPTIONS]
Options
- --datafile <datafile>
Required
- --version <version>
Required
- --force_update
Force location update
- --date <date>
A date string
- --entry_column <entry_column>
Entry name column in datafile (the SNP name)
- Default
locus_name
- --chrom_column <chrom_column>
Required Chromosome column in datafile
- --pos_column <pos_column>
Required Position column in datafile
- --strand_column <strand_column>
Required Strand column in datafile
- --sequence_column <sequence_column>
Sequence column in datafile
- Default
sequence
src/data/import_isgc.py
Read data from the Sheep genome project and add a new location type for variants
src/data/import_isgc.py [OPTIONS]
Options
- --datafile <datafile>
Required
- --version <version>
Required
- --force_update
Force location update
- --date <date>
A date string
- --entry_column <entry_column>
Entry name column in datafile (the SNP name)
- Default
entry
- --chrom_column <chrom_column>
Chromosome column in datafile
- Default
chrom
- --pos_column <pos_column>
Position column in datafile
- Default
pos
- --alleles_column <alleles_column>
Alleles column in datafile
- Default
alleles
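The *_column options let the datafile use headers other than the defaults. The sketch below shows the idea with csv.DictReader, using the default column names listed above for import_isgc.py. The function name, the comma delimiter, and the sample row are assumptions for illustration only, not the script's actual parsing code.

```python
import csv
import io


def read_locations(handle, entry_column="entry", chrom_column="chrom",
                   pos_column="pos", alleles_column="alleles"):
    """Yield (entry, chrom, pos, alleles) tuples from a delimited datafile."""
    # A comma delimiter is assumed here; the real datafile may differ.
    for row in csv.DictReader(handle):
        yield (row[entry_column], row[chrom_column],
               int(row[pos_column]), row[alleles_column])


# Hypothetical datafile whose headers differ from the defaults
data = io.StringIO("snp,chromosome,position,alleles\nsnp_a,1,12345,A/G\n")
records = list(read_locations(data, entry_column="snp",
                              chrom_column="chromosome",
                              pos_column="position"))
print(records)
```

Only the columns that differ from the defaults need to be overridden, which mirrors how the options are passed on the command line.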
src/data/import_manifest.py
Load SNP data from Illumina manifest file into SMARTER-database
src/data/import_manifest.py [OPTIONS]
Options
- --species_class <species_class>
Required
- --manifest <manifest>
Required
- --chip_name <chip_name>
Required
- --version <version>
Required
- --sender <sender>
Required
src/data/import_metadata.py
Read data from metadata file and add it to SMARTER-database samples
src/data/import_metadata.py [OPTIONS]
Options
- --src_dataset <src_dataset>
Required The raw dataset file name (zip archive) in which to search for the datafile
- --dst_dataset <dst_dataset>
The raw dataset file name (zip archive) in which to add metadata (default: the ‘src_dataset’)
- --datafile <datafile>
Required
- --sheet_name <sheet_name>
pandas ‘sheet_name’ option
- --breed_column <breed_column>
The breed column
- --id_column <id_column>
The original_id column
- --alias_column <alias_column>
The alias column
- --latitude_column <latitude_column>
- --longitude_column <longitude_column>
- --sex_column <sex_column>
Sex column in src datafile
- --notes_column <notes_column>
The notes field in metadata
- --metadata_column <metadata_column>
Metadata column to track. Can be specified multiple times
- --species_column <species_column>
Species column in src datafile
- --na_values <na_values>
pandas NA values
src/data/import_multiple_phenotypes.py
Read multiple data for the same sample from phenotype file and add it to SMARTER-database samples
src/data/import_multiple_phenotypes.py [OPTIONS]
Options
- --src_dataset <src_dataset>
Required The raw dataset file name (zip archive) in which to search for the datafile
- --dst_dataset <dst_dataset>
The raw dataset file name (zip archive) in which to add metadata (default: the ‘src_dataset’)
- --datafile <datafile>
Required
- --sheet_name <sheet_name>
pandas ‘sheet_name’ option
- --breed_column <breed_column>
The breed column
- --id_column <id_column>
The original_id column
- --alias_column <alias_column>
An alias for original_id
- --column <columns>
Required Column to track. Can be specified multiple times
- --na_values <na_values>
pandas NA values
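Options that can be specified multiple times, such as --column, are passed by repeating the flag once per value. A small sketch of building such an argument list is given below. The helper name, dataset, datafile, and column names are hypothetical placeholders.

```python
def repeat_option(flag, values):
    """Expand a list of values into repeated `flag value` pairs."""
    args = []
    for value in values:
        args.extend([flag, value])
    return args


# All file and column names below are hypothetical placeholders.
argv = [
    "python", "src/data/import_multiple_phenotypes.py",
    "--src_dataset", "example_phenotypes.zip",
    "--datafile", "phenotypes.xlsx",
    "--id_column", "sample_id",
] + repeat_option("--column", ["weight", "height"])
print(argv)
```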
src/data/import_phenotypes.py
Read data from phenotype file and add it to SMARTER-database samples
src/data/import_phenotypes.py [OPTIONS]
Options
- --src_dataset <src_dataset>
Required The raw dataset file name (zip archive) in which to search for the datafile
- --dst_dataset <dst_dataset>
The raw dataset file name (zip archive) in which to add metadata (default: the ‘src_dataset’)
- --datafile <datafile>
Required
- --sheet_name <sheet_name>
pandas ‘sheet_name’ option
- --breed_column <breed_column>
The breed column
- --id_column <id_column>
The original_id column
- --alias_column <alias_column>
An alias for original_id
- --purpose_column <purpose_column>
- --chest_girth_column <chest_girth_column>
- --height_column <height_column>
- --length_column <length_column>
- --additional_column <additional_column>
Additional column to track. Can be specified multiple times
- --na_values <na_values>
pandas NA values
src/data/import_samples.py
Generate samples from a metadata file
src/data/import_samples.py [OPTIONS]
Options
- --src_dataset <src_dataset>
Required The raw dataset file name (zip archive) in which to search for the datafile
- --dst_dataset <dst_dataset>
The raw dataset file name (zip archive) in which to define samples (default: the ‘src_dataset’)
- --datafile <datafile>
Required The metadata file in which to search for information
- --code_column <code_column>
Code column in src datafile (i.e. the FID)
- --code_all <code_all>
Code applied to all items in datafile
- --country_column <country_column>
Country column in src datafile
- --country_all <country_all>
Country applied to all items in datafile
- --species_column <species_column>
Species column in src datafile
- --species_all <species_all>
Species applied to all items in datafile
- --id_column <id_column>
Required The ‘original_id’ column to place in the SMARTER database
- --sex_column <sex_column>
Sex column in src datafile
- --chip_name <chip_name>
Required The SMARTER SupportedChip name
- --alias_column <alias_column>
An alias for original_id
- --skip_missing_alias
Don’t import samples with no alias
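The code, country, and species options come in pairs: a *_column variant reads the value from each row of the datafile, while the *_all variant applies one value to every row. The function below is a hypothetical sketch of that pattern. Treating the two variants as mutually exclusive is an assumption, and the script itself may resolve them differently.

```python
def resolve_field(row, column=None, constant=None):
    """Return the per-row value if a column is given, else the constant.

    Hypothetical helper; rejecting both arguments at once is an assumption.
    """
    if column is not None and constant is not None:
        raise ValueError("give either a column name or a constant, not both")
    if column is not None:
        return row[column]
    return constant


# Hypothetical metadata rows
rows = [{"fid": "AAA", "country": "Italy"}, {"fid": "BBB", "country": "France"}]
codes = [resolve_field(row, column="fid") for row in rows]         # like --code_column fid
countries = [resolve_field(row, constant="Italy") for row in rows]  # like --country_all Italy
print(codes, countries)
```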
src/data/import_snpchimp.py
Import data from SNPchiMp dump tables
src/data/import_snpchimp.py [OPTIONS]
Options
- --species_class <species_class>
Required
- --snpchimp <snpchimp>
Required
- --version <version>
Required
src/data/import_snpchips.py
Upload chips into src.features.smarterdb.SupportedChip objects
src/data/import_snpchips.py [OPTIONS]
Options
- --chip_file <chip_file>
Required The chip description JSON file
src/data/merge_datasets.py
Search for processed genotype files for a certain species in the data/processed folder and then call PLINK to join all genotypes in the same dataset
src/data/merge_datasets.py [OPTIONS]
Options
- --species_class <species_class>
Required Search processed genotypes belonging to this species (‘Sheep’ or ‘Goat’)
- --assembly <assembly>
Required Search processed genotypes belonging to this assembly
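The general idea of collecting binary filesets and handing them to PLINK can be sketched as follows. This is not the script's actual logic. It assumes that the directory passed in already contains only the filesets for the chosen species and assembly, and it only builds the command without running it. The function name and the merge list file name are hypothetical; --chr-set, --bfile, --merge-list, --make-bed and --out are standard PLINK 1.9 options.

```python
from pathlib import Path

# Autosome counts passed to PLINK's --chr-set: sheep have 26, goats 29
CHR_SET = {"Sheep": "26", "Goat": "29"}


def build_merge_command(processed_dir, species_class, out_prefix):
    """Write a PLINK merge list and return the argv that would use it."""
    # Every .bed file found marks one binary fileset (prefix without suffix)
    prefixes = sorted(str(p.with_suffix(""))
                      for p in Path(processed_dir).rglob("*.bed"))
    if len(prefixes) < 2:
        raise ValueError("need at least two filesets to merge")
    first, rest = prefixes[0], prefixes[1:]
    # One prefix per line: PLINK reads each as a .bed/.bim/.fam fileset
    merge_list = Path(f"{out_prefix}.merge-list.txt")
    merge_list.write_text("\n".join(rest) + "\n")
    return [
        "plink", "--chr-set", CHR_SET[species_class],
        "--bfile", first,
        "--merge-list", str(merge_list),
        "--make-bed", "--out", str(out_prefix),
    ]
```

The returned list can be passed to subprocess.run once PLINK is available on the PATH.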
src/data/SNPconvert.py
Convert a PLINK/Illumina report file into a SMARTER-like output file, without inserting data into the SMARTER-database. Useful for converting private datasets (data which cannot be included in the SMARTER-database) while still relying on the SMARTER-database
src/data/SNPconvert.py [OPTIONS]
Options
- --file <file_>
PLINK text file prefix
- --bfile <bfile>
PLINK binary file prefix
- --report <report>
The Illumina report file
- --snpfile <snpfile>
The Illumina SNPlist file
- --coding <coding>
Illumina coding format
- Default
top
- Options
top | forward | ab
- --assembly <assembly>
Required Destination assembly of the converted genotypes
- --species <species>
Required The SMARTER assembly species (Goat or Sheep)
- --results_dir <results_dir>
Required Where results will be saved
- --chip_name <chip_name>
The SMARTER SupportedChip name
- --search_field <search_field>
Search variants using this field
- Default
name
- --search_by_positions
Search variants using their positions
- --src_version <src_version>
Source assembly version
- --src_imported_from <src_imported_from>
Source assembly imported_from
- --ignore_coding_errors
Set the SNP as missing when there are coding errors (a CodingException is no longer raised)
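The effect of --ignore_coding_errors can be pictured with the sketch below. It is a simplified stand-in, not the SMARTER conversion code: the CodingException class is redefined locally, the check only tests whether both alleles belong to an expected set, and the missing genotype is written with PLINK's default missing code "0".

```python
class CodingException(Exception):
    """Local stand-in for the exception named in the option help text."""


MISSING = ("0", "0")  # PLINK's default missing-genotype code in text files


def check_genotype(genotype, expected_alleles, ignore_coding_errors=False):
    """Return the genotype if both alleles are expected, else fail or mask."""
    if all(allele in expected_alleles for allele in genotype):
        return tuple(genotype)
    if ignore_coding_errors:
        # Mask the offending genotype instead of stopping the conversion
        return MISSING
    raise CodingException(f"{genotype} not in {sorted(expected_alleles)}")
```

With the flag off, a single unexpected allele aborts the run; with it on, only that genotype is lost.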
src/data/update_db_status.py
Update SMARTER database statuses
src/data/update_db_status.py [OPTIONS]