src.features.smarterdb

Created on Tue Feb 23 16:21:35 2021

@author: Paolo Cozzi <paolo.cozzi@ibba.cnr.it>

Classes:

Breed(*args, **values)

BreedAlias(*args, **kwargs)

Required to describe the breed and code used in a certain dataset in order to resolve the final breed to be used in SMARTER-database

Consequence(*args, **kwargs)

A class to manage SNP consequences.

Counter(*args, **values)

A class to deal with counter collection (created when initializing smarter database) and used to define SMARTER IDs

Country([name])

A helper class to deal with country objects.

Dataset(*args, **values)

Describe a dataset instance with fields owned by data types

Location(*args, **kwargs)

A class to deal with a SNP location (ie position in an assembly for a certain chip or data source)

Phenotype(*args, **kwargs)

A class to deal with phenotypes.

Probeset(*args, **kwargs)

A class to deal with different affymetrix probesets

SAMPLETYPE(value)

A simple Enum object to define sample type (background or foreground)

SEX(value)

An enum object to manage Sample sex in the same way as plink does

SampleGoat(*args, **values)

A class specific for Goat samples

SampleSheep(*args, **values)

A class specific for Sheep samples

SampleSpecies(*args, **values)

A generic class used to manage Goat or Sheep samples

SmarterInfo(*args, **values)

A class to track database status information

SupportedChip(*args, **values)

A class to deal with SMARTER-database managed chips

VariantGoat(*args, **values)

A class to deal with Goat variations (SNP)

VariantSheep(*args, **values)

A class to deal with Sheep variations (SNP)

VariantSpecies(*args, **values)

Generic class to deal with Variant (SNP) objects

Exceptions:

SmarterDBException

Functions:

complement(genotype)

Return reverse complement for a base call

getNextSequenceValue(sequence_name, mongodb)

Read from Counter collection and determine the next sequence number to be used for the SMARTER ID

getSmarterId(species_class, country, breed)

Generate a new SMARTER ID object using the internal counter collections

get_or_create_breed(species_class, name, code)

Get a Breed instance or create a new one (or update a breed adding a new BreedAlias)

get_or_create_sample(SampleSpecies, ...[, ...])

Get or create a sample providing attributes (search for original_id in provided dataset)

get_sample_type(dataset)

Test whether a dataset is foreground or background

global_connection([database_name])

Establish a connection to the SMARTER database.

class src.features.smarterdb.Breed(*args, **values)[source]

Bases: Document

Miscellaneous:

DoesNotExist

MultipleObjectsReturned

Attributes:

aliases

A list of BreedAlias objects.

code

The breed code

id

A field wrapper around MongoDB's ObjectIds.

n_individuals

How many samples belong to this breed

name

The breed name

objects([q_obj])

species

The breed species.

exception DoesNotExist

Bases: DoesNotExist

exception MultipleObjectsReturned

Bases: MultipleObjectsReturned

aliases

A list of BreedAlias objects. Required to determine the SMARTER-database breed from the genotype file (which can use a different breed name or code)

code

The breed code

id

A field wrapper around MongoDB’s ObjectIds.

n_individuals

How many samples belong to this breed

name

The breed name

objects(q_obj=None, **query) = []
species

The breed species. Should be one of Goat or Sheep

class src.features.smarterdb.BreedAlias(*args, **kwargs)[source]

Bases: EmbeddedDocument

Required to describe the breed and code used in a certain dataset in order to resolve the final breed to be used in SMARTER-database

Attributes:

country

The country of the breed in the dataset.

dataset

The dataset ObjectID in which this BreedAlias is used

fid

The breed Family ID used in genotype file

country

The country of the breed in the dataset. Used in multi country datasets

dataset

The dataset ObjectID in which this BreedAlias is used

fid

The breed Family ID used in genotype file
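As an illustration of how aliases can resolve a breed, the sketch below uses plain dataclasses in place of the MongoEngine documents. The names `Alias`, `BreedRecord` and `resolve_breed`, as well as the sample values, are hypothetical and are not part of `smarterdb`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Alias:
    # Stand-in for BreedAlias: the FID used for a breed in one dataset
    fid: str
    dataset: str
    country: Optional[str] = None


@dataclass
class BreedRecord:
    # Stand-in for Breed: the final name and code plus the known aliases
    name: str
    code: str
    aliases: List[Alias] = field(default_factory=list)


def resolve_breed(breeds, fid, dataset):
    """Return the breed that has an alias with this FID in this dataset."""
    for breed in breeds:
        for alias in breed.aliases:
            if alias.fid == fid and alias.dataset == dataset:
                return breed
    return None


# Invented sample data, only for the sake of the example
breeds = [
    BreedRecord("Texel", "TEX", [Alias("TEXEL_UY", "dataset-1", "Uruguay")]),
    BreedRecord("Merino", "MER", [Alias("MERINO", "dataset-1")]),
]
print(resolve_breed(breeds, "TEXEL_UY", "dataset-1").code)
```

Matching on both the FID and the dataset matters because the same FID may be used for different breeds in different datasets.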

class src.features.smarterdb.Consequence(*args, **kwargs)[source]

Bases: EmbeddedDocument

A class to manage SNP consequences. Not yet implemented

class src.features.smarterdb.Counter(*args, **values)[source]

Bases: Document

A class to deal with counter collection (created when initializing smarter database) and used to define SMARTER IDs

Miscellaneous:

DoesNotExist

MultipleObjectsReturned

Attributes:

id

A unicode string field.

objects([q_obj])

sequence_value

32-bit integer field.

exception DoesNotExist

Bases: DoesNotExist

exception MultipleObjectsReturned

Bases: MultipleObjectsReturned

id

A unicode string field.

objects(q_obj=None, **query) = []
sequence_value

32-bit integer field.

class src.features.smarterdb.Country(name: Optional[str] = None, *args, **kwargs)[source]

Bases: Document

A helper class to deal with country objects. Each record is created after data import, when the database status is updated

Miscellaneous:

DoesNotExist

MultipleObjectsReturned

Methods:

__init__([name])

Initialise a document or an embedded document.

Attributes:

alpha_2

Country 2 letter code (used to derive SMARTER IDs)

alpha_3

Country 3 letter code

id

A field wrapper around MongoDB's ObjectIds.

name

The Country name

numeric

The country numeric code

objects([q_obj])

official_name

Country official name

species

The sample species found within this country

exception DoesNotExist

Bases: DoesNotExist

exception MultipleObjectsReturned

Bases: MultipleObjectsReturned

__init__(name: Optional[str] = None, *args, **kwargs)[source]

Initialise a document or an embedded document.

Parameters
  • values – A dictionary of keys and values for the document. It may contain additional reserved keywords, e.g. “__auto_convert”.

  • __auto_convert – If True, supplied values will be converted to Python-type values via each field’s to_python method.

  • _created – Indicates whether this is a brand new document or whether it’s already been persisted before. Defaults to true.

alpha_2

Country 2 letter code (used to derive SMARTER IDs)

alpha_3

Country 3 letter code

id

A field wrapper around MongoDB’s ObjectIds.

name

The Country name

numeric

The country numeric code

objects(q_obj=None, **query) = []
official_name

Country official name

species

The sample species found within this country

class src.features.smarterdb.Dataset(*args, **values)[source]

Bases: Document

Describe a dataset instance with fields owned by data types

Miscellaneous:

DoesNotExist

MultipleObjectsReturned

Attributes:

breed

The breed of the dataset.

chip_name

The SupportedChip.name attribute of the technology used

contents

Dataset contents as a list

country

The country where the data come from.

doi

The publication DOI of this dataset

file

The source dataset file

gene_array

The technology used to generate data specified by the partner

id

A field wrapper around MongoDB's ObjectIds.

n_of_individuals

Number of individuals in the dataset

n_of_records

Number of records in the phenotype file

objects([q_obj])

partner

The partner which owns the dataset

result_dir

Returns the location of the dataset's processed directory.

size_

The file size

species

The species of the data.

trait

Trait described in phenotype file

type_

Dataset type.

uploader

The partner which uploaded this dataset

working_dir

Returns the location of the dataset's working directory.

exception DoesNotExist

Bases: DoesNotExist

exception MultipleObjectsReturned

Bases: MultipleObjectsReturned

breed

The breed of the dataset. Could have many values

chip_name

The SupportedChip.name attribute of the technology used

contents

Dataset contents as a list

country

The country where the data come from. Could have many values

doi

The publication DOI of this dataset

file

The source dataset file

gene_array

The technology used to generate data specified by the partner

id

A field wrapper around MongoDB’s ObjectIds.

n_of_individuals

Number of individuals in the dataset

n_of_records

Number of records in the phenotype file

objects(q_obj=None, **query) = []
partner

The partner which owns the dataset

property result_dir: PosixPath

Returns the location of the dataset's processed directory, which may or may not exist

Returns

a subdirectory in /data/processed/

Return type

pathlib.PosixPath

size_

The file size

species

The species of the data. Could be ‘Sheep’ or ‘Goat’

trait

Trait described in phenotype file

type_

Dataset type. Must contain one of ['genotypes', 'phenotypes'] and one of ['background', 'foreground']

uploader

The partner which uploaded this dataset

property working_dir: PosixPath

Returns the location of the dataset's working directory, which may or may not exist

Returns

a subdirectory in /data/interim/

Return type

pathlib.PosixPath
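The two directory properties can be pictured with `pathlib`. This is only a sketch: the documentation states that the working directory is a subdirectory of /data/interim/, while naming that subdirectory after a dataset identifier is an assumption made here, and `working_dir_for` is a hypothetical helper.

```python
from pathlib import Path

# Base directory taken from the documented return value of working_dir
INTERIM_DIR = Path("/data/interim")


def working_dir_for(dataset_id: str) -> Path:
    """Build a per-dataset path without checking that it exists.

    Using the dataset identifier as directory name is an assumption.
    """
    return INTERIM_DIR / dataset_id


path = working_dir_for("example-dataset-id")
print(path.name)
# The directory may or may not exist, so callers should check first
print(path.exists())
```

Because the path is only computed and never created, code that reads from it would typically test `path.exists()` before use.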

class src.features.smarterdb.Location(*args, **kwargs)[source]

Bases: EmbeddedDocument

A class to deal with a SNP location (ie position in an assembly for a certain chip or data source)

Methods:

__init__(*args, **kwargs)

Initialise a document or an embedded document.

ab2top(genotype[, missing])

Convert an illumina AB SNP into an illumina TOP SNP

affy2top(genotype[, missing])

Convert an affymetrix SNP into an illumina TOP SNP

forward2top(genotype[, missing])

Convert an illumina FORWARD SNP into an illumina TOP SNP

illumina2top(genotype[, missing])

Convert an illumina SNP into an illumina TOP SNP

is_ab(genotype[, missing])

Return True if genotype is compatible with illumina AB coding

is_affymetrix(genotype[, missing])

Return True if genotype is compatible with affymetrix coding

is_forward(genotype[, missing])

Return True if genotype is compatible with illumina FORWARD coding

is_illumina(genotype[, missing])

Return True if genotype is compatible with illumina coding (as it's recorded in manifest)

is_top(genotype[, missing])

Return True if genotype is compatible with illumina TOP coding

Attributes:

affymetrix_ab

The SNP code read as it is from affymetrix data

alleles

The dbSNP alleles of such SNP

chrom

The chromosome where this SNP is located

consequences

A list of SNP consequences (not yet implemented)

date

Track the manufacturing date or when this data was last updated

illumina

The SNP code read as it is from illumina data

illumina_forward

The SNP code in illumina forward coding

illumina_strand

The probe orientation in alignment

illumina_top

Return genotype in illumina top format

imported_from

The source of the SNP data

position

The SNP position

ss_id

The SNP submission ID

strand

The strand orientation in alignment

version

The assembly version where this SNP is placed

__init__(*args, **kwargs)[source]

Initialise a document or an embedded document.

Parameters
  • values – A dictionary of keys and values for the document. It may contain additional reserved keywords, e.g. “__auto_convert”.

  • __auto_convert – If True, supplied values will be converted to Python-type values via each field’s to_python method.

  • _created – Indicates whether this is a brand new document or whether it’s already been persisted before. Defaults to true.

ab2top(genotype: list, missing: list = ['0', '-']) → list[source]

Convert an illumina AB SNP into an illumina TOP SNP

Parameters
  • genotype (list) – a list of two alleles (ex [‘A’,’B’])

  • missing (list) – a list of missing allele strings (def [“0”, “-“])

Returns

The genotype in top format

Return type

list
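A minimal sketch of an AB to TOP conversion follows. It assumes that allele A corresponds to the first allele of the TOP SNP (for example the A in 'A/G') and allele B to the second, and that missing calls are reported as "0". The standalone function `ab_to_top` is hypothetical and is not the method documented here.

```python
def ab_to_top(genotype, illumina_top, missing=("0", "-")):
    """Map A/B calls onto the two alleles of a TOP SNP such as 'A/G'."""
    first, second = illumina_top.split("/")
    mapping = {"A": first, "B": second}
    result = []
    for allele in genotype:
        if allele in missing:
            # Missing calls are reported as "0" in this sketch
            result.append("0")
        elif allele in mapping:
            result.append(mapping[allele])
        else:
            raise ValueError(f"{allele!r} is not an AB allele")
    return result


print(ab_to_top(["A", "B"], "A/G"))  # ['A', 'G']
print(ab_to_top(["B", "B"], "A/G"))  # ['G', 'G']
print(ab_to_top(["-", "-"], "A/G"))  # ['0', '0']
```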

affy2top(genotype: list, missing: list = ['0', '-']) → list[source]

Convert an affymetrix SNP into an illumina TOP SNP

Parameters
  • genotype (list) – a list of two alleles (ex [‘A’,’C’])

  • missing (list) – a list of missing allele strings (def [“0”, “-“])

Returns

The genotype in top format

Return type

list

affymetrix_ab

The SNP code read as it is from affymetrix data

alleles

The dbSNP alleles of such SNP

chrom

The chromosome where this SNP is located

consequences

A list of SNP consequences (not yet implemented)

date

Track the manufacturing date or when this data was last updated

forward2top(genotype: list, missing: list = ['0', '-']) → list[source]

Convert an illumina FORWARD SNP into an illumina TOP SNP

Parameters
  • genotype (list) – a list of two alleles (ex [‘A’,’C’])

  • missing (list) – a list of missing allele strings (def [“0”, “-“])

Returns

The genotype in top format

Return type

list

illumina

The SNP code read as it is from illumina data

illumina2top(genotype: list, missing: list = ['0', '-']) → list[source]

Convert an illumina SNP into an illumina TOP SNP

Parameters
  • genotype (list) – a list of two alleles (ex [‘A’,’C’])

  • missing (list) – a list of missing allele strings (def [“0”, “-“])

Returns

The genotype in top format

Return type

list

illumina_forward

The SNP code in illumina forward coding

illumina_strand

The probe orientation in alignment

property illumina_top

Return genotype in illumina top format

imported_from

The source of the SNP data

is_ab(genotype: list, missing: list = ['0', '-']) → bool[source]

Return True if genotype is compatible with illumina AB coding

Parameters
  • genotype (list) – a list of two alleles (ex [‘A’,’B’])

  • missing (list) – a list of missing allele strings (def [“0”, “-“])

Returns

True if in AB coding

Return type

bool

is_affymetrix(genotype: list, missing: list = ['0', '-']) → bool[source]

Return True if genotype is compatible with affymetrix coding

Parameters
  • genotype (list) – a list of two alleles (ex [‘A’,’C’])

  • missing (list) – a list of missing allele strings (def [“0”, “-“])

Returns

True if in affymetrix AB coding

Return type

bool

is_forward(genotype: list, missing: list = ['0', '-']) → bool[source]

Return True if genotype is compatible with illumina FORWARD coding

Parameters
  • genotype (list) – a list of two alleles (ex [‘A’,’C’])

  • missing (list) – a list of missing allele strings (def [“0”, “-“])

Returns

True if in forward coding

Return type

bool

is_illumina(genotype: list, missing: list = ['0', '-']) → bool[source]

Return True if genotype is compatible with illumina coding (as it’s recorded in manifest)

Parameters
  • genotype (list) – a list of two alleles (ex [‘A’,’C’])

  • missing (list) – a list of missing allele strings (def [“0”, “-“])

Returns

True if in illumina coding

Return type

bool

is_top(genotype: list, missing: list = ['0', '-']) → bool[source]

Return True if genotype is compatible with illumina TOP coding

Parameters
  • genotype (list) – a list of two alleles (ex [‘A’,’C’])

  • missing (list) – a list of missing allele strings (def [“0”, “-“])

Returns

True if in top coding

Return type

bool
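The compatibility checks can be read as set membership tests. The sketch below assumes the TOP alleles are stored as a string such as 'A/G' and that missing alleles are always accepted; `is_compatible` is a hypothetical standalone function, not the documented method.

```python
def is_compatible(genotype, illumina_top, missing=("0", "-")):
    """True when every non-missing allele is one of the TOP alleles."""
    allowed = set(illumina_top.split("/"))
    return all(allele in allowed
               for allele in genotype if allele not in missing)


print(is_compatible(["A", "G"], "A/G"))  # True
print(is_compatible(["T", "C"], "A/G"))  # False
print(is_compatible(["0", "0"], "A/G"))  # True
```

A check like this can only say that a genotype is compatible with a coding, not that it was produced in it: a genotype such as ['A', 'A'] could belong to more than one coding.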

position

The SNP position

ss_id

The SNP submission ID

strand

The strand orientation in alignment

version

The assembly version where this SNP is placed

class src.features.smarterdb.Phenotype(*args, **kwargs)[source]

Bases: DynamicEmbeddedDocument

A class to deal with phenotypes. This is a dynamic document and not a generic DictField since there can be attributes which could be enforced to have certain values. All other attributes could be set without any assumptions

Attributes:

chest_girth

Floating point number field.

height

Floating point number field.

length

Floating point number field.

purpose

A unicode string field.

chest_girth

Floating point number field.

height

Floating point number field.

length

Floating point number field.

purpose

A unicode string field.

class src.features.smarterdb.Probeset(*args, **kwargs)[source]

Bases: EmbeddedDocument

A class to deal with different affymetrix probesets

Attributes:

chip_name

the chip name where this affymetrix probeset comes from

probeset_id

A list of probesets assigned to the same SNP

chip_name

the chip name where this affymetrix probeset comes from

probeset_id

A list of probesets assigned to the same SNP

class src.features.smarterdb.SAMPLETYPE(value)[source]

Bases: Enum

A simple Enum object to define sample type (background or foreground)

Attributes:

BACKGROUND

FOREGROUND

BACKGROUND = 'background'
FOREGROUND = 'foreground'
class src.features.smarterdb.SEX(value)[source]

Bases: bytes, Enum

An enum object to manage Sample sex in the same way as plink does

Attributes:

FEMALE

MALE

UNKNOWN

Methods:

from_string(value)

Get proper type relying on input string

FEMALE = 2
MALE = 1
UNKNOWN = 0
classmethod from_string(value: str)[source]

Get proper type relying on input string

Parameters

value (str) – required sex as string

Returns

A sex instance (MALE, FEMALE, UNKNOWN)

Return type

SEX
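The plink convention codes sex as 1 for male, 2 for female and 0 for unknown. The sketch below shows one way a `from_string` constructor could work. The accepted spellings and the fallback to UNKNOWN are assumptions, and the class is named `Sex` and based on a plain `Enum` to keep it distinct from the documented `SEX`.

```python
from enum import Enum


class Sex(Enum):
    # Same integer coding as the plink sex column
    UNKNOWN = 0
    MALE = 1
    FEMALE = 2

    @classmethod
    def from_string(cls, value: str) -> "Sex":
        """Map a free-text label to a member; other labels give UNKNOWN."""
        text = str(value).strip().lower()
        if text in ("m", "male"):
            return cls.MALE
        if text in ("f", "female"):
            return cls.FEMALE
        return cls.UNKNOWN


print(Sex.from_string("Male").name)   # MALE
print(Sex.from_string("F").value)     # 2
print(Sex.from_string("n/a").name)    # UNKNOWN
```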

class src.features.smarterdb.SampleGoat(*args, **values)[source]

Bases: SampleSpecies

A class specific for Goat samples

Miscellaneous:

DoesNotExist

MultipleObjectsReturned

Attributes:

father_id

The father (SIRE) of this animal.

id

A field wrapper around MongoDB's ObjectIds.

mother_id

The mother (DAM) of this animal.

objects([q_obj])

species

The species name.

species_class

The generic species class

exception DoesNotExist

Bases: DoesNotExist

exception MultipleObjectsReturned

Bases: MultipleObjectsReturned

father_id

The father (SIRE) of this animal. Is a reference to another SampleGoat instance

id

A field wrapper around MongoDB’s ObjectIds.

mother_id

The mother (DAM) of this animal. Is a reference to another SampleGoat instance

objects(q_obj=None, **query) = []
species

The species name. Could be something different from Capra hircus

species_class = 'Goat'

The generic species class

class src.features.smarterdb.SampleSheep(*args, **values)[source]

Bases: SampleSpecies

A class specific for Sheep samples

Miscellaneous:

DoesNotExist

MultipleObjectsReturned

Attributes:

father_id

The father (SIRE) of this animal.

id

A field wrapper around MongoDB's ObjectIds.

mother_id

The mother (DAM) of this animal.

objects([q_obj])

species

The species name.

species_class

The generic species class

exception DoesNotExist

Bases: DoesNotExist

exception MultipleObjectsReturned

Bases: MultipleObjectsReturned

father_id

The father (SIRE) of this animal. Is a reference to another SampleSheep instance

id

A field wrapper around MongoDB’s ObjectIds.

mother_id

The mother (DAM) of this animal. Is a reference to another SampleSheep instance

objects(q_obj=None, **query) = []
species

The species name. Could be something different from Ovis aries

species_class = 'Sheep'

The generic species class

class src.features.smarterdb.SampleSpecies(*args, **values)[source]

Bases: Document

A generic class used to manage Goat or Sheep samples

Attributes:

alias

This is a sample alias, mainly the name used in the genotype file, which can be different from the name specified in the metadata file

breed

The breed full name

breed_code

The breed code

chip_name

The chip name used to define this sample

country

Where this sample comes from

dataset

The dataset where this sample comes from

locations

The sample GPS location as a Point (X, Y -> longitude, latitude).

metadata

Additional metadata (not managed via ORM)

original_id

The sample original ID in the source dataset

phenotype

A Phenotype instance

sex

A SEX instance.

smarter_id

A SMARTER unique and stable identifier

species_class

A generic species (Sheep or Goat).

type_

A SAMPLETYPE instance (i.e. background or foreground)

Methods:

save(*args, **kwargs)

Custom save method.

alias

This is a sample alias, mainly the name used in the genotype file, which can be different from the name specified in the metadata file

breed

The breed full name

breed_code

The breed code

chip_name

The chip name used to define this sample

country

Where this sample comes from

dataset

The dataset where this sample comes from

locations

The sample GPS location as a Point (X, Y -> longitude, latitude). Note that locations are commonly written in latitude, longitude order, which is the reverse of the Point order. Labelling the coordinate columns with explicit headers helps to avoid swapping them
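The coordinate order is easy to get wrong, so a small helper with named arguments can make it explicit. The sketch below builds a GeoJSON Point, whose coordinates are ordered longitude first and latitude second; `to_geojson_point` and the sample values are hypothetical.

```python
def to_geojson_point(latitude: float, longitude: float) -> dict:
    """Build a GeoJSON Point; coordinates are [longitude, latitude]."""
    if not -90 <= latitude <= 90:
        raise ValueError(f"latitude out of range: {latitude}")
    if not -180 <= longitude <= 180:
        raise ValueError(f"longitude out of range: {longitude}")
    return {"type": "Point", "coordinates": [longitude, latitude]}


# Values as they could be read from columns named latitude and longitude
point = to_geojson_point(latitude=45.47, longitude=9.19)
print(point)  # {'type': 'Point', 'coordinates': [9.19, 45.47]}
```

Range checks catch some swapped values (a latitude above 90 is impossible), but not all of them, which is why explicit headers remain useful.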

metadata

Additional metadata (not managed via ORM)

original_id

The sample original ID in the source dataset

phenotype

A Phenotype instance

save(*args, **kwargs)[source]

Custom save method. Deal with smarter_id before save

sex

A SEX instance. Store sex like plink does

smarter_id

A SMARTER unique and stable identifier

species_class = None

A generic species (Sheep or Goat). Used to determine specific methods and to identify the proper data from the database

type_

A SAMPLETYPE instance (i.e. background or foreground)

exception src.features.smarterdb.SmarterDBException[source]

Bases: Exception

class src.features.smarterdb.SmarterInfo(*args, **values)[source]

Bases: Document

A class to track database status information

Miscellaneous:

DoesNotExist

MultipleObjectsReturned

Attributes:

id

A unicode string field.

last_updated

When the SMARTER-database was updated for the last time

objects([q_obj])

plink_specie_opt

The plink parameters used to generate the final genotype dataset

version

The SMARTER-database version

working_assemblies

A dictionary in which managed assemblies are tracked

exception DoesNotExist

Bases: DoesNotExist

exception MultipleObjectsReturned

Bases: MultipleObjectsReturned

id

A unicode string field.

last_updated

When the SMARTER-database was updated for the last time

objects(q_obj=None, **query) = []
plink_specie_opt

The plink parameters used to generate the final genotype dataset

version

The SMARTER-database version

working_assemblies

A dictionary in which managed assemblies are tracked

class src.features.smarterdb.SupportedChip(*args, **values)[source]

Bases: Document

A class to deal with SMARTER-database managed chips

Miscellaneous:

DoesNotExist

MultipleObjectsReturned

Attributes:

id

A field wrapper around MongoDB's ObjectIds.

manifacturer

Who created the chip

n_of_snps

How many SNPs are described within this chip

name

The chip identifier

objects([q_obj])

species

The species for which a chip is defined

exception DoesNotExist

Bases: DoesNotExist

exception MultipleObjectsReturned

Bases: MultipleObjectsReturned

id

A field wrapper around MongoDB’s ObjectIds.

manifacturer

Who created the chip

n_of_snps

How many SNPs are described within this chip

name

The chip identifier

objects(q_obj=None, **query) = []
species

The species for which a chip is defined

class src.features.smarterdb.VariantGoat(*args, **values)[source]

Bases: VariantSpecies

A class to deal with Goat variations (SNP)

Miscellaneous:

DoesNotExist

MultipleObjectsReturned

Attributes:

id

A field wrapper around MongoDB's ObjectIds.

objects([q_obj])

exception DoesNotExist

Bases: DoesNotExist

exception MultipleObjectsReturned

Bases: MultipleObjectsReturned

id

A field wrapper around MongoDB’s ObjectIds.

objects(q_obj=None, **query) = []
class src.features.smarterdb.VariantSheep(*args, **values)[source]

Bases: VariantSpecies

A class to deal with Sheep variations (SNP)

Miscellaneous:

DoesNotExist

MultipleObjectsReturned

Attributes:

id

A field wrapper around MongoDB's ObjectIds.

objects([q_obj])

exception DoesNotExist

Bases: DoesNotExist

exception MultipleObjectsReturned

Bases: MultipleObjectsReturned

id

A field wrapper around MongoDB’s ObjectIds.

objects(q_obj=None, **query) = []
class src.features.smarterdb.VariantSpecies(*args, **values)[source]

Bases: Document

Generic class to deal with Variant (SNP) objects

Attributes:

affy_snp_id

The affymetrix SNP id

chip_name

The chip names where this SNP could be found

cust_id

The affymetrix customer id (which is the illumina name)

illumina_top

Illumina TOP variant (which is the same regardless of location)

locations

A list of Location objects

name

The name of the SNPs.

probesets

A list of Probeset objects

rs_id

The SNP rsID

sender

Who provided this SNP probe

sequence

A dictionary where keys are chip_name, and values are their probe sequences

Methods:

get_location(version[, imported_from])

Returns location for assembly version and imported source

get_location_index(version[, imported_from])

Returns location index for assembly version and imported source

save(*args, **kwargs)

Custom save method.

affy_snp_id

The affymetrix SNP id

chip_name

The chip names where this SNP could be found

cust_id

The affymetrix customer id (which is the illumina name)

get_location(version: str, imported_from='SNPchiMp v.3')[source]

Returns location for assembly version and imported source

Parameters
  • version (str) – assembly version (ex: ‘Oar_v3.1’)

  • imported_from (str) – coordinates source (ex: ‘SNPchiMp v.3’)

Returns

the genomic coordinates

Return type

Location

get_location_index(version: str, imported_from='SNPchiMp v.3')[source]

Returns location index for assembly version and imported source

Parameters
  • version (str) – assembly version (ex: ‘Oar_v3.1’)

  • imported_from (str) – coordinates source (ex: ‘SNPchiMp v.3’)

Returns

the index of the location requested

Return type

int
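The lookup performed by these two methods can be sketched as a linear search over the list of locations. Here each location is a plain dictionary, the chromosome and position values are invented, and raising `ValueError` when nothing matches is a choice of this sketch; `find_location_index` is a hypothetical name.

```python
def find_location_index(locations, version, imported_from="SNPchiMp v.3"):
    """Return the index of the first location matching both keys."""
    for index, location in enumerate(locations):
        if (location["version"] == version
                and location["imported_from"] == imported_from):
            return index
    raise ValueError(f"no location for {version!r} from {imported_from!r}")


# Invented coordinates, only for the sake of the example
locations = [
    {"version": "Oar_v3.1", "imported_from": "SNPchiMp v.3",
     "chrom": "1", "position": 1000},
    {"version": "Oar_v4.0", "imported_from": "SNPchiMp v.3",
     "chrom": "1", "position": 1200},
]

index = find_location_index(locations, "Oar_v4.0")
print(index)                         # 1
print(locations[index]["position"])  # 1200
```

Returning the index rather than the object is convenient when the matching location has to be replaced in place.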

illumina_top

Illumina TOP variant (which is the same regardless of location)

locations

A list of Location objects

name

The name of the SNPs. Could be illumina name or affymetrix name

probesets

A list of Probeset objects

rs_id

The SNP rsID

save(*args, **kwargs)[source]

Custom save method. Deal with variant name before save

sender

Who provided this SNP probe

sequence

A dictionary where keys are chip_name, and values are their probe sequences

src.features.smarterdb.complement(genotype: str) → str[source]

Return reverse complement for a base call

Parameters

genotype (str) – A base call (one from A, T, G, C).

Returns

result – The reverse complement of the base call.

Return type

str
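For a single base call the reverse complement is simply the paired base. A minimal sketch, using the hypothetical name `complement_call` so as not to suggest it is the documented implementation:

```python
# Watson-Crick pairing for the four bases
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_call(genotype: str) -> str:
    """Complement each base and reverse the order of the result."""
    return "".join(PAIRS[base] for base in reversed(genotype.upper()))


print(complement_call("A"))   # T
print(complement_call("G"))   # C
print(complement_call("AC"))  # GT
```

Any character other than A, T, G or C raises `KeyError` in this sketch, so missing-value codes would have to be filtered out before the call.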

src.features.smarterdb.getNextSequenceValue(sequence_name: str, mongodb: Database)[source]

Read from Counter collection and determine the next sequence number to be used for the SMARTER ID
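The counter logic can be pictured with an in-memory dictionary standing in for the Counter collection. The sequence names below are hypothetical, and `next_sequence_value` is not the documented function.

```python
def next_sequence_value(counters: dict, sequence_name: str) -> int:
    """Increment the named counter and return the new value."""
    if sequence_name not in counters:
        raise KeyError(f"unknown sequence: {sequence_name}")
    counters[sequence_name] += 1
    return counters[sequence_name]


# Hypothetical sequence names, one per species
counters = {"sampleSheep": 0, "sampleGoat": 0}
print(next_sequence_value(counters, "sampleSheep"))  # 1
print(next_sequence_value(counters, "sampleSheep"))  # 2
print(next_sequence_value(counters, "sampleGoat"))   # 1
```

Against a real MongoDB collection the read and the increment would need to happen in one atomic operation (for instance pymongo's `find_one_and_update` with `$inc`), otherwise two concurrent writers could receive the same number.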

src.features.smarterdb.getSmarterId(species_class: str, country: str, breed: str) → str[source]

Generate a new SMARTER ID object using the internal counter collections

Parameters
  • species_class (str) – The class of the species (should be ‘Goat’ or ‘Sheep’).

  • country (str) – The country name of the sample.

  • breed (str) – The breed name of the sample.

Raises

SmarterDBException – Raised when the species is wrong or missing.

Returns

A new smarter_id.

Return type

str

src.features.smarterdb.get_or_create_breed(species_class: str, name: str, code: str, aliases: list = []) → [<class 'src.features.smarterdb.Breed'>, <class 'bool'>][source]

Get a Breed instance or create a new one (or update a breed adding a new BreedAlias)

Parameters
  • species_class (str) – The class of the species (should be ‘Goat’ or ‘Sheep’)

  • name (str) – The breed full name.

  • code (str) – The breed code (unique in Sheep and Goats collections).

  • aliases (list, optional) – A list of BreedAlias objects. The default is [].

Raises

SmarterDBException – Raised if the breed is not unique.

Returns

  • breed (Breed) – A Breed instance.

  • modified (bool) – True if the breed is created (or an alias is updated).
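The get-or-create pattern with a "modified" flag can be sketched with a dictionary standing in for the breed collection. Records are plain dictionaries, aliases are plain strings, and `get_or_create` is a hypothetical name. The flag is True both when a record is created and when a new alias is added to an existing one.

```python
def get_or_create(store: dict, code: str, name: str, aliases=()):
    """Return (record, modified) for the record stored under `code`."""
    modified = False
    record = store.get(code)
    if record is None:
        # Nothing stored under this code yet: create the record
        record = {"code": code, "name": name, "aliases": []}
        store[code] = record
        modified = True
    for alias in aliases:
        # Only aliases not already present count as a modification
        if alias not in record["aliases"]:
            record["aliases"].append(alias)
            modified = True
    return record, modified


store = {}
print(get_or_create(store, "TEX", "Texel", ["TEXEL_UY"])[1])  # True
print(get_or_create(store, "TEX", "Texel", ["TEXEL_UY"])[1])  # False
print(get_or_create(store, "TEX", "Texel", ["TEXEL_NZ"])[1])  # True
```

The sketch uses an immutable tuple as the default for `aliases`, so no list is shared between calls.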

src.features.smarterdb.get_or_create_sample(SampleSpecies: Union[SampleGoat, SampleSheep], original_id: str, dataset: Dataset, type_: str, breed: Breed, country: str, species: Optional[str] = None, chip_name: Optional[str] = None, sex: Optional[SEX] = None, alias: Optional[str] = None) → list[Union[src.features.smarterdb.SampleGoat, src.features.smarterdb.SampleSheep], bool][source]

Get or create a sample providing attributes (search for original_id in provided dataset)

Parameters
  • SampleSpecies (Union[SampleGoat, SampleSheep]) – the class required for insert/update.

  • original_id (str) – the original_id in the dataset.

  • dataset (Dataset) – the dataset instance used to register sample.

  • type_ (str) – sample type. “background” or “foreground” are the only values accepted

  • breed (Breed) – a Breed instance.

  • country (str) – the country where the sample comes from.

  • species (str, optional) – The sample species. If None, the default species_class attribute will be used

  • chip_name (str, optional) – the chip name. The default is None.

  • sex (SEX, optional) – A SEX instance. The default is None.

  • alias (str, optional) – an original_id alias. Could be the name used in the genotype file, which could be different from the original_id. The default is None.

Raises

SmarterDBException – Raised if multiple samples are returned (should never happen).

Returns

  • Union[SampleGoat, SampleSheep] – a SampleSpecies instance.

  • created (bool) – True if the sample is created.

src.features.smarterdb.get_sample_type(dataset: Dataset)[source]

Test whether a dataset is foreground or background

Parameters

dataset (Dataset) – the dataset instance used to register sample

Returns

sample type (“background” or “foreground”)

Return type

str
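Since a dataset's type_ holds one data kind ('genotypes' or 'phenotypes') and one sample kind ('background' or 'foreground'), the sample type can be picked out of that list. The sketch below works on a plain list of strings; `sample_type_from` is a hypothetical name, and raising `ValueError` on an ambiguous list is a choice of this sketch.

```python
SAMPLE_TYPES = ("background", "foreground")


def sample_type_from(type_: list) -> str:
    """Return the single sample type listed in a dataset's type_ values."""
    found = [value for value in type_ if value in SAMPLE_TYPES]
    if len(found) != 1:
        raise ValueError(
            f"expected exactly one of {SAMPLE_TYPES}, got {type_}")
    return found[0]


print(sample_type_from(["genotypes", "background"]))   # background
print(sample_type_from(["foreground", "phenotypes"]))  # foreground
```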

src.features.smarterdb.global_connection(database_name: str = 'smarter') → MongoClient[source]

Establish a connection to the SMARTER database. Reads environment parameters using load_dotenv(), returns a MongoClient object.

Parameters

database_name (str, optional) – The smarter database. The default is ‘smarter’.

Returns

CLIENT – a mongoclient instance.

Return type

MongoClient
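How environment parameters could be turned into a connection string is sketched below with the standard library only. The variable names (`EXAMPLE_MONGODB_USER` and so on) are hypothetical and are not the ones read by `global_connection`. The sketch reads a mapping that defaults to `os.environ` instead of calling `load_dotenv()`.

```python
import os
from urllib.parse import quote_plus


def build_mongodb_uri(database_name: str = "smarter", env=None) -> str:
    """Compose a MongoDB URI from (hypothetical) environment variables."""
    env = os.environ if env is None else env
    # Credentials are percent-encoded so reserved characters stay valid
    user = quote_plus(env["EXAMPLE_MONGODB_USER"])
    password = quote_plus(env["EXAMPLE_MONGODB_PASS"])
    host = env.get("EXAMPLE_MONGODB_HOST", "localhost")
    port = env.get("EXAMPLE_MONGODB_PORT", "27017")
    return f"mongodb://{user}:{password}@{host}:{port}/{database_name}"


example_env = {
    "EXAMPLE_MONGODB_USER": "smarter",
    "EXAMPLE_MONGODB_PASS": "p@ss",
}
print(build_mongodb_uri(env=example_env))
# mongodb://smarter:p%40ss@localhost:27017/smarter
```

Passing the environment as an argument keeps the function testable without touching real credentials.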