src.features.plinkio

AffyPlinkIO

class src.features.plinkio.AffyPlinkIO(prefix: Optional[str] = None, mapfile: Optional[str] = None, pedfile: Optional[str] = None, species: Optional[str] = None, chip_name: Optional[str] = None)[source]

Bases: FakePedMixin, TextPlinkIO

A class for Affymetrix PLINK files, which differ slightly from standard PLINK text files.

get_samples() list[source]

Get samples from genotype files

Returns

The sample list.

Return type

list

read_pedfile(breed: Optional[str] = None, dataset: Optional[Dataset] = None, *args, **kwargs)[source]

Open the pedfile for reading and return an iterator

Parameters
  • breed (str, optional) – A breed to be assigned to all samples, or use the sample breed stored in database if not provided. The default is None.

  • dataset (Dataset, optional) – A dataset in which to search for the sample breed identifier

Yields

line (list) – A ped line read as a list.

AffyReportIO

class src.features.plinkio.AffyReportIO(report: Optional[str] = None, species: Optional[str] = None, chip_name: Optional[str] = None)[source]

Bases: FakePedMixin, SmarterMixin

This type of file contains both genotypes and variant information. Moreover, genotypes are transposed: each line tracks a single SNP across all samples
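The transposed layout can be illustrated with a minimal sketch. The column names (`probeset_id`, `sample_1`, `sample_2`), the genotype calls, and the helper `transpose_report` are hypothetical and are not part of this module; a real Affymetrix report carries additional columns and header lines that this sketch ignores.

```python
import csv
import io

# Hypothetical, simplified report: one row per SNP, one column per sample.
report_text = (
    "probeset_id\tsample_1\tsample_2\n"
    "snp_a\tA/A\tA/B\n"
    "snp_b\tB/B\tA/B\n"
)


def transpose_report(handle, delimiter="\t"):
    """Collect SNP-by-row calls into one genotype list per sample."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    samples = header[1:]  # every column after the SNP identifier
    snp_names = []
    by_sample = {sample: [] for sample in samples}
    for row in reader:
        snp_names.append(row[0])
        for sample, call in zip(samples, row[1:]):
            by_sample[sample].append(call)
    return snp_names, by_sample


snp_names, by_sample = transpose_report(io.StringIO(report_text))
```

After the call, `snp_names` plays the role of map data and each entry of `by_sample` holds one sample's genotypes in SNP order, which is the shape a PED-like record needs.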

__init__(report: Optional[str] = None, species: Optional[str] = None, chip_name: Optional[str] = None)[source]
delimiter = '\t'
fetch_coordinates(src_assembly: AssemblyConf, dst_assembly: Optional[AssemblyConf] = None, search_field: str = 'name', chip_name: Optional[str] = None, skip_check: bool = False)[source]

Search for variants in smarter database. Check if the provided A/B information is equal to the database content

Parameters
  • src_assembly (AssemblyConf) – the source data assembly version

  • dst_assembly (AssemblyConf) – the destination data assembly version

  • search_field (str) – search variant by field (def. “name”)

  • chip_name (str) – limit search to this chip_name

  • skip_check (bool) – skip the coordinate check

get_samples() list[source]

Get samples from genotype files

Returns

The sample list.

Return type

list

header = []
n_samples = None
peddata = []
read_peddata(breed: Optional[str] = None, dataset: Optional[Dataset] = None, sample_field: str = 'original_id', *args, **kwargs)[source]

Iterate over genotype records.

Parameters
  • breed (str, optional) – A breed to be assigned to all samples, or use the sample breed stored in database if not provided. The default is None.

  • dataset (Dataset, optional) – A dataset in which to search for the sample breed identifier

  • sample_field (str, optional) – Search samples using this field. The default is “original_id”.

Yields

line (list) – A ped line read as a list.

read_reportfile(n_samples: Optional[int] = None, *args, **kwargs)[source]

Read the reportfile once and generate mapdata and peddata, with genotype information by sample.

Parameters

n_samples (int, optional) – Limit to N samples. Useful when the actual number of samples differs from the number reported in the file. The default is None (read number of samples from reportfile).

Return type

None.

report = None
warn_missing_cols = True

BinaryPlinkIO

class src.features.plinkio.BinaryPlinkIO(prefix: Optional[str] = None, species: Optional[str] = None, chip_name: Optional[str] = None)[source]

Bases: SmarterMixin

__init__(prefix: Optional[str] = None, species: Optional[str] = None, chip_name: Optional[str] = None)[source]
get_samples() list[source]

Get samples from genotype files

Returns

The sample list.

Return type

list

property prefix
read_mapfile()[source]

Read map data and keep the information in memory. Useful to process data files

read_pedfile(*args, **kwargs)[source]

Open the pedfile for reading and return an iterator

MapRecord

class src.features.plinkio.MapRecord(chrom: str, name: str, cm: float, position: int)[source]

Bases: object

__init__(chrom: str, name: str, cm: float, position: int) None
chrom: str
cm: float
name: str
position: int
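The four fields mirror the columns of a PLINK `.map` line (chromosome, variant identifier, genetic distance, base-pair position). As a minimal sketch, the stand-in dataclass below reproduces those fields and shows how one line could be converted; `parse_map_line` and the SNP name are hypothetical and not part of this module.

```python
from dataclasses import dataclass


@dataclass
class MapRecord:
    """Stand-in with the same four fields as the documented MapRecord."""
    chrom: str
    name: str
    cm: float
    position: int


def parse_map_line(line: str) -> MapRecord:
    """Split one whitespace-delimited .map line into typed fields."""
    chrom, name, cm, position = line.split()
    return MapRecord(chrom, name, float(cm), int(position))


# Hypothetical SNP on chromosome 1 at position 12345
record = parse_map_line("1\tsnp_0001\t0\t12345")
```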

FakePedMixin

class src.features.plinkio.FakePedMixin[source]

Bases: object

Class which overrides SmarterMixin methods when creating a PED file from a non-plink file format. In this case the FID is already correct and I don’t need to look for dataset aliases
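Listing FakePedMixin first among the bases (as AffyPlinkIO, AffyReportIO and IlluminaReportIO do) is what lets its methods take precedence over those of SmarterMixin. The sketch below shows this with stand-in classes; the class names and return strings are hypothetical, and only the ordering of the bases reflects the signatures documented here.

```python
class BaseMixin:
    """Stand-in for a mixin that resolves breeds through dataset aliases."""

    def search_breed(self, fid, dataset=None):
        return f"alias lookup for {fid}"


class FakePed:
    """Stand-in for a mixin that trusts the FID as it is."""

    def search_breed(self, fid, *args, **kwargs):
        return f"direct lookup for {fid}"


class TextReader(BaseMixin):
    pass


class AffyReader(FakePed, TextReader):
    """FakePed comes first, so its search_breed is found first."""


result = AffyReader().search_breed("TEX")
mro_names = [cls.__name__ for cls in AffyReader.__mro__]
```

Python resolves `search_breed` by walking `AffyReader.__mro__` left to right, so the override applies without the reader class redefining the method.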

search_breed(fid, *args, **kwargs)[source]

Get breed relying on provided FID and species class attribute

search_fid(sample_name: str, dataset: Dataset, sample_field: str = 'original_id') str[source]

Determine FID from smarter SampleSpecies breed

Parameters
  • sample_name (str) – The sample name.

  • dataset (Dataset) – The dataset where the sample comes from.

  • sample_field (str) – The field used to search for the sample name

Returns

fid – The FID used in the generated .ped file

Return type

str

IlluminaReportIO

class src.features.plinkio.IlluminaReportIO(snpfile: Optional[str] = None, report: Optional[str] = None, species: Optional[str] = None, chip_name: Optional[str] = None)[source]

Bases: FakePedMixin, SmarterMixin

__init__(snpfile: Optional[str] = None, report: Optional[str] = None, species: Optional[str] = None, chip_name: Optional[str] = None)[source]
get_samples() list[source]

Get samples from genotype files

Returns

The sample list.

Return type

list

read_reportfile(breed: Optional[str] = None, dataset: Optional[Dataset] = None, *args, **kwargs)[source]

Open and read an Illumina report file. Returns an iterator

Parameters
  • breed (str, optional) – A breed to be assigned to all samples, or use the sample breed stored in database if not provided. The default is None.

  • dataset (Dataset, optional) – A dataset in which to search for the sample breed identifier

Raises

IlluminaReportException – Raised when SNPs index doesn’t match snpfile.

Yields

line (list) – A ped line read as a list.

read_snpfile()[source]

Read SNP data and keep the information in memory. Useful to process data files

report = None
snpfile = None

SmarterMixin

class src.features.plinkio.SmarterMixin[source]

Bases: object

Common features of a Smarter related dataset file

SampleSpecies = None
VariantSpecies = None
chip_name = None
dst_locations = []
fetch_coordinates(src_assembly: AssemblyConf, dst_assembly: Optional[AssemblyConf] = None, search_field: str = 'name', chip_name: Optional[str] = None, *args, **kwargs)[source]

Search for variants in smarter database

Parameters
  • src_assembly (AssemblyConf) – the source data assembly version

  • dst_assembly (AssemblyConf) – the destination data assembly version

  • search_field (str) – search variant by field (def. “name”)

  • chip_name (str) – limit search to this chip_name

fetch_coordinates_by_positions(src_assembly: AssemblyConf, dst_assembly: Optional[AssemblyConf] = None)[source]

Search for variant in smarter database relying on positions

Parameters
  • src_assembly (AssemblyConf) – the source data assembly version.

  • dst_assembly (AssemblyConf, optional) – the destination data assembly version. The default is None.

Return type

None.

filtered = {}
get_or_create_sample(line: list, dataset: Dataset, breed: Breed, sample_field: str = 'original_id', create_sample: bool = False) Union[SampleSheep, SampleGoat][source]

Get a sample from database or create a new one (if create_sample parameter flag is provided)

Parameters
  • line (list) – A ped line as a list.

  • dataset (Dataset) – The dataset object this sample belongs to.

  • breed (Breed) – The Breed object of such sample.

  • sample_field (str, optional) – Search sample name within this field. The default is “original_id”.

  • create_sample (bool, optional) – Create a sample if not found in database. The default is False.

Raises

SmarterDBException – Raised if more than one sample is retrieved.

Returns

sample – A SampleSheep or SampleGoat object for a Sample object found or created. None if no sample is found and create_sample is False.

Return type

Union[SampleSheep, SampleGoat]

make_query_args(src_assembly: AssemblyConf, dst_assembly: AssemblyConf)[source]

Generate args to select variants from database

make_query_kwargs(search_field: str, record: MapRecord, chip_name: str)[source]

Generate kwargs to select variants from database

mapdata = []
read_genotype_method = None
search_breed(fid, dataset, *args, **kwargs)[source]

Get breed relying on aliases and dataset

search_country(dataset: Dataset, breed: Breed)[source]
skip_index(idx)[source]

Skip a certain SNP relying on its position
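How skipped positions are used internally is not described here. As a sketch only, assuming skipped indexes are collected in a set and that the matching allele pairs are then left out of each PED line, the effect could look like this; `drop_filtered` and the sample values are hypothetical.

```python
def drop_filtered(ped_line, filtered):
    """Remove the allele pair of every marker whose index is in `filtered`.

    A PED line has six leading columns (FID, IID, father, mother, sex,
    phenotype) followed by two alleles per marker.
    """
    head, alleles = ped_line[:6], ped_line[6:]
    kept = []
    for idx in range(len(alleles) // 2):
        if idx in filtered:
            continue  # this marker was skipped by position
        kept.extend(alleles[2 * idx: 2 * idx + 2])
    return head + kept


line = ["FAM1", "S1", "0", "0", "0", "-9", "A", "A", "G", "C", "T", "T"]
trimmed = drop_filtered(line, {1})  # skip the second marker (index 1)
```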

property species
src_locations = []
update_mapfile(outputfile: str)[source]
update_pedfile(outputfile: str, dataset: Dataset, coding: str, create_samples: bool = False, sample_field: str = 'original_id', ignore_coding_errors: bool = False, *args, **kwargs)[source]

Write a new pedfile relying on illumina_top genotypes and coordinates stored in smarter database

Parameters
  • outputfile (str) – write ped to this path (overwrite if exists)

  • dataset (Dataset) – the dataset we are converting

  • coding (str) – the source coding (could be ‘top’, ‘ab’, ‘forward’)

  • create_samples (bool) – create samples if they do not exist (useful to create samples directly from ped file)

  • sample_field (str) – search samples using this attribute (def. ‘original_id’)

  • ignore_coding_errors (bool) – ignore coding related errors (no more exceptions when genotypes don’t match)

variants_name = []

TextPlinkIO

class src.features.plinkio.TextPlinkIO(prefix: Optional[str] = None, mapfile: Optional[str] = None, pedfile: Optional[str] = None, species: Optional[str] = None, chip_name: Optional[str] = None)[source]

Bases: SmarterMixin

__init__(prefix: Optional[str] = None, mapfile: Optional[str] = None, pedfile: Optional[str] = None, species: Optional[str] = None, chip_name: Optional[str] = None)[source]
get_samples() list[source]

Get samples from genotype files

Returns

The sample list.

Return type

list

mapfile = None
pedfile = None
read_mapfile()[source]

Read map data and keep the information in memory. Useful to process data files

read_pedfile(*args, **kwargs)[source]

Open the pedfile for reading and return an iterator
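The iterator idea can be sketched independently of this class: read a PED file lazily and yield each line as a list of fields. The function `iter_ped_lines` and the sample data are hypothetical; the actual method may perform further processing (for example breed assignment) that is not shown.

```python
import io
from typing import Iterator, List, TextIO


def iter_ped_lines(handle: TextIO) -> Iterator[List[str]]:
    """Yield each non-empty PED line as a list of whitespace-split fields."""
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        yield line.split()


# Two hypothetical samples genotyped at two markers
ped_text = "FAM1 S1 0 0 0 -9 A A G C\nFAM1 S2 0 0 0 -9 A G C C\n"
lines = list(iter_ped_lines(io.StringIO(ped_text)))
```

Because the function is a generator, a large pedfile is processed one line at a time and is never loaded into memory as a whole.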