src.features.plinkio
AffyPlinkIO
- class src.features.plinkio.AffyPlinkIO(prefix: Optional[str] = None, mapfile: Optional[str] = None, pedfile: Optional[str] = None, species: Optional[str] = None, chip_name: Optional[str] = None)[source]
Bases: FakePedMixin, TextPlinkIO
A class for Affymetrix PLINK files, which differ slightly from PLINK text files.
- read_pedfile(breed: Optional[str] = None, dataset: Optional[Dataset] = None, *args, **kwargs)[source]
Open the pedfile for reading and return an iterator.
- Parameters
breed (str, optional) – A breed to be assigned to all samples, or use the sample breed stored in database if not provided. The default is None.
dataset (Dataset, optional) – A dataset in which to search for the sample breed identifier.
- Yields
line (list) – A ped line read as a list.
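As a rough illustration of "a ped line read as a list", the sketch below iterates over PED text and yields each line as a list of fields. It does not use `AffyPlinkIO` itself: `iter_ped_lines` is a hypothetical helper, and writing the breed into the first (family ID) column is an assumption made for the example, not something this page states.

```python
import io
from typing import Iterator, List, Optional, TextIO


def iter_ped_lines(handle: TextIO, breed: Optional[str] = None) -> Iterator[List[str]]:
    """Yield each non-empty PED line as a list of whitespace-separated fields.

    Hypothetical helper: if ``breed`` is given, it overwrites the first
    column (family ID). This is an assumption for illustration only.
    """
    for raw in handle:
        fields = raw.split()
        if not fields:
            continue  # skip blank lines
        if breed is not None:
            fields[0] = breed
        yield fields


# Two samples, two SNPs: six leading PED columns plus two alleles per SNP.
example = io.StringIO(
    "FAM1 S1 0 0 0 -9 A G C C\n"
    "FAM2 S2 0 0 0 -9 A A C T\n"
)
rows = list(iter_ped_lines(example, breed="TEX"))
```

Because the helper is a generator, lines are produced one at a time and the whole file never needs to be held in memory.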
AffyReportIO
- class src.features.plinkio.AffyReportIO(report: Optional[str] = None, species: Optional[str] = None, chip_name: Optional[str] = None)[source]
Bases: FakePedMixin, SmarterMixin
This type of file contains both genotypes and additional information. Moreover, genotypes are transposed, tracking a SNP for all samples on a single line.
- __init__(report: Optional[str] = None, species: Optional[str] = None, chip_name: Optional[str] = None)[source]
- delimiter = '\t'
- fetch_coordinates(src_assembly: AssemblyConf, dst_assembly: Optional[AssemblyConf] = None, search_field: str = 'name', chip_name: Optional[str] = None, skip_check: bool = False)[source]
Search for variants in smarter database. Check if the provided A/B information is equal to the database content
- Parameters
src_assembly (AssemblyConf) – the source data assembly version
dst_assembly (AssemblyConf) – the destination data assembly version
search_field (str) – search variant by field (def. “name”)
chip_name (str) – limit search to this chip_name
skip_check (bool) – skip the coordinate check
- header = []
- n_samples = None
- peddata = []
- read_peddata(breed: Optional[str] = None, dataset: Optional[Dataset] = None, sample_field: str = 'original_id', *args, **kwargs)[source]
Iterate over genotype records.
- Parameters
breed (str, optional) – A breed to be assigned to all samples, or use the sample breed stored in database if not provided. The default is None.
dataset (Dataset, optional) – A dataset in which search for sample breed identifier
sample_field (str, optional) – Search samples using this field. The default is “original_id”.
- Yields
line (list) – A ped line read as a list.
- read_reportfile(n_samples: Optional[int] = None, *args, **kwargs)[source]
Read the reportfile once and generate mapdata and peddata, with genotype information by sample.
- Parameters
n_samples (int, optional) – Limit to N samples. Useful when the actual number of samples differs from the number reported in the file. The default is None (read the number of samples from the reportfile).
- Return type
None.
- report = None
- warn_missing_cols = True
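The class description says genotypes are transposed, with one SNP for all samples on each line, while PED output needs one row per sample. The sketch below shows that reshaping in isolation. `transpose_genotypes` and the sample data are hypothetical and only illustrate the idea; they are not taken from `AffyReportIO`.

```python
from typing import Dict, List


def transpose_genotypes(
    samples: List[str], snp_rows: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """Turn SNP-major rows (one SNP, all samples) into per-sample call lists.

    Hypothetical helper: ``snp_rows`` maps a SNP name to its calls, given
    in the same order as ``samples``.
    """
    per_sample: Dict[str, List[str]] = {sample: [] for sample in samples}
    for snp, calls in snp_rows.items():
        if len(calls) != len(samples):
            raise ValueError(
                f"{snp}: expected {len(samples)} calls, got {len(calls)}"
            )
        for sample, call in zip(samples, calls):
            per_sample[sample].append(call)
    return per_sample


by_sample = transpose_genotypes(
    ["S1", "S2"],
    {"snp1": ["AA", "AB"], "snp2": ["BB", "NoCall"]},
)
```

The length check plays a role similar to the `n_samples` parameter: a row with an unexpected number of calls is reported instead of being silently misaligned.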
BinaryPlinkIO
- class src.features.plinkio.BinaryPlinkIO(prefix: Optional[str] = None, species: Optional[str] = None, chip_name: Optional[str] = None)[source]
Bases: SmarterMixin
- __init__(prefix: Optional[str] = None, species: Optional[str] = None, chip_name: Optional[str] = None)[source]
- plink_file = None
- property prefix
FakePedMixin
- class src.features.plinkio.FakePedMixin[source]
Bases: object
Class which overrides SmarterMixin behaviour when creating a PED file from a non-plink file format. In this case the FID is already correct and there is no need to look for dataset aliases.
- search_breed(fid, *args, **kwargs)[source]
Get breed relying on provided FID and species class attribute
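The classes on this page list FakePedMixin before SmarterMixin (or TextPlinkIO) among their bases. In Python, attribute lookup follows the method resolution order, so a method defined on the first listed base is found first. The sketch below demonstrates the mechanism with hypothetical stand-in classes; their method bodies are invented and do not reproduce the real `search_breed` logic.

```python
class BaseLookup:
    """Hypothetical stand-in for SmarterMixin."""

    def search_breed(self, fid, *args, **kwargs):
        return f"alias-lookup:{fid}"


class DirectLookup:
    """Hypothetical stand-in for FakePedMixin."""

    def search_breed(self, fid, *args, **kwargs):
        return f"direct:{fid}"


class ReportReader(DirectLookup, BaseLookup):
    """The mixin comes first, so its search_breed wins."""


reader = ReportReader()
result = reader.search_breed("TEX")
mro_names = [cls.__name__ for cls in ReportReader.__mro__]
```

Swapping the order of the bases would make `BaseLookup.search_breed` the one that is called, which is why the mixin must be listed first.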
IlluminaReportIO
- class src.features.plinkio.IlluminaReportIO(snpfile: Optional[str] = None, report: Optional[str] = None, species: Optional[str] = None, chip_name: Optional[str] = None)[source]
Bases: FakePedMixin, SmarterMixin
- __init__(snpfile: Optional[str] = None, report: Optional[str] = None, species: Optional[str] = None, chip_name: Optional[str] = None)[source]
- read_reportfile(breed: Optional[str] = None, dataset: Optional[Dataset] = None, *args, **kwargs)[source]
Open and read an Illumina report file. Returns an iterator.
- Parameters
- Raises
IlluminaReportException – Raised when SNPs index doesn’t match snpfile.
- Yields
line (list) – A ped line read as a list.
- read_snpfile()[source]
Read SNP data and keep the information in memory. Useful for processing data files.
- report = None
- snpfile = None
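`read_reportfile` is documented as raising `IlluminaReportException` when the SNP index does not match the snpfile. One possible reading of that check is sketched below, assuming each report row carries a 1-based index into the SNP list loaded by `read_snpfile`. That interpretation, the `check_snp_order` helper, and the `ReportIndexError` class are all hypothetical; the real implementation may differ.

```python
from typing import Iterable, List, Tuple


class ReportIndexError(Exception):
    """Hypothetical stand-in for IlluminaReportException."""


def check_snp_order(
    snp_names: List[str], report_rows: Iterable[Tuple[int, str]]
) -> None:
    """Verify that each (index, name) row agrees with the SNP list.

    Assumes ``index`` is the 1-based position of the SNP in ``snp_names``.
    """
    for index, name in report_rows:
        in_range = 1 <= index <= len(snp_names)
        if not in_range or snp_names[index - 1] != name:
            raise ReportIndexError(
                f"SNP {name!r} at index {index} doesn't match snpfile"
            )


# A consistent report passes silently.
check_snp_order(["snpA", "snpB"], [(1, "snpA"), (2, "snpB")])
```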
SmarterMixin
- class src.features.plinkio.SmarterMixin[source]
Bases: object
Common features of a Smarter related dataset file
- SampleSpecies = None
- VariantSpecies = None
- chip_name = None
- dst_locations = []
- fetch_coordinates(src_assembly: AssemblyConf, dst_assembly: Optional[AssemblyConf] = None, search_field: str = 'name', chip_name: Optional[str] = None, *args, **kwargs)[source]
Search for variants in smarter database
- Parameters
src_assembly (AssemblyConf) – the source data assembly version
dst_assembly (AssemblyConf) – the destination data assembly version
search_field (str) – search variant by field (def. “name”)
chip_name (str) – limit search to this chip_name
- fetch_coordinates_by_positions(src_assembly: AssemblyConf, dst_assembly: Optional[AssemblyConf] = None)[source]
Search for variants in the smarter database relying on positions
- Parameters
src_assembly (AssemblyConf) – the source data assembly version.
dst_assembly (AssemblyConf, optional) – the destination data assembly version. The default is None.
- Return type
None.
- filtered = {}
- get_or_create_sample(line: list, dataset: Dataset, breed: Breed, sample_field: str = 'original_id', create_sample: bool = False) Union[SampleSheep, SampleGoat] [source]
Get a sample from the database, or create a new one if the create_sample flag is set
- Parameters
line (list) – A ped line as a list.
dataset (Dataset) – The dataset object this sample belongs to.
breed (Breed) – The Breed object of such sample.
sample_field (str, optional) – Search sample name within this field. The default is “original_id”.
create_sample (bool, optional) – Create a sample if not found in database. The default is False.
- Raises
SmarterDBException – Raised if more than one sample is retrieved.
- Returns
sample – A SampleSheep or SampleGoat object for the sample found or created. None if no sample is found and create_sample is False.
- Return type
Union[SampleSheep, SampleGoat]
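The documented contract (return the match, raise when more than one sample is retrieved, create only when asked, otherwise return None) can be sketched with an in-memory list in place of the database. Everything here is hypothetical: `get_or_create`, the dict-based records, and `TooManySamples`, which stands in for `SmarterDBException`.

```python
from typing import Dict, List, Optional


class TooManySamples(Exception):
    """Hypothetical stand-in for SmarterDBException."""


def get_or_create(
    store: List[Dict[str, str]],
    original_id: str,
    breed: str,
    create_sample: bool = False,
) -> Optional[Dict[str, str]]:
    """Look up a sample by ``original_id`` in an in-memory list.

    Mirrors the documented behaviour only in outline; the real method
    queries the smarter database.
    """
    matches = [s for s in store if s["original_id"] == original_id]
    if len(matches) > 1:
        raise TooManySamples(f"{len(matches)} samples match {original_id!r}")
    if matches:
        return matches[0]
    if not create_sample:
        return None  # not found and creation not requested
    sample = {"original_id": original_id, "breed": breed}
    store.append(sample)
    return sample


store: List[Dict[str, str]] = []
missing = get_or_create(store, "S1", "TEX")
created = get_or_create(store, "S1", "TEX", create_sample=True)
found = get_or_create(store, "S1", "TEX")
```

Keeping creation opt-in means a typo in a sample identifier surfaces as a None result rather than as a silently created record.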
- make_query_args(src_assembly: AssemblyConf, dst_assembly: AssemblyConf)[source]
Generate args to select variants from database
- make_query_kwargs(search_field: str, record: MapRecord, chip_name: str)[source]
Generate kwargs to select variants from database
- mapdata = []
- read_genotype_method = None
- property species
- src_locations = []
- update_pedfile(outputfile: str, dataset: Dataset, coding: str, create_samples: bool = False, sample_field: str = 'original_id', ignore_coding_errors: bool = False, *args, **kwargs)[source]
Write a new pedfile relying on illumina_top genotypes and coordinates stored in smarter database
- Parameters
outputfile (str) – write ped to this path (overwrite if exists)
dataset (Dataset) – the dataset we are converting
coding (str) – the source coding (could be ‘top’, ‘ab’, ‘forward’)
create_samples (bool) – create samples if they do not exist (useful to create samples directly from ped file)
sample_field (str) – search samples using this attribute (def. ‘original_id’)
ignore_coding_errors (bool) – ignore coding related errors (no more exceptions when genotypes don’t match)
- variants_name = []
TextPlinkIO
- class src.features.plinkio.TextPlinkIO(prefix: Optional[str] = None, mapfile: Optional[str] = None, pedfile: Optional[str] = None, species: Optional[str] = None, chip_name: Optional[str] = None)[source]
Bases: SmarterMixin
- __init__(prefix: Optional[str] = None, mapfile: Optional[str] = None, pedfile: Optional[str] = None, species: Optional[str] = None, chip_name: Optional[str] = None)[source]
- mapfile = None
- pedfile = None
plink_binary_exists
- plinkio.plink_binary_exists()
Test if plink binary files exist
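A PLINK binary fileset consists of three files sharing a prefix: `.bed`, `.bim` and `.fam`. The sketch below shows one way such an existence test might look. The signature above lists no arguments, so the `prefix` parameter and the name `binary_fileset_exists` are assumptions for illustration, not the documented function.

```python
import tempfile
from pathlib import Path


def binary_fileset_exists(prefix) -> bool:
    """Return True only if <prefix>.bed, <prefix>.bim and <prefix>.fam all exist.

    Hypothetical helper; the suffix is appended to the prefix as-is, so a
    prefix containing dots is not truncated.
    """
    return all(
        Path(f"{prefix}{ext}").exists() for ext in (".bed", ".bim", ".fam")
    )


# Demonstrate on a temporary directory: no files first, then all three.
with tempfile.TemporaryDirectory() as tmp:
    prefix = Path(tmp) / "data"
    before = binary_fileset_exists(prefix)
    for ext in (".bed", ".bim", ".fam"):
        Path(f"{prefix}{ext}").touch()
    after = binary_fileset_exists(prefix)
```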