API
gendata
- class gendata.IntGenoData(genotypes: DataFrame, snps: DataFrame, samples: DataFrame)
Bases: AbstractGenoData
A class to hold and perform basic operations on integer genotype data.
- property af: Series
Calculate the allele frequency of each SNP.
- Returns:
A series of allele frequencies indexed by rsID.
- Return type:
pd.Series
- flip_snps(*rsids: str)
Flip A1 and A2 for the selected SNPs.
- Parameters:
rsids (str) – List of rsids to flip.
- Returns:
GenoData object.
- Return type:
Type[IntGenoData]
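As a rough illustration of what flipping means for integer genotypes, the sketch below assumes each genotype is stored as the number of A1 copies (0, 1 or 2, with None for a missing call). That coding is an assumption, not something this reference states, and `flip_genotypes` is a hypothetical helper rather than part of gendata.

```python
def flip_genotypes(genotypes, a1, a2):
    """Recode allele counts against the other allele and swap the labels.

    Assumes genotypes are counts of A1 copies (0/1/2) and None marks a
    missing call. This is an illustrative assumption only.
    """
    flipped = [None if g is None else 2 - g for g in genotypes]
    # After flipping, the old A2 becomes the new A1 and vice versa.
    return flipped, a2, a1


flipped, new_a1, new_a2 = flip_genotypes([0, 1, 2, None], "A", "G")
```

Heterozygous calls are unchanged by the flip, and flipping twice returns the original data.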
- hwe(midp: bool) → Series
Calculate the exact HWE p-values for each SNP.
- Parameters:
midp (bool) – Apply midp adjustment.
- Returns:
A series of HWE p-values indexed by rsID.
- Return type:
pd.Series
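For readers unfamiliar with the exact HWE test, the following is a minimal pure-Python sketch of the widely used recurrence over heterozygote counts, with an optional mid-p adjustment (half the probability of the observed count is subtracted). It is not taken from gendata's source; the function name `hwe_exact` and its count-based signature are assumptions made for illustration.

```python
def hwe_exact(n_hom1, n_het, n_hom2, midp=False):
    """Exact Hardy-Weinberg p-value for one biallelic SNP from genotype counts."""
    n = n_hom1 + n_het + n_hom2
    if n == 0:
        raise ValueError("no genotype calls")
    n_rare = 2 * min(n_hom1, n_hom2) + n_het  # copies of the rarer allele

    # Unnormalised probabilities for every possible heterozygote count.
    probs = [0.0] * (n_rare + 1)
    mid = n_rare * (2 * n - n_rare) // (2 * n)  # approximate mode
    if mid % 2 != n_rare % 2:  # het count must share parity with n_rare
        mid += 1
    probs[mid] = 1.0

    # Walk down from the mode: two hets become one rare and one common homozygote.
    het, hom_r = mid, (n_rare - mid) // 2
    hom_c = n - het - hom_r
    while het >= 2:
        probs[het - 2] = probs[het] * het * (het - 1) / (4.0 * (hom_r + 1) * (hom_c + 1))
        het, hom_r, hom_c = het - 2, hom_r + 1, hom_c + 1

    # Walk up from the mode: one homozygote of each kind becomes two hets.
    het, hom_r = mid, (n_rare - mid) // 2
    hom_c = n - het - hom_r
    while het <= n_rare - 2:
        probs[het + 2] = probs[het] * 4.0 * hom_r * hom_c / ((het + 2) * (het + 1))
        het, hom_r, hom_c = het + 2, hom_r - 1, hom_c - 1

    total = sum(probs)
    probs = [p / total for p in probs]
    p_obs = probs[n_het]
    # Sum all outcomes no more likely than the observed one (small float tolerance).
    p_value = sum(q for q in probs if q <= p_obs * (1 + 1e-9))
    if midp:
        p_value -= 0.5 * p_obs
    return min(1.0, p_value)
```

Counts in perfect Hardy-Weinberg proportions, such as (25, 50, 25), give a p-value of 1 without the mid-p adjustment, while a sample with no heterozygotes at a common SNP gives a p-value close to 0.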
- property maf: Series
Calculate the minor allele frequency for each SNP.
- Returns:
A series of minor allele frequencies indexed by rsID.
- Return type:
pd.Series
- property min_maf: float
Find the minimum minor allele frequency across all SNPs.
- Returns:
The minimum minor allele frequency.
- Return type:
float
- property n_het: Series
Number of heterozygous samples.
- Returns:
Count of heterozygous samples by rsID.
- Return type:
pd.Series
- property n_hom1: Series
Number of samples homozygous for the reference allele (A1).
- Returns:
Count of homozygous (A1) samples by rsID.
- Return type:
pd.Series
- property n_hom2: Series
Number of samples homozygous for the alternate allele (A2).
- Returns:
Count of homozygous (A2) samples by rsID.
- Return type:
pd.Series
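The three genotype counts are enough to recover an allele frequency. The sketch below shows that arithmetic for a single SNP, assuming genotypes are coded as the number of A1 copies (0, 1, 2) with None for missing calls. Both the coding and the helper names are assumptions for illustration, not gendata's implementation.

```python
def genotype_counts(genotypes):
    """Return (n_hom1, n_het, n_hom2) for one SNP, ignoring missing calls."""
    calls = [g for g in genotypes if g is not None]
    n_hom1 = sum(1 for g in calls if g == 2)  # two copies of A1
    n_het = sum(1 for g in calls if g == 1)
    n_hom2 = sum(1 for g in calls if g == 0)  # two copies of A2
    return n_hom1, n_het, n_hom2


def allele_frequency(n_hom1, n_het, n_hom2):
    """Frequency of A1: each homozygote carries two copies, each het one."""
    n = n_hom1 + n_het + n_hom2
    if n == 0:
        raise ValueError("no genotype calls")
    return (2 * n_hom1 + n_het) / (2 * n)


counts = genotype_counts([2, 1, 1, 0, None])
af_a1 = allele_frequency(*counts)
```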
- save(out: str)
Save the integer genotype data to a .bed/.bim/.fam fileset.
- Parameters:
out (str) – The path to save the data to. The path should not include the file extension, which will be added automatically.
- standardised()
Standardise genotypes.
- Returns:
A standardised genotype data object.
- Return type:
Type[StdGenoData]
- standardized()
Standardise genotypes.
- Returns:
A standardised genotype data object.
- Return type:
Type[StdGenoData]
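The reference does not say which scaling standardised() and standardized() use. One common convention, shown here purely as an assumption, centres each SNP at twice its allele frequency and divides by the binomial standard deviation sqrt(2p(1 - p)). The helper `standardise_snp` is hypothetical, and setting missing calls to zero (the mean) is likewise an illustrative choice.

```python
import math


def standardise_snp(genotypes):
    """Standardise one SNP's allele counts (0/1/2, None for missing)."""
    calls = [g for g in genotypes if g is not None]
    if not calls:
        raise ValueError("no genotype calls")
    p = sum(calls) / (2 * len(calls))  # allele frequency of the counted allele
    sd = math.sqrt(2 * p * (1 - p))
    if sd == 0:
        raise ValueError("monomorphic SNP cannot be standardised")
    # Missing calls are set to 0.0, i.e. imputed to the mean after centring.
    return [0.0 if g is None else (g - 2 * p) / sd for g in genotypes]


z = standardise_snp([0, 1, 2, 1])
```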
- static to_maf(af: float) → float
Convert a biallelic allele frequency to a minor allele frequency.
- Parameters:
af (float) – Allele frequency.
- Raises:
ValueError – If af is not in [0, 1].
- Returns:
Minor allele frequency.
- Return type:
float
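The conversion described above amounts to taking the smaller of af and 1 - af for a biallelic SNP. A minimal sketch, written as a standalone function (`to_maf_sketch` is a hypothetical name, not the library method):

```python
def to_maf_sketch(af: float) -> float:
    """Fold a biallelic allele frequency onto the minor allele."""
    if not 0.0 <= af <= 1.0:
        raise ValueError(f"af must be in [0, 1], got {af}")
    return min(af, 1.0 - af)
```

For example, an allele frequency of 0.8 corresponds to a minor allele frequency of about 0.2, and 0.5 maps to itself.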
- class gendata.StdGenoData(genotypes: DataFrame, snps: DataFrame, samples: DataFrame)
Bases: AbstractGenoData
A class to hold and perform operations on standardised genotype data.
- calculate_grm(individuals: list[str] | None = None, weights: dict[str, float] | None = None) → GRM
Calculate a full genetic relationship matrix (GRM).
- Parameters:
individuals (Optional[list[str]]) – List of individual IDs to include in the GRM.
weights (Optional[dict[str, float]]) – Weights to apply to the GRM.
- Returns:
The GRM.
- Return type:
GRM
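To make the idea of a GRM concrete, the sketch below computes the usual cross-product of standardised genotypes averaged over SNPs. It assumes `weights` are per-SNP weights that scale each SNP's contribution. The reference does not say what the weights mean, so treat that, and the list-based `grm_sketch` helper, as assumptions.

```python
def grm_sketch(z, weights=None):
    """Weighted GRM from standardised genotypes.

    z is a list of per-sample rows, each holding one standardised value per
    SNP. weights is an optional list of per-SNP weights (assumed meaning).
    """
    n_samples, n_snps = len(z), len(z[0])
    w = weights if weights is not None else [1.0] * n_snps
    total = sum(w)
    return [
        [
            sum(w[k] * z[i][k] * z[j][k] for k in range(n_snps)) / total
            for j in range(n_samples)
        ]
        for i in range(n_samples)
    ]


relatedness = grm_sketch([[1.0, -1.0], [-1.0, 1.0]])
```

The result is symmetric, with one row and column per sample.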
- calculate_ldm_blocks(block_map: Series | dict[str, int], n_cores: int | None = None) → dict[int, dict[str, ndarray | DataFrame]]
Calculate a block-wise LD matrix, where each block ends at one of the listed terminal rsids.
- Parameters:
block_map (Union[pd.Series, dict[str, int]]) – Mapping of rsids to block numbers.
n_cores (Optional[int]) – Number of cores to use for parallelisation. If None, defaults to the number of cores available.
- Returns:
A dictionary containing an LD matrix dictionary for each block.
- Return type:
dict[int, dict[str, Union[np.ndarray, pd.DataFrame]]]
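A block_map in dictionary form pairs each rsid with a block number. The sketch below only shows how such a mapping partitions SNPs into blocks. The rsids are made up, and `group_by_block` is a hypothetical helper, not part of gendata.

```python
from collections import defaultdict


def group_by_block(block_map):
    """Invert an rsid -> block number mapping into block number -> rsids."""
    blocks = defaultdict(list)
    for rsid, block in block_map.items():
        blocks[block].append(rsid)
    return dict(blocks)


# Hypothetical rsids, for illustration only.
example_map = {"rs1": 0, "rs2": 0, "rs3": 1}
blocks = group_by_block(example_map)
```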
- calculate_ldm_window(window: int | None = None, n_cores: int | None = None, sparse: bool = True, tol: float = 0.001) → dict[int, csr_matrix]
Calculate a sparse windowed LD matrix in CSR format.
Note: These calculations are quite fast, so parallelisation may not always be necessary.
- Parameters:
window (Optional[int]) – Set LD correlations to zero for all pairs of SNPs separated by a distance greater than window. If None, the window will be set to the maximum distance between SNPs.
n_cores (Optional[int]) – Number of cores to use for parallelisation. If None, defaults to the number of cores available.
sparse (bool) – Whether to make sparse matrices.
tol (float) – Tolerance for sparse matrix construction.
- Returns:
Dictionary containing a sparse LD matrix for each chromosome.
- Return type:
dict[int, sp.csr_matrix]
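The windowing idea can be sketched in plain Python for one chromosome: correlate each pair of standardised SNPs, but only if they lie within `window` of each other. Here distance is taken to be base-pair position and `tol` is treated as a threshold below which correlations are dropped. Both readings are assumptions, as is the `windowed_ld` helper, which stores the upper triangle in a dictionary instead of a CSR matrix.

```python
def windowed_ld(z, positions, window, tol=0.001):
    """Sparse windowed LD for one chromosome.

    z holds one list per SNP of standardised genotypes (mean 0, variance 1),
    and positions holds each SNP's coordinate. Returns {(i, j): r} for
    i <= j, keeping only pairs within the window with |r| >= tol.
    """
    n_samples = len(z[0])
    ld = {}
    for i in range(len(z)):
        for j in range(i, len(z)):
            if abs(positions[j] - positions[i]) > window:
                continue  # treated as zero, so not stored
            r = sum(a * b for a, b in zip(z[i], z[j])) / n_samples
            if abs(r) >= tol:
                ld[(i, j)] = r
    return ld


snps = [[1.0, -1.0, 1.0, -1.0], [1.0, -1.0, 1.0, -1.0], [-1.0, 1.0, -1.0, 1.0]]
ld = windowed_ld(snps, positions=[100, 200, 5000], window=1000)
```

In this toy input the third SNP is perfectly anti-correlated with the first two but lies outside the window, so only its diagonal entry is kept.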
- flip_snps(*rsids: str)
Flip A1 and A2 for the selected SNPs.
- Parameters:
rsids (str) – List of rsids to flip.
- Returns:
StdGenoData object.
- Return type:
Type[StdGenoData]
- gendata.merge(*genotype_data: Type[AbstractGenoData]) → Type[AbstractGenoData]
Merge sets of genotype data for different chromosomes/loci from the same set of samples.
- Parameters:
genotype_data (AbstractGenoData) – Genetic data objects to merge.
- Raises:
TypeError – If genetic data inputs are not all of the same object type.
ValueError – If the set of SNPs overlaps across genetic data inputs.
ValueError – If the set of samples is not the same across all genetic data inputs.
- Returns:
Merged genetic data.
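The three error conditions listed above can be read as pre-merge checks. The sketch below expresses them over simple tuples of (kind, rsids, sample IDs). This tuple layout and the `check_mergeable` helper are hypothetical stand-ins for the real genotype data objects.

```python
def check_mergeable(datasets):
    """Validate inputs before merging.

    datasets is a list of (kind, rsids, sample_ids) tuples, where kind
    stands in for the object's type.
    """
    kinds = {kind for kind, _, _ in datasets}
    if len(kinds) > 1:
        raise TypeError("all inputs must be of the same object type")

    seen = set()
    for _, rsids, _ in datasets:
        overlap = seen & set(rsids)
        if overlap:
            raise ValueError(f"SNPs appear in more than one input: {sorted(overlap)}")
        seen |= set(rsids)

    first_samples = set(datasets[0][2])
    for _, _, samples in datasets[1:]:
        if set(samples) != first_samples:
            raise ValueError("all inputs must contain the same set of samples")


# Two chromosomes' worth of data from the same samples pass all checks.
check_mergeable([("int", ["rs1", "rs2"], ["s1", "s2"]), ("int", ["rs3"], ["s1", "s2"])])
```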
- gendata.read_bed(paths: str | list[str], rsids: list | None = None, individuals: list | None = None, num_threads: int | None = 1) → IntGenoData
Read raw genotypes into an annotated data frame.
Accepts either a single path to a .bed/.bim/.fam fileset or a list of paths to several .bed/.bim/.fam filesets.
- Parameters:
paths (Union[str, list[str]]) – Path or list of paths to the .bed/.bim/.fam filesets to load. If multiple paths are given, the data will be merged after loading.
rsids (Optional[list], optional) – Filter SNPs to this set of rsIDs. If not provided, no filtering will occur.
individuals (Optional[list], optional) – Filters samples to this list of individuals. If not provided, no filtering will occur.
num_threads (Optional[int], optional) – Specifies the number of threads to use when reading bed files. Defaults to 1.
- Returns:
Annotated genotype data object.
- Return type:
IntGenoData
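For context on what reading a .bed file involves, the sketch below decodes the PLINK 1 SNP-major binary layout: three header bytes, then two bits per genotype packed four samples to a byte, low-order bits first. Mapping the codes to counts of the first .bim allele is an illustrative choice. How gendata itself codes genotypes is not stated here, and `decode_bed` is a hypothetical helper.

```python
BED_MAGIC = bytes([0x6C, 0x1B, 0x01])  # PLINK 1 header, SNP-major mode

# Two-bit codes: 00 hom first allele, 01 missing, 10 het, 11 hom second allele.
# Mapped here to the number of copies of the first allele (assumed coding).
CODE_TO_A1_COUNT = {0b00: 2, 0b01: None, 0b10: 1, 0b11: 0}


def decode_bed(data, n_samples, n_snps):
    """Decode raw .bed bytes into one list of genotypes per SNP."""
    if data[:3] != BED_MAGIC:
        raise ValueError("not a SNP-major PLINK 1 .bed file")
    bytes_per_snp = (n_samples + 3) // 4  # each SNP is padded to a whole byte
    body = data[3:]
    if len(body) != bytes_per_snp * n_snps:
        raise ValueError("file size does not match the sample and SNP counts")

    genotypes = []
    for s in range(n_snps):
        block = body[s * bytes_per_snp:(s + 1) * bytes_per_snp]
        row = []
        for i in range(n_samples):
            code = (block[i // 4] >> (2 * (i % 4))) & 0b11
            row.append(CODE_TO_A1_COUNT[code])
        genotypes.append(row)
    return genotypes


# One SNP, five samples: codes 00, 10, 11, 01, 00 pack into bytes 0x78, 0x00.
decoded = decode_bed(BED_MAGIC + bytes([0x78, 0x00]), n_samples=5, n_snps=1)
```

The sample and SNP counts come from the accompanying .fam and .bim files, which is why the three files travel together as a fileset.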
- core – A module containing functions and classes to facilitate the importing of genetic data.
- cov – Module containing functions for calculating covariances and covariance matrices.
- grm – Module to hold classes and functions related to genetic relationship matrices.