draw_all_unstructured, to facilitate drawing unstructured data (where individuals have genotypes drawn independently and identically distributed) with fancy features such as the various ways to define ancestral allele frequences and prevent fixed loci and potentially even rare variants. Applicable options follow the model of the previous function draw_all_admix.scale_tree rewrote so scaling is on linear scale as the objective (so its coancestry and FST are scaled by the desired factor), and (non-additive) IBD edges are transformed non-linearly (as is correct to agree scaled additive edges).
tree_additive), which resulted in a bug when both types of edges were present. This bug has always been present since the introduction of this function in version 1.3.1.9000 (2021-04-17).tree_additive added option validate.
TRUE value has original behavior.tree_additive itself is used inside the internal validation function validate_coanc_tree to make sure that IBD and additive edges agree with each other when both are present, so validation must not be repeated lest we enter an infinite recursion, which results in a stack overflow.bias_coeff_admix_fit) shared by admix_prop_1d_linear and admix_prop_1d_circular for edge cases.
Inf, but instead an error was encountered.stats::uniroot)undiff_af renamed parameter F to kinship_mean, and updated all documentation to reflect the correction that this parameter is the mean kinship and not FST (the complete derivation will appear in a manuscript).
F_max is similarly now kinship_mean_max.draw_all_admix added option p_anc_distr for passing custom ancestral allele frequency distributions (as vector or function).
This differs from the similar preexisting option p_anc, which fixed ancestral allele frequencies per locus to those values.
These two options behave differently when loci have to be re-drawn due to being fixed or having too-low MAFs: passing p_anc never changes those values, whereas passing p_anc_distr results in drawing new values as necessary.
The new option is more natural biologically and results in re-drawing fixed loci less often.undiff_af:
F_max, V_in, V_out, V_mix, and alpha.distr = "auto" cases where mixing variance ended up being smaller than required due to roundoff errors (alpha is now larger than given in direct formula by eps = 10 * .Machine$double.eps, which is also a new option.undiff_af for creating "undifferentiated" allele frequency distributions based on real data but with a lower variance (more concentrated around 0.5) according to a given FST, useful for simulating data trying to match real data.LICENSE.md.NEWS.md slightly to improve its automatic parsing.inst/CITATION (missed last time I updated them in other locations).fit_tree internally simplified to use stats::hclust, which also results in a small runtime gain.
The new code (when method = "mcquitty", which is default) gives the same answers as before (in other words, the original algorithm was a special case of hierarchical clustering).
method is passed to hclust.
Although all hclust methods are allowed, for this application the only ones that make sense are "mcquitty" (WPGMA) and "average" (UPGMA).
In internal evaluations, both algorithms had similar accuracy and runtime, but only "mcquitty" exactly recapitulates the original algorithm.draw_all_admix that could cause a "stack overflow" error.
The function used to call itself recursively if require_polymorphic_loci = TRUE, and in cases where there are very rare allele frequencies or high maf_min the number of recursions could be so large that it triggered this error.
Now the function has a while loop, and does not recurse more than one level at the time; there is no limit to the number of iterations and no errors occur inherently due to large numbers of iterations.fixed_loci and draw_all_admix have a new parameter maf_min that, when greater than zero, allows for treating rare variants as fixed.
In draw_all_admix, this now allows for simulating loci with frequency-based ascertainment bias.New functions and bug fixes dealing with reordering tree edges and tips.
tree_reindex_tips for ensuring that tip order agrees in both the internal labels vector and the edge matrix.
Such lack of agreement is generally possible (technically the tree is the same for arbitrary orders of edges in the edge matrix).
However, such a disagreement causes visual disagreement in plots (for example, trees are plotted in the order of the edge matrix, versus coancestry matrices are ordered as in the tip labels vector instead), which can now be fixed in general.tree_reorder for reordering tree edges and tips to agree as much as possible with a desired tip order.
The heuristic finds the exact solution if it exists, otherwise returns a reasonable order close to the desired order.
Tip order in labels and edge matrix agree (via tree_reindex_tips).fit_tree now outputs trees with tip order that better agrees with the input data, and tip order in labels vector and edge matrix now agree (via tree_reorder).tree_additive.
Before this bug fix, some trees could trigger the error message "Error: Node index 6 was not assigned coancestry from root! (unexpected)", where "6" could be other numbers.draw_p_subpops_tree.
Before this bug fix, some trees could trigger the error message "Error: The root node index in tree_subpops$edge (9) does not match k_subpops + 1 (6) where k_subpops is the number of tips! Is the tree_subpops object malformed?", where "9" and "6" could be other numbers. Other possible error messages contain "Parent node index 6 has not yet been processed ..." or "Child" instead of "Parent", where "6" could be other numbers.fit_tree had related fixes, but overall fit_tree appears to have had no bugs because users cannot provide trees, and the tree-building algorithm does not produce scrambled edges that would have caused problems.admix_prop_1d_linear, admix_prop_1d_circular now copy names from the input coanc_subpops (vector and matrix versions, only required when fitting bias_coeff) to the columns of the output admix_proportions matrix.draw_genotypes_admix now copies row and column names from input matrix p_ind (or rownames from p_ind and column names from the rownames of admix_proportions when the latter is provided) to output genotype matrixdraw_p_subpops now copies names from p_anc to rows, names from inbr_subpops to columns, when present and of the right dimensions.draw_p_subpops_tree now copies names from p_anc to rows. Names from tree_subpops were already copied to columns before.coanc_admix and fst_admix stop if the column names of admix_proportions and the names of coanc_subpops disagree.draw_all_admix stops if the column names of admix_proportions and the names of either inbr_subpops or tree_subpops disagree.draw_genotypes_admix, when admix_proportions is passed, stops if the column names of admix_proportions and p_ind disagree.make_p_ind_admix stops if the column names of admix_proportions and p_subpops disagree.tree_additive now has option force, which when TRUE simply proceeds without stopping if additive edges were already present (in tree$edge.length.add, which is ignored and overwritten).It's Fangorn Forest around here with all the tree updates!
fit_tree for fitting trees to coancestry matrices!scale_tree to easily scale coancestry trees and check for out-of-bounds values.tree_additive for calculating "additive" edges for probabilistic edge coancestry trees, and also the reverse function .
coanc_tree, but now it's renamed, exported, and well documented!$root.edge to tree phylo objects passed to these functions:
coanc_tree: edge is a shared covariance value affecting all subpopulations.draw_all_admix and draw_p_subpops_tree: if root edge is present, functions warn that it will be ignored.admix_prop_1d_linear and admix_prop_1d_circular: debugged an edge case where sigma is small but not zero and numerically-calculated densities all come out to zero in a given row of the admix_proportions matrix (for admix_prop_1d_circular infinite values also arise), which used to lead to NAs upon row normalization; now for those rows, the closest ancestry (by coordinate distance) gets assigned the full admixture fraction (just as for independent subpopulations/sigma = 0).draw_p_subpops_tree is the tree version of draw_p_subpops.coanc_tree calculates the true coancestry matrix corresponding to the subpopulations related by a tree.draw_all_admix has new argument tree_subpops that can be used in place of inbr_subpops (to simulated subpopulation allele frequencies using draw_p_subpops_tree instead of draw_p_subpops).coanc_subpops) as input, so they work if they are passed the matrix that coanc_tree returns: coanc_admix, fst_admix, admix_prop_1d_linear, admix_prop_1d_circular.admix_prop_1d_linear and admix_prop_1d_circular, when sigma is missing (and therefore fit to a desired coanc_subpops, fst, and bias_coeff), now additionally return multiplicative factor used to rescale coanc_subpops.admix_prop_1d_linear and admix_prop_1d_circular had these changes:
bias_coeff, coanc_subpops and fst now have default values (of NA, NULL, and NA, respectively) instead of missing, and these "missing" values can be passed to get the same behavior as if they hadn't been passed at all.bias_coeff = 1 (to fix an issue only observed on Apple M1).admix_prop_indep_subpops: default value for the optional parameter subpops is now made more clear in arguments definition.DESCRIPTION, README.md and the vignette, to point to the published method in PLoS Genetics.draw_all_admix: when option p_anc is provided as scalar and want_p_anc = TRUE, now the return value is always a vector (in this case the input scalar value repeated m_loci times). The previous behavior was to return p_anc as scalar if that was the input, which could be problematic for downstream applications.p_anc to function draw_all_admix, to specify desired ancestral allele frequencies instead of having the code generate it randomly (default).draw_p_subpops.R, clarifying that input p_anc can be scalar.bias_coeff_admix_fit, which caused it to die if the desired bias coefficient was an extreme value (particularly 1).
The error message was: f() values at end points not of opposite sign.
The actual bug was not observed in the regular R build, but rather in a limited precision setting where R was configured with --disable-long-double.q1dc, q1d, qis, coanc, rbnpsd, rgeno, rpanc, rpint, rpiaf.man/figures/beta in function draw_p_anc to trigger a symmetric Beta distribution for the ancestral allele frequencies, with the desired shape parameter.
The beta option can also be set on the wrapper function draw_all_admix.
This option allows simulation of a distribution heavier on rare variants (when beta is much smaller than 1), more similar to real human data.draw_genotypes_admix
low_mem option could be set but filled slowly by locus only.draw_all_admix is also now automatically low-memory whenever want_p_ind = FALSE, and the explicit low_mem option has also been removed.admix_prop_1d_circular) to prevent overlapping individuals on the edges, and to better agree visually with the linear version (admix_prop_1d_linear).coanc -> coanc_admixq1d -> admix_prop_1d_linearq1dc -> admix_prop_1d_circularqis -> admix_prop_indep_subpopsrpanc -> draw_p_ancrpint -> draw_p_subpopsrpiaf -> make_p_ind_admixrgeno -> draw_genotypes_admixrbnpsd -> draw_all_admixfst -> fst_admix (no deprecated version available in this case, to eliminate conflict with popkin::fst)Q -> admix_proportionsF -> coanc_subpops (if general matrix is accepted), inbr_subpops (vector or scalar versions required)s -> bias_coeffw -> weightsTheta -> coancestrym -> m_locin -> n_indk -> k_subpopspAnc -> p_ancB -> p_subpopsP -> p_indsigma = 0 bug in admix_prop_1d_circular.draw_all_admix (compared to deprecated rbnpsd, which retains old defaults):
require_polymorphic_loci (old noFixed) is now TRUE by default.want_p_ind and want_p_subpops (old wantP and wantB) are now FALSE by default.draw_p_subpops now admits scalar inputs p_anc and inbr_subpops, while number of loci and number of subpopulations can be provided as additional options.coanc_subpops) or if the diagonal matrix case is required (specified as vector or scalar inbr_subpops).qis now returns a numeric admixture proportions matrix (used to be logical).q1d and q1dc now handle sigma = 0 special case.q1d and q1dc now provide more informative out-of-bounds messages when sigma is missing (and s is provided)sigma root finding in q1d and q1dc (when s is provided) is now more robust, explicitly tested at boundaries (min s > 0 achieved at sigma = 0 and max s = 1 achieved at sigma = Inf).
interval and tol from both q1d and q1dc (users would never need to set them now that procedure is more robust).coanc_to_kinship to easily obtain kinship matrices from coancestry matrices.Added option noFixed to function rbnpsd to redraw loci that were drawn fixed for a single allele.
These loci are not polymorphic so they would normally not be considered in analyses.
Added function fixed_loci to test for fixed loci within rbnpsd.