draw_all_unstructured
, to facilitate drawing unstructured data (where individuals have genotypes drawn independently and identically distributed) with fancy features such as the various ways to define ancestral allele frequences and prevent fixed loci and potentially even rare variants. Applicable options follow the model of the previous function draw_all_admix
.scale_tree
rewrote so scaling is on linear scale as the objective (so its coancestry and FST are scaled by the desired factor), and (non-additive) IBD edges are transformed non-linearly (as is correct to agree scaled additive edges).
tree_additive
), which resulted in a bug when both types of edges were present. This bug has always been present since the introduction of this function in version 1.3.1.9000 (2021-04-17).tree_additive
added option validate
.
TRUE
value has original behavior.tree_additive
itself is used inside the internal validation function validate_coanc_tree
to make sure that IBD and additive edges agree with each other when both are present, so validation must not be repeated lest we enter an infinite recursion, which results in a stack overflow.bias_coeff_admix_fit
) shared by admix_prop_1d_linear
and admix_prop_1d_circular
for edge cases.
Inf
, but instead an error was encountered.stats::uniroot
)undiff_af
renamed parameter F
to kinship_mean
, and updated all documentation to reflect the correction that this parameter is the mean kinship and not FST (the complete derivation will appear in a manuscript).
F_max
is similarly now kinship_mean_max
.draw_all_admix
added option p_anc_distr
for passing custom ancestral allele frequency distributions (as vector or function).
This differs from the similar preexisting option p_anc
, which fixed ancestral allele frequencies per locus to those values.
These two options behave differently when loci have to be re-drawn due to being fixed or having too-low MAFs: passing p_anc
never changes those values, whereas passing p_anc_distr
results in drawing new values as necessary.
The new option is more natural biologically and results in re-drawing fixed loci less often.undiff_af
:
F_max
, V_in
, V_out
, V_mix
, and alpha
.distr = "auto"
cases where mixing variance ended up being smaller than required due to roundoff errors (alpha
is now larger than given in direct formula by eps = 10 * .Machine$double.eps
, which is also a new option.undiff_af
for creating "undifferentiated" allele frequency distributions based on real data but with a lower variance (more concentrated around 0.5) according to a given FST, useful for simulating data trying to match real data.LICENSE.md
.NEWS.md
slightly to improve its automatic parsing.inst/CITATION
(missed last time I updated them in other locations).fit_tree
internally simplified to use stats::hclust
, which also results in a small runtime gain.
The new code (when method = "mcquitty"
, which is default) gives the same answers as before (in other words, the original algorithm was a special case of hierarchical clustering).
method
is passed to hclust
.
Although all hclust
methods are allowed, for this application the only ones that make sense are "mcquitty" (WPGMA) and "average" (UPGMA).
In internal evaluations, both algorithms had similar accuracy and runtime, but only "mcquitty" exactly recapitulates the original algorithm.draw_all_admix
that could cause a "stack overflow" error.
The function used to call itself recursively if require_polymorphic_loci = TRUE
, and in cases where there are very rare allele frequencies or high maf_min
the number of recursions could be so large that it triggered this error.
Now the function has a while
loop, and does not recurse more than one level at the time; there is no limit to the number of iterations and no errors occur inherently due to large numbers of iterations.fixed_loci
and draw_all_admix
have a new parameter maf_min
that, when greater than zero, allows for treating rare variants as fixed.
In draw_all_admix
, this now allows for simulating loci with frequency-based ascertainment bias.New functions and bug fixes dealing with reordering tree edges and tips.
tree_reindex_tips
for ensuring that tip order agrees in both the internal labels vector and the edge matrix.
Such lack of agreement is generally possible (technically the tree is the same for arbitrary orders of edges in the edge matrix).
However, such a disagreement causes visual disagreement in plots (for example, trees are plotted in the order of the edge matrix, versus coancestry matrices are ordered as in the tip labels vector instead), which can now be fixed in general.tree_reorder
for reordering tree edges and tips to agree as much as possible with a desired tip order.
The heuristic finds the exact solution if it exists, otherwise returns a reasonable order close to the desired order.
Tip order in labels and edge matrix agree (via tree_reindex_tips
).fit_tree
now outputs trees with tip order that better agrees with the input data, and tip order in labels vector and edge matrix now agree (via tree_reorder
).tree_additive
.
Before this bug fix, some trees could trigger the error message "Error: Node index 6 was not assigned coancestry from root! (unexpected)", where "6" could be other numbers.draw_p_subpops_tree
.
Before this bug fix, some trees could trigger the error message "Error: The root node index in tree_subpops$edge
(9) does not match k_subpops + 1
(6) where k_subpops
is the number of tips! Is the tree_subpops
object malformed?", where "9" and "6" could be other numbers. Other possible error messages contain "Parent node index 6 has not yet been processed ..." or "Child" instead of "Parent", where "6" could be other numbers.fit_tree
had related fixes, but overall fit_tree
appears to have had no bugs because users cannot provide trees, and the tree-building algorithm does not produce scrambled edges that would have caused problems.admix_prop_1d_linear
, admix_prop_1d_circular
now copy names from the input coanc_subpops
(vector and matrix versions, only required when fitting bias_coeff
) to the columns of the output admix_proportions
matrix.draw_genotypes_admix
now copies row and column names from input matrix p_ind
(or rownames from p_ind
and column names from the rownames of admix_proportions
when the latter is provided) to output genotype matrixdraw_p_subpops
now copies names from p_anc
to rows, names from inbr_subpops
to columns, when present and of the right dimensions.draw_p_subpops_tree
now copies names from p_anc
to rows. Names from tree_subpops
were already copied to columns before.coanc_admix
and fst_admix
stop if the column names of admix_proportions
and the names of coanc_subpops
disagree.draw_all_admix
stops if the column names of admix_proportions
and the names of either inbr_subpops
or tree_subpops
disagree.draw_genotypes_admix
, when admix_proportions
is passed, stops if the column names of admix_proportions
and p_ind
disagree.make_p_ind_admix
stops if the column names of admix_proportions
and p_subpops
disagree.tree_additive
now has option force
, which when TRUE
simply proceeds without stopping if additive edges were already present (in tree$edge.length.add
, which is ignored and overwritten).It's Fangorn Forest around here with all the tree updates!
fit_tree
for fitting trees to coancestry matrices!scale_tree
to easily scale coancestry trees and check for out-of-bounds values.tree_additive
for calculating "additive" edges for probabilistic edge coancestry trees, and also the reverse function .
coanc_tree
, but now it's renamed, exported, and well documented!$root.edge
to tree phylo
objects passed to these functions:
coanc_tree
: edge is a shared covariance value affecting all subpopulations.draw_all_admix
and draw_p_subpops_tree
: if root edge is present, functions warn that it will be ignored.admix_prop_1d_linear
and admix_prop_1d_circular
: debugged an edge case where sigma
is small but not zero and numerically-calculated densities all come out to zero in a given row of the admix_proportions
matrix (for admix_prop_1d_circular
infinite values also arise), which used to lead to NAs upon row normalization; now for those rows, the closest ancestry (by coordinate distance) gets assigned the full admixture fraction (just as for independent subpopulations/sigma = 0
).draw_p_subpops_tree
is the tree version of draw_p_subpops
.coanc_tree
calculates the true coancestry matrix corresponding to the subpopulations related by a tree.draw_all_admix
has new argument tree_subpops
that can be used in place of inbr_subpops
(to simulated subpopulation allele frequencies using draw_p_subpops_tree
instead of draw_p_subpops
).coanc_subpops
) as input, so they work if they are passed the matrix that coanc_tree
returns: coanc_admix
, fst_admix
, admix_prop_1d_linear
, admix_prop_1d_circular
.admix_prop_1d_linear
and admix_prop_1d_circular
, when sigma
is missing (and therefore fit to a desired coanc_subpops
, fst
, and bias_coeff
), now additionally return multiplicative factor
used to rescale coanc_subpops
.admix_prop_1d_linear
and admix_prop_1d_circular
had these changes:
bias_coeff
, coanc_subpops
and fst
now have default values (of NA
, NULL
, and NA
, respectively) instead of missing, and these "missing" values can be passed to get the same behavior as if they hadn't been passed at all.bias_coeff = 1
(to fix an issue only observed on Apple M1).admix_prop_indep_subpops
: default value for the optional parameter subpops
is now made more clear in arguments definition.DESCRIPTION
, README.md
and the vignette, to point to the published method in PLoS Genetics.draw_all_admix
: when option p_anc
is provided as scalar and want_p_anc = TRUE
, now the return value is always a vector (in this case the input scalar value repeated m_loci
times). The previous behavior was to return p_anc
as scalar if that was the input, which could be problematic for downstream applications.p_anc
to function draw_all_admix
, to specify desired ancestral allele frequencies instead of having the code generate it randomly (default).draw_p_subpops.R
, clarifying that input p_anc
can be scalar.bias_coeff_admix_fit
, which caused it to die if the desired bias coefficient was an extreme value (particularly 1
).
The error message was: f() values at end points not of opposite sign
.
The actual bug was not observed in the regular R build, but rather in a limited precision setting where R was configured with --disable-long-double
.q1dc
, q1d
, qis
, coanc
, rbnpsd
, rgeno
, rpanc
, rpint
, rpiaf
.man/figures/
beta
in function draw_p_anc
to trigger a symmetric Beta distribution for the ancestral allele frequencies, with the desired shape parameter.
The beta
option can also be set on the wrapper function draw_all_admix
.
This option allows simulation of a distribution heavier on rare variants (when beta
is much smaller than 1), more similar to real human data.draw_genotypes_admix
low_mem
option could be set but filled slowly by locus only.draw_all_admix
is also now automatically low-memory whenever want_p_ind = FALSE
, and the explicit low_mem
option has also been removed.admix_prop_1d_circular
) to prevent overlapping individuals on the edges, and to better agree visually with the linear version (admix_prop_1d_linear
).coanc
-> coanc_admix
q1d
-> admix_prop_1d_linear
q1dc
-> admix_prop_1d_circular
qis
-> admix_prop_indep_subpops
rpanc
-> draw_p_anc
rpint
-> draw_p_subpops
rpiaf
-> make_p_ind_admix
rgeno
-> draw_genotypes_admix
rbnpsd
-> draw_all_admix
fst
-> fst_admix
(no deprecated version available in this case, to eliminate conflict with popkin::fst
)Q
-> admix_proportions
F
-> coanc_subpops
(if general matrix is accepted), inbr_subpops
(vector or scalar versions required)s
-> bias_coeff
w
-> weights
Theta
-> coancestry
m
-> m_loci
n
-> n_ind
k
-> k_subpops
pAnc
-> p_anc
B
-> p_subpops
P
-> p_ind
sigma = 0
bug in admix_prop_1d_circular
.draw_all_admix
(compared to deprecated rbnpsd
, which retains old defaults):
require_polymorphic_loci
(old noFixed
) is now TRUE
by default.want_p_ind
and want_p_subpops
(old wantP
and wantB
) are now FALSE
by default.draw_p_subpops
now admits scalar inputs p_anc
and inbr_subpops
, while number of loci and number of subpopulations can be provided as additional options.coanc_subpops
) or if the diagonal matrix case is required (specified as vector or scalar inbr_subpops
).qis
now returns a numeric admixture proportions matrix (used to be logical).q1d
and q1dc
now handle sigma = 0
special case.q1d
and q1dc
now provide more informative out-of-bounds messages when sigma
is missing (and s
is provided)sigma
root finding in q1d
and q1dc
(when s
is provided) is now more robust, explicitly tested at boundaries (min s > 0
achieved at sigma = 0
and max s = 1
achieved at sigma = Inf
).
interval
and tol
from both q1d
and q1dc
(users would never need to set them now that procedure is more robust).coanc_to_kinship
to easily obtain kinship matrices from coancestry matrices.Added option noFixed
to function rbnpsd
to redraw loci that were drawn fixed for a single allele.
These loci are not polymorphic so they would normally not be considered in analyses.
Added function fixed_loci
to test for fixed loci within rbnpsd.