- plot_popkin added options xlab, xlab_adj, xlab_line, and xlab_side.
- popkin and popkin_af, when want_M = TRUE, now include A_min in their return list.
- plot_popkin and plot_admix added option labs_even_line to allow the location of the end of labs_even lines to differ from labs_line.
- Added hgdp_subset sample data, copied from lfa.
  - Removed the lfa dependency, which was only used for this sample data.
    lfa has become unreliable on external testing servers, particularly as it is on Bioconductor and sometimes hard to install on R-devel, so its removal simplifies automatic testing considerably.
- popkin, popkin_A, and popkin_af
- plot_popkin: clarified documentation.
- plot_phylo added option edge_width, which defaults to 1.
  - This default matches ape version 5.5 and prior, where its function plot.phylo (which popkin::plot_phylo wraps) had its parameter edge.width default to 1.
  - In ape version 5.6 (2021-12-20), edge.width defaults to NULL, which results in setting it to par('lwd'); this had undesirable consequences in my use cases, which is why popkin overrides it with the old default.
  - plot_popkin's old default edge width for trees of class phylo is also restored.
- plot_popkin:
  - Added option ylab_per_panel to allow single-panel figures to place the y-axis label in the inner margin (previously that case was forced to use the outer margin).
  - oma and layout_add: in some cases you may want to turn off both features to avoid unexpected behaviors (though there are cases where turning off one but not the other also makes sense).
- print_labels fixed a bug when even = TRUE and the minimum xb_ind is not zero, which caused the maximum to be off by xb_ind.
  - This bug did not affect the default usage (via plot_popkin or plot_admix) because the minimum xb_ind was always zero in those cases.
- popkin, popkin_A:
  - Added option mean_of_ratios: the default FALSE gives the original estimator, while TRUE gives a new estimator that upweights rare variants (in this way resembling the standard kinship estimator) and that appears to improve performance in association testing.
  - Fixed: M (one of the return values when want_M = TRUE) did not inherit individual names from X even though A and kinship did; similarly, all of them now inherit names when X is a function (fixed incidentally when replacing Rcpp code with pure R).
  - The default case (mean_of_ratios = FALSE) replaced Rcpp code with a pure R version, which results in large speedups at the cost of higher memory use (despite my best attempts at improving the original Rcpp code, the simpler R code is doing something magically fast that I don't understand).
  - The Rcpp and RcppEigen dependencies have been dropped as a consequence.
- plot_admix:
  - admix_order_cols: to automatically order ancestries given ordered individuals.
  - admix_label_cols: to automatically assign labels to ancestries given labels of individuals.
- plot_admix added options leg_title_line and leg_las, and changed the default of leg_mar, to better accommodate numerous long ancestry labels.
- Added function plot_admix for making admixture/structure plots with most of the same options as plot_popkin!
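The mean_of_ratios option of popkin and popkin_A (listed above) comes down to a general distinction in how per-locus quantities are averaged. The Python sketch below is only a generic illustration and not popkin's R implementation. The per-locus numerators and the p * (1 - p) style denominators are hypothetical values, chosen to show why a mean of ratios gives rare variants more weight than a ratio of means does.

```python
def ratio_of_means(num, den):
    """Sum numerators and denominators over loci, then divide once.

    Loci with large denominators (here, common variants) dominate the result.
    """
    return sum(num) / sum(den)


def mean_of_ratios(num, den):
    """Divide locus by locus, then average with equal weight per locus.

    Loci with small denominators (here, rare variants) count as much as
    common ones, so they are upweighted relative to ratio_of_means.
    """
    ratios = [n / d for n, d in zip(num, den)]
    return sum(ratios) / len(ratios)


# Hypothetical allele frequencies: two common loci and one rare locus.
p = [0.5, 0.4, 0.01]
den = [x * (1 - x) for x in p]   # approximately 0.25, 0.24, 0.0099
num = [0.05, 0.03, 0.009]        # made-up per-locus numerators

print(ratio_of_means(num, den))  # dominated by the two common loci
print(mean_of_ratios(num, den))  # pulled up by the rare locus
```

With these made-up values, the rare locus has a ratio near 1. As a result, the mean of ratios comes out more than twice the ratio of means.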
- print_labels_multi, print_labels.
- Available memory was previously estimated as MemFree (from /proc/meminfo). This could underestimate available memory when Buffers and Cached memory are large (these count as available memory!), and in some cases caused this error:
Error in solve_m_mem_lim :
The resulting `m_chunk` was negative! This is because either `mat_n_n` or `vec_n` are non-zero and `n` alone is too large for the available memory (even for `m_chunk == 0`). The solution is to free more memory (ideal) or to reduce `n` if possible.
Now MemAvailable (still from /proc/meminfo) is used, which is ideal but is absent in older Linux kernels (< 3.14); otherwise the estimate falls back to the sum of MemFree, Buffers, and Cached. Either way, the estimate is greater than MemFree alone and also more accurate.
- plot_popkin fixed a bug when null_panel_data = TRUE in which titles that went over panels with NULL kinship were incorrectly omitted.
- avg_kinship_subpops.
- popkin_A_min_subpops:
  - Now uses avg_kinship_subpops internally to perform the bulk of the calculations.
  - If subpops = NULL, the calculation now returns the minimum A among off-diagonal elements only (excluding the diagonal) rather than the overall minimum of A. There is no difference when A is calculated from genotypes (diagonal values are much greater than off-diagonal values), but the change was made for consistency, as the two might differ for arbitrary inputs.
- README: updated GitHub install instructions for building vignettes.
- Removed LazyData: true from DESCRIPTION (to avoid a new "NOTE" on CRAN).
- Edited NEWS.md slightly to improve its automatic parsing.
- weights_subpops updates:
  - Added option subsubpops for calculating weights on two levels.
  - table).
- inst/CITATION (missed last time I updated them in other locations).

Overall, added tree plotting capabilities and finer plotting control.
- Added function plot_phylo for plotting phylo trees.
  This is a wrapper around ape::plot.phylo that makes several adjustments so plots agree better with accompanying kinship matrices (package ape is now a dependency for this feature).
- plot_popkin had the following updates:
  - phylo and function objects are now accepted as elements of the input list kinship (first argument).
    If phylo, these trees are plotted via plot_phylo.
    If function, its code is executed without arguments and is expected to plot a single panel.
  - Added option ylab_side to allow placing labels on the bottom (x-axis), top, or right side instead of the default left side (y-axis).
  - Added option leg_column for placing the legend/color key in any column (default is the last column, which was the only choice before).
  - Added option panel_letters_adj for positioning panel letters more finely, farther into the margin.
    Also, the previously hardcoded default of 0 (inside the x-axis range) is now -0.1 (just outside the x-axis range in most cases).
  - Names (when names = TRUE) are now all plotted, even if they overlap.
    The old behavior (R's default) plotted names in order and skipped overlapping labels (see ?axis), which looks prettier but was confusing for this plot, as it incorrectly suggested that some individuals or subpopulations were not present.
    The solution is unfortunately a hack: passing gap.axis = -1 to axis (suggested in ?axis), which hopefully does not break in the future.
- validate_kinship now has option logical = TRUE to return a logical value instead of throwing errors.
- Added function popkin_af, the analog of popkin for allele frequency matrices instead of genotypes; as a consequence, it estimates coancestry instead of kinship.
- popkin function is run.
  Free memory is not calculated on these systems and defaults to 1GB, which triggered a warning since this could cause problems if the actual available memory is less.
  However, since free memory is rarely below 1GB on reasonable systems, this warning had become more problematic than useful (it interfered with internal unit testing), so I decided to remove it.
- Updated DESCRIPTION, README.md and the vignette to point to the published method in PLoS Genetics, and also to a related preprint of human analysis on bioRxiv.
- Exported popkin_A (previously the unexported get_A) and popkin_A_min_subpops (previously the unexported min_mean_subpops).
  popkin function.
- popkin method
- validate_kinship added option name (default "kinship") for clear error reports when the matrix being tested is not actually a kinship matrix.
  - Used with name = "A" to validate A in popkin_A_min_subpops.
- popkin:
  - Added want_M option, which, if TRUE, returns a list containing the kinship matrix as well as the pairwise complete count matrix M.
  - Added m_chunk_max option (default 1000), which sets the maximum number of loci to process at a time.
    The new default behavior reduces memory usage considerably, especially on machines with large memory, without sacrificing speed.
    The original version used a lot of memory just because it was available. This could be inconvenient when running other processes and did not increase speed, so it was unnecessary at best.
- validate_kinship now has a sym option that, if FALSE, skips the symmetry test (defaults to TRUE).
- plot_popkin has the same sym option passed to validate_kinship, but here it defaults to FALSE (plotting non-symmetric matrices causes no inherent error).
- solve_m_mem_lim now avoids a rare integer overflow that occurred when the input number of individuals n was encoded as an integer and was greater than sqrt(.Machine$integer.max), or 46340.95.
- validate_kinship now tests for symmetry in input kinship matrices too.
- Added x_local parameter to function fst, which permits estimation of FST when there is known local inbreeding (estimated from a pedigree or IBD blocks).
- inbrDiag, neff, plotPopkin, rescalePopkin, weightsSubpops.
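The Python sketch below illustrates the memory-limited chunking behind m_chunk_max and solve_m_mem_lim, along with the integer overflow threshold mentioned above. It is not popkin's code. The following are all assumptions made for illustration:

- the function name solve_m_chunk;
- the memory model, which uses 8 bytes per number, mat_m_n matrices of size m x n, mat_n_n matrices of size n x n, and vec_m/vec_n vectors of lengths m and n, loosely following the solve_m_mem_lim description in these notes;
- the error messages.

```python
INT32_MAX = 2**31 - 1  # largest 32-bit signed integer, as for R integers


def fits_int32(x):
    """True if x is representable as a 32-bit signed integer."""
    return -(2**31) <= x <= INT32_MAX


def solve_m_chunk(mem_lim_bytes, n, mat_m_n=1, mat_n_n=0, vec_m=0, vec_n=0,
                  m_chunk_max=1000, bytes_per_num=8):
    """Hypothetical sketch: largest number of loci m that fits in memory.

    Assumed memory model, in bytes:
        bytes_per_num * (mat_m_n*m*n + mat_n_n*n*n + vec_m*m + vec_n*n)
    """
    # Cost that does not depend on m. Python integers never overflow, but
    # n * n computed with 32-bit integers overflows once n > 46340.
    fixed = bytes_per_num * (mat_n_n * n * n + vec_n * n)
    # Extra cost of each additional locus.
    per_locus = bytes_per_num * (mat_m_n * n + vec_m)
    if per_locus <= 0:
        raise ValueError("memory use does not grow with m; nothing to solve")
    m_chunk = int((mem_lim_bytes - fixed) // per_locus)
    if m_chunk < 0:
        raise ValueError("m_chunk is negative: n alone is too large for the memory limit")
    # Cap the chunk so that large machines do not use memory unnecessarily.
    return min(m_chunk, m_chunk_max)


print(fits_int32(46340 * 46340))  # True
print(fits_int32(46341 * 46341))  # False
print(solve_m_chunk(10**9, 1000, mat_n_n=2))  # 1000 (capped by m_chunk_max)
```

In the last call, memory alone would allow 123000 loci per chunk. Capping at m_chunk_max reflects the rationale given above: larger chunks use more memory without a speed benefit.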
- popkin function: lociOnCols, memLim.
- class usage now that class() of a matrix returns the two-element vector c("matrix", "array") in R-devel (required by CRAN).
- calc_leg_width_min internal function, though it is unfinished and unused.
- man/figures/
- More improvements to function plot_popkin:
  - Added option oma, which sets outer margins via par(oma) but provides additional useful shortcuts and defaults.
    This changes the default behavior of plot_popkin by setting the left outer margin to 1.5 (all other values are zero), whereas before plot_popkin did not set any outer margins.
    This new default makes the "Individuals" outer label appear automatically in plots (whereas before, simply calling plot_popkin without setting outer margins left this outer-margin y-axis label hidden from view).
  - Extended option mar to accept various shortcuts: a scalar sets only the bottom and left margins; for a vector of length 2, the second value sets the top margin, which is otherwise zero; in both cases the right margin is zero.
    The default behavior remains to not change existing margins.
- Improvements to function plot_popkin:
  - Added option leg_per_panel, which, if TRUE, allows each kinship panel to have a different scale (each gets its own legend/color key).
  - leg_* options can now take different values per panel.
  - Added option leg_width to control the width of the legend panels.
    Increased the default width of this legend/color key (from 0.1 to 0.3, as a fraction of the width of the kinship panels), which changes the behavior in the original case when this legend is shared across kinship panels.
    Now the full legend fits in the panel, without needing an outer margin to the right.
  - leg_mar behavior changed.
    Now leg_mar can be a scalar, which sets the right margin of the legend panel.
    The new default is leg_mar = 3, again necessary so the label of the legend fits in the panel.
    Previous behaviors of leg_mar = NULL and a full margin specification are retained.
- plot_popkin: added option names_las.
- plot_popkin_single:
  - The default kinship_range changed to agree with the default of plot_popkin when a single kinship matrix is plotted (as a result, default colors now agree in that case too).
  - breaks is now returned invisibly.
  - plot_popkin are visible, differences are only noticeable when calling this internal function plot_popkin_single directly.
- Memory control bugfixes:
  - BEDMatrix object is analyzed
  - solve_m_mem_lim now returns the memory limit from get_mem_lim or the user, in addition to the chunk size both in number of loci and in expected memory usage.
- Other enhancements:
  - The n_eff function now ensures that output n_eff estimates are in the theoretically valid range of [1, 2n].
    Numerical issues in small and noisy kinship matrix estimates could lead to out-of-bounds estimates, which are now replaced with their closest boundary values.
- plot_popkin updates:
  - Fixed labels with labs_even = TRUE not being placed correctly.
    The error was most evident for very small samples (e.g., n = 3 individuals) and was imperceptible otherwise (e.g., n = 100 or more).
  - Fixed the line drawn with diag_line = TRUE not extending fully to the extremes.
    This error was again most evident for very small samples and was imperceptible otherwise.
  - Added weights option, to change the width of every individual to highlight individuals with more weight.
  - Added raster option, equivalent to the useRaster option of the image function used internally.
    If weights are not NULL, raster is forced to FALSE (required for image to work in this setting).
    So its only use is to set it when weights are NULL, as needed.
- solve_m_mem_lim in other dependent packages.
  In particular, the internal function get_mem_lim_m was removed.
- The popkin function accepts the new parameter mem_factor.
- solve_m_mem_lim always returns integer chunk sizes (number of loci).
  Previously, the function returned non-integers only if the total matrix size m was not provided.
- Added function solve_m_mem_lim, which generalizes previous behavior to estimate chunk sizes (in number of loci) given a limited memory and number of individuals, for various numbers of matrices (of dimensions (m,n) or (n,n)) and vectors (of lengths m or n).
  This function is shared with related projects (such as popkinsuppl on GitHub).
- .Rbuildignore: stopped ignoring README; also removed non-existent files from its list.
- popkin.Rproj file
- inbr_diag now handles NULL inputs correctly (preserves them as NULL without throwing errors).
- plot_popkin has a new logical option null_panel_data, to change behavior in the presence of NULL kinship matrices (whether they must or must not have titles and other parameters).
  NULL panels
- The popkin function accepts the deprecated parameter names lociOnCols and memLim alongside the new names, to avoid breaking existing code (their use generates warnings).
- plot_popkin bug fixes and enhancements!
  - plot_popkin now resets graphical parameters when done and after every panel as needed.
    Previously, if margins were NULL (default) for subsequent panels, the original margins were not reset (instead, the last values were incorrectly propagated).
    par values) is now reset after plotting is complete.
- plot_popkin option panel_letters (default is A-Z, so the default remains to not show letters for a single panel).
- Added leg_cex option to plot_popkin.
- Renamed functions:
  - inbrDiag -> inbr_diag
  - neff -> n_eff
  - plotPopkin -> plot_popkin
  - rescalePopkin -> rescale_popkin
  - weightsSubpops -> weights_subpops
- plot_popkin).
- plotPopkin retains the older argument names.
- inbr_diag now accepts lists of kinship matrices to transform (for easier plotting of multiple matrices).
- plot_popkin now requires its non-NULL inputs to be proper kinship matrices.
  Previously, the code somewhat allowed non-square matrices to be visualized, but this case was not guaranteed to work.
  The code is cleaner under the assumption of symmetric square matrices.
- validate_kinship, mean_kinship
- The popkin function now preserves individual names if they are present in the input genotype matrix.
  These names are copied to the rows and columns of the output kinship matrix.
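Several entries above cover inbr_diag, including its handling of NULL inputs and of lists of matrices. It relies on the standard relationship between the diagonal of a kinship matrix and inbreeding: phi_ii = (1 + f_i) / 2, so f_i = 2 * phi_ii - 1. The Python sketch below applies that transformation. It is an illustration, not popkin's R code. The names inbr_diag_sketch and inbr_diag_many are hypothetical, matrices are plain lists of lists, and None stands in for R's NULL.

```python
from copy import deepcopy


def inbr_diag_sketch(kinship):
    """Replace each diagonal kinship value phi_ii = (1 + f_i) / 2 with the
    inbreeding coefficient f_i = 2 * phi_ii - 1; off-diagonal values are kept.

    None (standing in for R's NULL) is passed through unchanged.
    """
    if kinship is None:
        return None
    out = deepcopy(kinship)  # leave the caller's matrix untouched
    for i in range(len(out)):
        out[i][i] = 2 * out[i][i] - 1
    return out


def inbr_diag_many(kinships):
    """Apply the transformation to every matrix in a list (None entries allowed)."""
    return [inbr_diag_sketch(k) for k in kinships]


K = [[0.5, 0.1],
     [0.1, 0.75]]
print(inbr_diag_sketch(K))  # [[0.0, 0.1], [0.1, 0.5]]
print(inbr_diag_many([K, None])[1])  # None
```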
- Converted the vignette from PDF to HTML.
- Added neff function (estimates effective sample size given a kinship matrix and weights; can find optimal weights that are non-negative or sign-unconstrained, yielding maximum neff values).
- fst, inbr, plotPopkin).
- RColorBrewer.
- printLabs (used by plotPopkin) is now more flexible in where it places its labels (new args side1 and side2).
- plotPopkin now allows NULL elements in input list x, making empty plots with titles (good for placeholders or other non-existent data).
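The neff function (later renamed n_eff) returns an effective sample size, which a later entry says is clamped to [1, 2n]. The Python sketch below rests on an assumption: it uses n_eff = 1 / (w^T Phi w), with weights w normalized to sum to 1. That form was chosen here because it reproduces both bounds:

- 2n for unrelated outbred individuals (kinship matrix I/2) with uniform weights;
- 1 when every kinship entry equals 1.

It is not necessarily popkin's exact definition, and it does not implement the weight optimization. The function name is hypothetical.

```python
def n_eff_sketch(kinship, weights=None):
    """ASSUMED definition: n_eff = 1 / (w' Phi w), weights normalized to sum
    to 1 (their sum must be nonzero), result clamped to [1, 2n]."""
    n = len(kinship)
    if weights is None:
        weights = [1.0] * n  # uniform weights
    total = sum(weights)
    w = [x / total for x in weights]
    quad = sum(w[i] * kinship[i][j] * w[j]
               for i in range(n) for j in range(n))
    # Noisy estimates can push the quadratic form to or below zero
    # (n_eff would be infinite or negative); this sketch maps that case
    # to the upper bound.
    if quad <= 0:
        return 2.0 * n
    # Replace out-of-bounds values with the closest boundary value.
    return min(max(1.0 / quad, 1.0), 2.0 * n)


unrelated = [[0.5 if i == j else 0.0 for j in range(4)] for i in range(4)]
print(n_eff_sketch(unrelated))  # 8.0, i.e. 2n for n = 4
```

For a noisy 2 x 2 estimate such as [[0.5, -0.2], [-0.2, 0.5]], the unclamped value would be about 6.7. The sketch clamps it to the upper bound of 4. Mapping a non-positive quadratic form to the upper bound is a choice made only for this sketch.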
- Clarified plotPopkin documentation (that marPad is added to xMar values if set).
- README.md now contains instructions for installing from CRAN as well as from GitHub.
- lfa is not available (needed for CRAN tests). This change is not visible in the rendered vignette included in the package.
- All doc examples are now run (all used to be dontrun).
- Other minor non-code changes for the first CRAN submission.
- BEDMatrix object) caused popkin to die. Now popkin behaves as expected. New unit test cases were added to test function inputs (previously this case was untested).