Skip to content

Releases: ms609/TreeDist

v2.14.0

11 May 06:44

Choose a tag to compare

New features

  • TransferConsensus() constructs a consensus tree that minimizes the sum of transfer distances to a set of input trees, using a greedy add-and-prune heuristic. Unlike majority-rule consensus, which can be highly unresolved when phylogenetic signal is diffuse, the transfer consensus uses the finer-grained transfer distance to produce more resolved trees.

  • TransferDist() computes the transfer dissimilarity between phylogenetic trees, with scaled and unscaled variants. Supports all-pairs, cross-pairs, and single-pair computations.

  • LAP (Jonker–Volgenant linear assignment) and MCI (Mutual Clustering Information) C++ implementations are now exposed via inst/include/TreeDist/ headers, allowing downstream packages to use LinkingTo: TreeDist.

Internals

  • Stack-allocated split buffers replaced with dynamically-sized vectors, removing a hard dependency on the compile-time SL_MAX_SPLITS constant. TreeDist now supports trees of any size permitted by TreeTools.

  • Large-tree support (requires TreeTools ≥ 2.3.0): all distance functions now accept trees with up to 32 767 tips (previously limited to SL_MAX_TIPS, 2048 with TreeTools ≤ 2.2.0).

Performance

  • RobinsonFoulds() now uses a fast C++ batch path for cross-distance computations (tree list vs tree list), matching the existing all-pairs batch performance. Previously, cross-distance calls fell through to per-pair R dispatch (~27× slower per pair); the new path achieves ~21× speedup on typical inputs (e.g. 50 × 250 trees, 50 tips).

v2.13.0

11 May 06:43

Choose a tag to compare

New features

  • MCITree() selects the tree from a posterior sample with the highest total split information content — a Maximum Clade Information analogue of the Maximum Clade Credibility tree.

Performance

Pairwise distance computation has been substantially optimized. Typical speedups over v2.12.0 for tree sets where most splits are shared (MCMC posteriors, bootstrap replicates):

Metric 100 × 50 tips 40 × 200 tips
ClusteringInfoDistance ~5× ~12×
MatchingSplitDistance ~7× ~11×
InfoRobinsonFoulds ~4× ~5×
DifferentPhylogeneticInfo ~1.3× ~1.1×
MatchingSplitInfoDistance ~1.4× ~1×

OpenMP parallelism

  • All pairwise distance functions now use an OpenMP multi-threaded batch path when the package is compiled with OpenMP support, for both all-pairs and cross-pairs (tree1 vs tree2) computations.

  • The number of OpenMP threads is controlled by options(mc.cores = N); the default is 1 (single-threaded). Set mc.cores to parallel::detectCores() or a fixed integer to enable multi-threading.
    StartParallel() / StopParallel() are no longer needed when OpenMP is available.

Algorithmic improvements

  • Exact split matches between trees are now detected via an O(n log n) sort-and-merge pre-scan, reducing the linear assignment problem to only the unmatched splits. For tree sets with high split overlap,
    this yields the largest portion of the speedups above.

  • Internal lookup table for log₂ values shrunk from 32 MB to 16 KB, improving L1 cache locality for information-based distance metrics.

  • Information content accumulation in MutualClusteringInfo() rewritten as a branchless expression, reducing per-split-pair table lookups from 16 to 4 and eliminating 8 branches.

  • spi_overlap() (used by SharedPhylogeneticInfo(), MatchingSplitInfoDistance(), and JaccardRobinsonFoulds()) rewritten to use a single-pass hardware POPCNT approach, replacing the previous four-pass boolean scan.

  • Hardware POPCNT instruction now used on x86-64 via inline assembly (requires TreeTools ≥ 2.2).

  • Internal cost-matrix storage is now pooled across tree pairs within each thread, eliminating per-pair heap allocation overhead.

R-level fast paths

  • ClusteringInfoDistance(), DifferentPhylogeneticInfo(), MatchingSplitInfoDistance(), and InfoRobinsonFoulds() now avoid duplicate as.Splits() conversions and use C++ batch functions for per-tree entropy/information computation. This reduces R-level overhead by ~8–17% for typical analyses.

  • Cross-pairs computations (tree1 vs tree2 where both are lists) now use the same optimized batch path as all-pairs computations.

Kendall & Colijn distance

  • KCVector() reimplemented in C++, giving ~220× speedup per tree.

  • All-pairs and cross-pairs KendallColijn() Euclidean distances now computed in C++ (pair_diff_euclidean(), vec_diff_euclidean()).

v2.12.0

13 Feb 11:14

Choose a tag to compare

  • Support larger trees in some functions by updating some functions to use 32-bit integers, per TreeTools v2.1.0.

  • AHMI() now returns negative values (previously zeroed in error).

  • Experimental support for a new method of SPR distance calculation (subject to change or removal).

v2.11.1

13 Oct 14:25

Choose a tag to compare

  • Improve robustness of SpectralEigens() tests.

v2.11.0

28 Sep 13:40

Choose a tag to compare

  • HierarchicalMutualInformation() calculates the information shared between pairs of hierarchical partition structures doi:10.1103/PhysRevE.92.062825.
  • Fix bug in calculation of MutualClusteringInfo(): greedy optimization was not guaranteed to find globally optimal matching, causing distances to be overestimated in some circumstances. Thanks to @plewis for the report (#163).
  • Fix crash in robinson_foulds_all_pairs() and RobinsonFoulds(list).
  • Support larger trees in NNI distance calculations.

v2.10.1

28 Aug 05:49

Choose a tag to compare

  • Compiler-safe vector initialization, resolving M1-SAN warnings.

v2.10.0

22 Aug 15:16

Choose a tag to compare

Note - this release introduced a bug in the computation of the mutual clustering information / clustering information distance. The globally optimal matching between splits was not always found. This was fixed in v2.11.

  • Ntropy() computes entropy from integer counts.

  • C++ optimizations and reformatting:

    • Faster tree distance calculation.
    • 2x speed-up of LAPJV for large matrices.
  • Require R4.0; discontinue tests against R 3.6 and 4.0.

v2.9.2

13 Jan 06:53
d2533b6

Choose a tag to compare

  • Fix crash when calculating NNI distance for large trees.

v2.9.1

09 Sep 12:59

Choose a tag to compare

  • Avoid false positive in MKL testing environment.

v2.9.0

03 Sep 18:14

Choose a tag to compare

  • VisualizeMatching() allows more control over output format, and returns the matching (#124).

  • DistanceFromMedian(Average = median) allows calculation of MAD.

  • SpectralEigens() returns correct eigenvalues (smallest was overlooked).

  • SpectralEigens() handles values of nEig larger than the input.

  • Anticipate new behaviour of unlist(use.names = TRUE) in R 4.5.