## fastBE: A regression based approach to phylogenetic reconstruction from bulk DNA sequencing of tumors

*fastBE* is a method for inferring the evolutionary history
of tumors from multi-sample bulk DNA sequencing data.
If you find this tool useful in your research, please cite us at:
```
```

### Installation

`fastBE` is implemented in C++ and is packaged with the dependencies
needed to execute the program. In particular, the only dependencies are
```
The output binary will be located at `build/src/fastbe`.

### Usage

To run *fastbe*, simply execute the binary.
```
 search Searches for a clone tree that best fits a frequency matrix.
```

The two modes of *fastbe* are `search` and `regress`. The `search` mode
solves the variant allele frequency $\ell_1$-deconvolution problem,
while the `regress` mode solves the variant allele frequency
$\ell_1$-regression problem; both problems are defined in
our manuscript.

The `search` mode takes as input an $m \times n$ frequency matrix $F$ and outputs
an $n$-clonal tree that best fits the frequency matrix. *Important note: the `search`
command requires a root vertex, specified with the `-f/--assigned-root` flag.*
If the flag is omitted, the root vertex defaults to $0$. When the root vertex is unknown,
it suffices to prepend an extra column to the frequency matrix, which shifts the
original mutation clusters to columns $1, \dots, n$, and to specify the root as $0$.
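
As a minimal sketch of this workaround, the Python snippet below prepends a root column to a frequency matrix held in memory and formats it back into the space-separated text format. It assumes the artificial root has frequency $1.0$ in every sample, matching column $0$ of the example matrix in the input format section; that value and the helper names `prepend_root_column` and `format_matrix` are our assumptions, not part of *fastbe*.

```python
def prepend_root_column(rows, root_freq=1.0):
    """Return a copy of the frequency matrix with an extra root column in front.

    Assumption (not from the fastbe docs): the artificial root clone is present
    in every sample, so its frequency defaults to 1.0.
    """
    return [[root_freq] + list(row) for row in rows]


def format_matrix(rows):
    # Space-separated columns and newline-separated rows, with 4 decimal
    # places to mirror the example input file.
    return "\n".join(" ".join(f"{x:.4f}" for x in row) for row in rows) + "\n"


if __name__ == "__main__":
    F = [[0.98, 0.83], [1.00, 0.13]]
    print(format_matrix(prepend_root_column(F)), end="")
    # 1.0000 0.9800 0.8300
    # 1.0000 1.0000 0.1300
```

Writing the returned string to a new `.txt` file and passing that file to `fastbe search` with the default root of $0$ follows the procedure described above.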

The `regress` mode
takes as input an $m \times n$ frequency matrix $F$ and an $n$-clonal
tree $\mathcal{T}$ and outputs the minimum value of
$\lVert F - UB_{\mathcal{T}} \rVert_1$ over all $m \times n$ usage matrices $U$,
where $B_{\mathcal{T}}$ is the $n \times n$ clonal matrix of $\mathcal{T}$.
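
To make the objective concrete, the Python sketch below evaluates $\lVert F - UB_{\mathcal{T}} \rVert_1$ for one *fixed* usage matrix $U$; it does not perform the minimization over $U$ that `regress` carries out. It assumes the entrywise $\ell_1$ norm and a clonal matrix with one row per clone, where $(B_{\mathcal{T}})_{ij} = 1$ exactly when clone $i$ carries mutation cluster $j$ (i.e., $j$ is $i$ or one of its ancestors). These conventions and the name `l1_residual` are illustrative assumptions; the manuscript gives the authoritative definitions.

```python
def l1_residual(F, U, B):
    """Return the entrywise l1 norm ||F - U B||_1 for nested lists.

    F: m x n frequency matrix, U: m x n usage matrix, B: n x n clonal matrix.
    Assumed convention: B[i][j] == 1 iff clone i carries mutation cluster j.
    """
    n = len(B)
    total = 0.0
    for f_row, u_row in zip(F, U):
        for j in range(n):
            # (U B)[s][j] = sum_i U[s][i] * B[i][j]: the summed usage of all
            # clones that carry mutation cluster j.
            predicted = sum(u_row[i] * B[i][j] for i in range(n))
            total += abs(f_row[j] - predicted)
    return total


if __name__ == "__main__":
    # Hypothetical 3-clone chain 0 -> 1 -> 2; row i marks i and its ancestors.
    B = [[1, 0, 0],
         [1, 1, 0],
         [1, 1, 1]]
    U = [[0.25, 0.25, 0.5],
         [0.5, 0.5, 0.0]]
    F = [[1.0, 0.5, 0.5],
         [1.0, 0.5, 0.0]]
    print(l1_residual(F, U, B))  # 0.25
```

Here the second sample is fit exactly, while the first is off by $0.25$ in cluster $1$, so the residual is $0.25$.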

### Input format

The input format for the `search` mode of *fastbe* consists of a frequency
matrix $F$ in `.txt` format. Rows are separated by newlines
and columns are separated by spaces. Rows correspond
to distinct samples and columns correspond to distinct mutation clusters.
More formally, $F_{ij}$ is the frequency of the $j^{\text{th}}$ mutation
cluster in the $i^{\text{th}}$ sample. As an example, a frequency matrix $F$
describing $20$ samples and $10$ clones is:
```
1.0000 0.9801 0.0000 0.8265 0.0156 0.3683 0.2450 0.1218 0.1260 0.0000
1.0000 1.0000 0.0000 0.1257 0.0000 0.0000 0.0000 0.0000 0.0000 0.1436
1.0000 1.0000 0.0000 1.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000
```

The above frequency matrix $F$ is provided as an input file at `examples/sim_obs_frequency_matrix.txt`.
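
A short Python sketch of reading this format follows. `parse_frequency_matrix` is a hypothetical helper for inspecting the input, not part of *fastbe*, and it accepts any whitespace between columns, which may be more lenient than *fastbe* itself.

```python
def parse_frequency_matrix(text):
    """Parse the frequency-matrix text format into a list of rows.

    Rows are separated by newlines and columns by whitespace, so
    F[i][j] is the frequency of mutation cluster j in sample i.
    """
    F = [[float(tok) for tok in line.split()]
         for line in text.splitlines() if line.strip()]
    # Every sample must report a frequency for every mutation cluster.
    if any(len(row) != len(F[0]) for row in F):
        raise ValueError("rows have differing numbers of columns")
    return F


if __name__ == "__main__":
    # First two rows of examples/sim_obs_frequency_matrix.txt
    text = (
        "1.0000 0.9801 0.0000 0.8265 0.0156 0.3683 0.2450 0.1218 0.1260 0.0000\n"
        "1.0000 1.0000 0.0000 0.1257 0.0000 0.0000 0.0000 0.0000 0.0000 0.1436\n"
    )
    F = parse_frequency_matrix(text)
    print(len(F), len(F[0]), F[0][3])  # 2 10 0.8265
```

Applied to the full contents of `examples/sim_obs_frequency_matrix.txt`, the parser should return $20$ rows of $10$ entries each.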

The input for the `regress` mode of *fastbe* consists of the frequency
matrix $F$ described above and an $n$-clonal tree $\mathcal{T}$. The tree is specified
as an adjacency list in `.txt` format: each line lists a vertex followed by its children.
For example, a clonal tree with $10$ clones rooted at vertex $0$ is:
```
0 1 2 4
1 3
2
3 5 6 7 9
4
5
6
7 8
8
9
```

The above clonal tree $\mathcal{T}$ is provided as an input file at `examples/sim_tree.txt`.
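
The Python sketch below reads this adjacency-list format and builds the corresponding clonal matrix $B_{\mathcal{T}}$. The line format (a vertex followed by its children) is read off the example above; the row convention for $B_{\mathcal{T}}$ (row $i$ marks the mutation clusters carried by clone $i$, namely $i$ and its ancestors) and the helper names `parse_tree` and `clonal_matrix` are our own assumptions, not part of *fastbe*.

```python
def parse_tree(text):
    """Parse an adjacency list where each line is 'vertex child child ...'.

    Returns (vertices, parent), where parent maps each non-root vertex
    to its parent.
    """
    vertices, parent = set(), {}
    for line in text.splitlines():
        tokens = [int(tok) for tok in line.split()]
        if not tokens:
            continue
        v, children = tokens[0], tokens[1:]
        vertices.add(v)
        for c in children:
            if c in parent:
                raise ValueError(f"vertex {c} has more than one parent")
            parent[c] = v
            vertices.add(c)
    return vertices, parent


def clonal_matrix(vertices, parent):
    """Build B with B[i][j] = 1 iff j is clone i or an ancestor of i.

    Assumes the vertices are labelled 0, ..., n-1.
    """
    n = len(vertices)
    if vertices != set(range(n)):
        raise ValueError("vertices must be labelled 0..n-1")
    B = [[0] * n for _ in range(n)]
    for i in range(n):
        v, steps = i, 0
        # Walk from clone i up to the root, marking every vertex on the way.
        while v is not None:
            B[i][v] = 1
            v = parent.get(v)
            steps += 1
            if steps > n:
                raise ValueError("adjacency list contains a cycle")
    return B


if __name__ == "__main__":
    # The example tree listed above (examples/sim_tree.txt).
    tree = "0 1 2 4\n1 3\n2\n3 5 6 7 9\n4\n5\n6\n7 8\n8\n9\n"
    B = clonal_matrix(*parse_tree(tree))
    print(B[8])  # [1, 1, 0, 1, 0, 0, 0, 1, 1, 0]  (path 8 -> 7 -> 3 -> 1 -> 0)
```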

The above frequency matrix and clonal tree were generated using the command
`python scripts/simulation.py --clones 10 --samples 20 --coverage 100 --seed 0 --mutations 100 --output examples/sim`,
which simulates the evolution of a tumor with $10$ mutation clusters (equivalently, clones) and $20$ samples at a
read depth of $100\times$, with $100$ mutations distributed across the $10$ mutation clusters.
Several other files, such as the mutation-to-clone mapping, the ground-truth usage matrix $U$, the clonal
matrix $B$, and the read count matrices, are also provided in the `examples/` directory.

### Usage Example

As an example, we will infer a phylogenetic tree from the simulated
data with $20$ samples and $10$ clones. To run `fastbe` on this data,
execute:
```
fastbe search examples/sim_obs_frequency_matrix.txt -o examples/fastbe
```
This command writes an adjacency list describing the inferred clonal tree
to `examples/fastbe_tree.txt` and a `.json` file containing metadata
to `examples/fastbe_results.json`.
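
Because the simulation also produces the ground-truth tree at `examples/sim_tree.txt`, one quick sanity check is the fraction of true parent-child edges recovered in `examples/fastbe_tree.txt`. The Python sketch below assumes the output tree uses the same adjacency-list format as the input tree; the edge-recall measure and the helper names `tree_edges` and `edge_recall` are our own illustration and are not part of *fastbe*.

```python
def tree_edges(text):
    """Return the set of (parent, child) pairs in an adjacency-list tree."""
    edges = set()
    for line in text.splitlines():
        tokens = line.split()
        if tokens:
            edges.update((tokens[0], child) for child in tokens[1:])
    return edges


def edge_recall(true_text, inferred_text):
    """Fraction of ground-truth parent-child edges present in the inferred tree."""
    true_edges = tree_edges(true_text)
    if not true_edges:
        return 1.0  # a single-vertex tree has no edges to recover
    return len(true_edges & tree_edges(inferred_text)) / len(true_edges)


if __name__ == "__main__":
    truth = "0 1 2\n1\n2\n"
    inferred = "0 1\n1 2\n2\n"
    print(edge_recall(truth, inferred))  # 0.5
```

For the example run, passing the contents of `examples/sim_tree.txt` and `examples/fastbe_tree.txt` to `edge_recall` yields a value between $0$ and $1$, where $1$ means every true edge was recovered.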