
Commit e3e66cb

Update README
1 parent 9d37be3 commit e3e66cb

1 file changed

Lines changed: 57 additions & 10 deletions


README.md

@@ -1,4 +1,4 @@
## fastBE: A regression-based approach to phylogenetic reconstruction from bulk DNA sequencing of tumors

*fastBE* is a method for inferring the evolutionary history
of tumors from multi-sample bulk DNA sequencing data.
@@ -12,7 +12,7 @@ If you find this tool useful in your research, please cite us at:
```
```

### Installation

`fastBE` is implemented in C++ and is packaged with the dependencies
needed to execute the program. In particular, the only dependencies are
@@ -32,7 +32,7 @@ $ make
```
The output binary will be located at `build/src/fastbe`.

### Usage

To run *fastbe*, simply execute the binary.
```
@@ -47,15 +47,33 @@ Subcommands:
search Searches for a clone tree that best fits a frequency matrix.
```

The two modes of *fastbe* are `search` and `regress`. The `search` mode
solves the variant allele frequency $\ell_1$-deconvolution problem,
while the `regress` mode solves the variant allele frequency
$\ell_1$-regression problem; both problems are defined in our manuscript.

The `search` mode takes as input an $m \times n$ frequency matrix $F$ and outputs
an $n$-clonal tree that best fits the frequency matrix. *Important note: the `search`
command requires a root vertex, specified with the `-f/--assigned-root` flag.*
By default, the root vertex is $0$. When the root vertex is unknown,
it suffices to prepend an extra column to the frequency matrix
and specify the root as $0$.
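
The root-column workaround can be illustrated with a small Python sketch. The helper `prepend_root_column` is hypothetical (it is not part of *fastbe*), and filling the new column with $1.0$, i.e., a root clone present in every sample, is our assumption. Note that prepending a column shifts every existing cluster index up by one.

```python
# Hypothetical helper, not part of fastbe: prepend a root column to a
# frequency matrix stored in the space-separated .txt format.
# Assumption: the root column is filled with 1.0 (root present in every sample).


def prepend_root_column(text: str, root_freq: float = 1.0) -> str:
    """Return `text` with a new first column equal to `root_freq` in every row."""
    out_lines = []
    for line in text.splitlines():
        entries = line.split()
        if not entries:  # skip blank lines, e.g. a trailing newline
            continue
        out_lines.append(" ".join([f"{root_freq:.4f}"] + entries))
    return "\n".join(out_lines) + "\n"


if __name__ == "__main__":
    # Toy 2-sample, 2-cluster matrix; cluster j becomes column j + 1.
    toy = "0.9801 0.0000\n0.1257 0.1436\n"
    print(prepend_root_column(toy), end="")
```

The result can be written to a new file and passed to `fastbe search` with the root left at its default of $0$.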

The `regress` mode takes as input an $m \times n$ frequency matrix $F$ and an
$n$-clonal tree $\mathcal{T}$ and outputs the minimum value of
$\lVert F - UB_{\mathcal{T}} \rVert_1$ over all usage matrices $U$, where
$B_{\mathcal{T}}$ is the clonal matrix of $\mathcal{T}$.
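
To make the `regress` objective concrete, the following Python sketch evaluates $\lVert F - UB_{\mathcal{T}} \rVert_1$ for one *fixed* usage matrix $U$; it does not perform the minimization over $U$ that `regress` carries out. The function names are hypothetical, and we assume the common convention that $(B_{\mathcal{T}})_{ij} = 1$ exactly when clone $j$ is an ancestor of, or equal to, clone $i$.

```python
# Sketch: evaluate ||F - U B_T||_1 for a fixed usage matrix U.
# Assumption: B_T[i][j] = 1 iff clone j is an ancestor of (or equal to) clone i.
from typing import Dict, List

Matrix = List[List[float]]


def clonal_matrix(children: Dict[int, List[int]], root: int, n: int) -> List[List[int]]:
    """Build the n x n clonal matrix B_T of a tree given as a children map."""
    B = [[0] * n for _ in range(n)]
    stack = [(root, [root])]  # (vertex, path from the root to that vertex)
    while stack:
        v, path = stack.pop()
        for a in path:  # v carries every mutation cluster on its root path
            B[v][a] = 1
        for c in children.get(v, []):
            stack.append((c, path + [c]))
    return B


def l1_objective(F: Matrix, U: Matrix, B: List[List[int]]) -> float:
    """Return the entrywise l1 norm of F - U B."""
    n = len(B)
    total = 0.0
    for f_row, u_row in zip(F, U):
        for j in range(n):
            predicted = sum(u_row[i] * B[i][j] for i in range(n))
            total += abs(f_row[j] - predicted)
    return total


if __name__ == "__main__":
    # Toy chain 0 -> 1 -> 2 with two samples (values chosen to be exact in binary).
    B = clonal_matrix({0: [1], 1: [2]}, root=0, n=3)
    U = [[0.25, 0.25, 0.5], [1.0, 0.0, 0.0]]
    F = [[1.0, 0.75, 0.5], [1.0, 0.125, 0.0]]
    print(l1_objective(F, U, B))  # prints 0.125
```

In the toy example the first sample is fit exactly, so the whole error of $0.125$ comes from the second sample's middle entry.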

### Input format

The input format for the `search` mode of *fastbe* consists of a frequency
matrix $F$ in `.txt` format. Rows are separated by newlines
and columns are separated by spaces. Rows correspond
to distinct samples and columns correspond to distinct mutation clusters.
More formally, $F_{ij}$ is the frequency of the $j^{\text{th}}$ mutation
cluster in the $i^{\text{th}}$ sample. As an example, a frequency matrix $F$
describing $20$ samples and $10$ clones is:
```
1.0000 0.9801 0.0000 0.8265 0.0156 0.3683 0.2450 0.1218 0.1260 0.0000
1.0000 1.0000 0.0000 0.1257 0.0000 0.0000 0.0000 0.0000 0.0000 0.1436
@@ -79,13 +97,42 @@ describing 20 samples and 10 clones is:
1.0000 1.0000 0.0000 1.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000
```

The above frequency matrix $F$ is provided as an input file at `examples/sim_obs_frequency_matrix.txt`.
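
A minimal Python reader for this format might look as follows. `read_frequency_matrix` is a hypothetical helper, and the checks it performs (equal row lengths, entries in $[0, 1]$) are our own sanity checks rather than documented *fastbe* behavior.

```python
# Hypothetical reader for the search-mode input: one sample per line,
# space-separated mutation-cluster frequencies.
from typing import List


def read_frequency_matrix(text: str) -> List[List[float]]:
    """Parse F so that F[i][j] is the frequency of cluster j in sample i."""
    rows: List[List[float]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        row = [float(x) for x in line.split()]
        if rows and len(row) != len(rows[0]):
            raise ValueError(f"line {lineno}: expected {len(rows[0])} columns, got {len(row)}")
        if any(not 0.0 <= f <= 1.0 for f in row):
            raise ValueError(f"line {lineno}: frequency outside [0, 1]")
        rows.append(row)
    return rows


if __name__ == "__main__":
    toy = "1.0000 0.9801 0.0000\n1.0000 1.0000 0.0000\n"
    F = read_frequency_matrix(toy)
    print(f"{len(F)} samples x {len(F[0])} clusters")  # prints "2 samples x 3 clusters"
```

Applied to the contents of `examples/sim_obs_frequency_matrix.txt`, this should yield $20$ rows of $10$ entries each.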

The input format for the `regress` mode of *fastbe* consists of the aforementioned
frequency matrix $F$ and an $n$-clonal tree $\mathcal{T}$. The tree is specified
as an adjacency list in `.txt` format, where each line lists a vertex followed by
its children. An example of a clonal tree on $10$ clones, rooted at vertex $0$, is:
106+
```
0 1 2 4
1 3
2
3 5 6 7 9
4
5
6
7 8
8
9
```

The above clonal tree $\mathcal{T}$ is provided as an input file at `examples/sim_tree.txt`.
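
The tree file can be read with a short Python sketch like the one below. `read_clonal_tree` is hypothetical, and its interpretation of each line (the first integer is a vertex, the remaining integers are its children) is inferred from the example above rather than taken from the *fastbe* source.

```python
# Hypothetical reader for the regress-mode tree file (adjacency list).
# Assumption inferred from the example: "v c1 c2 ..." lists vertex v and its children.
from typing import Dict, List, Tuple


def read_clonal_tree(text: str) -> Tuple[int, Dict[int, List[int]]]:
    """Return (root, children) after checking that the lines describe a tree."""
    children: Dict[int, List[int]] = {}
    parent: Dict[int, int] = {}
    for line in text.splitlines():
        ids = [int(x) for x in line.split()]
        if not ids:
            continue
        v, kids = ids[0], ids[1:]
        if v in children:
            raise ValueError(f"vertex {v} is listed twice")
        children[v] = kids
        for c in kids:
            if c in parent:
                raise ValueError(f"vertex {c} has two parents")
            parent[c] = v
    vertices = set(children) | set(parent)
    roots = sorted(v for v in vertices if v not in parent)
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {roots}")
    # Every vertex must be reachable from the root (rules out detached cycles).
    seen, stack = set(), [roots[0]]
    while stack:
        v = stack.pop()
        seen.add(v)
        stack.extend(children.get(v, []))
    if seen != vertices:
        raise ValueError("some vertices are not reachable from the root")
    return roots[0], children


if __name__ == "__main__":
    example = "0 1 2 4\n1 3\n2\n3 5 6 7 9\n4\n5\n6\n7 8\n8\n9\n"
    root, children = read_clonal_tree(example)
    print(root, children[3])  # prints: 0 [5, 6, 7, 9]
```

Reading `examples/sim_tree.txt` the same way should return root $0$ and $10$ vertices.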

The above frequency matrix and clonal tree were generated using the command,
`python scripts/simulation.py --clones 10 --samples 20 --coverage 100 --seed 0 --mutations 100 --output examples/sim`
which simulates the evolution of a tumor with $10$ mutation clusters (equivalently, clones) and $20$ samples at a
read depth of $100\times$, with $100$ mutations distributed across the $10$ mutation clusters.
Several other files, such as the mutation-to-clone mapping, the ground-truth usage matrix $U$, the clonal
matrix $B$, and the read count matrices, are also provided in the `examples/` directory.

### Usage Example

As an example, we will infer a phylogenetic tree from the simulated
data with $20$ samples and $10$ clones. To run `fastbe` on this data,
execute:
```
fastbe search examples/sim_obs_frequency_matrix.txt -o examples/fastbe
```
This command will output an adjacency list describing the clonal tree
at `examples/fastbe_tree.txt` and a `.json` file containing metadata
at `examples/fastbe_results.json`.
