| 1 | +# fastBE: A regression based approach to phylogenetic reconstruction from bulk DNA sequencing of tumors |
| 2 | + |
| 3 | +*fastBE* is a method for inferring the evolutionary history |
| 4 | +of tumors from multi-sample bulk DNA sequencing data. |
| 5 | +Our method uses ideas from distance-based phylogenetics and |
| 6 | +a handcrafted solver of the variant allele frequency |
| 7 | +$\ell_1$-regression problem. |
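
For orientation, the regression problem mentioned above can be sketched as follows. The notation is an assumption based on the matrices named later in this README: $F$ is the $m \times n$ frequency matrix ($m$ samples, $n$ mutation clusters), $B$ is the $n \times n$ clonal matrix of a candidate clone tree, and $U$ is the $m \times n$ usage matrix. The exact constraints used by *fastBE* may differ from this sketch.

$$
\min_{U} \; \lVert F - UB \rVert_1
\quad \text{subject to} \quad
U_{ij} \ge 0, \qquad \sum_{j=1}^{n} U_{ij} \le 1 \ \text{ for each sample } i .
$$

Here $\lVert \cdot \rVert_1$ denotes the entrywise sum of absolute values, so the objective measures how far the observed frequencies are from those implied by the tree.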
| 8 | + |
| 9 | + |
| 10 | + |
| 11 | +If you find this tool useful in your research, please cite us at: |
| 12 | +``` |
| 13 | +``` |
| 14 | + |
| 15 | +## Installation |
| 16 | + |
| 17 | +`fastBE` is implemented in C++ and is packaged with the libraries |
| 18 | +it needs to run. The only external requirements are |
| 19 | +a recent version of CMake and a C++17-compliant compiler. |
| 20 | + |
| 21 | +To build `fastBE` from source, first clone the repository and its submodules: |
| 22 | +``` |
| 23 | +$ git clone --recurse-submodules https://github.com/schmidt73/fastbe.git |
| 24 | +``` |
| 25 | + |
| 26 | +Then, from the root of the project directory, execute the following sequence of |
| 27 | +commands: |
| 28 | +``` |
| 29 | +$ mkdir build; cd build |
| 30 | +$ cmake .. |
| 31 | +$ make |
| 32 | +``` |
| 33 | +The output binary will be located at `build/src/fastbe`. |
| 34 | + |
| 35 | +## Usage |
| 36 | + |
| 37 | +To run *fastbe*, simply execute the binary. |
| 38 | +``` |
| 39 | +Usage: fastbe [--help] [--version] {regress,search} |
| 40 | + |
| 41 | +Optional arguments: |
| 42 | + -h, --help shows help message and exits |
| 43 | + -v, --version prints version information and exits |
| 44 | + |
| 45 | +Subcommands: |
| 46 | + regress Regresses a clone tree onto a frequency matrix. |
| 47 | + search Searches for a clone tree that best fits a frequency matrix. |
| 48 | +``` |
| 49 | + |
| 50 | +### Input format |
| 51 | + |
| 52 | +The input format for *fastbe* consists of a frequency matrix $F$ |
| 53 | +in TXT format. Rows are separated by newlines |
| 54 | +and columns are separated by spaces. Rows correspond |
| 55 | +to distinct samples and columns correspond to distinct mutation clusters. |
| 56 | +More formally, $F_{ij}$ is the frequency of the $j^{\text{th}}$ mutation |
| 57 | +cluster in the $i^{\text{th}}$ sample. As an example, a frequency matrix $F$ |
| 58 | +describing 20 samples and 10 clones is: |
| 59 | +``` |
| 60 | +1.0000 0.9801 0.0000 0.8265 0.0156 0.3683 0.2450 0.1218 0.1260 0.0000 |
| 61 | +1.0000 1.0000 0.0000 0.1257 0.0000 0.0000 0.0000 0.0000 0.0000 0.1436 |
| 62 | +1.0000 0.5202 0.0000 0.4053 0.5045 0.0000 0.0000 0.1945 0.0000 0.0000 |
| 63 | +1.0000 0.3497 0.6616 0.1302 0.0000 0.0000 0.0000 0.1558 0.0000 0.0000 |
| 64 | +1.0000 0.7233 0.1356 0.5780 0.1640 0.0574 0.0785 0.2873 0.0728 0.1083 |
| 65 | +1.0000 0.8394 0.0646 0.8530 0.0000 0.0000 0.0000 0.0000 0.0000 0.8353 |
| 66 | +1.0000 0.1309 0.6547 0.0000 0.0174 0.0000 0.0000 0.0000 0.0000 0.0000 |
| 67 | +1.0000 0.4203 0.1889 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 |
| 68 | +1.0000 0.2731 0.1406 0.2768 0.4452 0.0000 0.0000 0.0000 0.0000 0.0000 |
| 69 | +1.0000 0.5346 0.4651 0.5437 0.0000 0.0000 0.0000 0.1311 0.1069 0.0000 |
| 70 | +1.0000 0.1043 0.0000 0.1258 0.7614 0.0000 0.0000 0.0566 0.0685 0.0562 |
| 71 | +1.0000 0.8784 0.0000 0.3935 0.0122 0.0000 0.0124 0.2469 0.2382 0.1668 |
| 72 | +1.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 |
| 73 | +1.0000 0.9163 0.0412 0.8878 0.0348 0.1197 0.1299 0.6229 0.4443 0.0010 |
| 74 | +1.0000 0.9130 0.0418 0.6500 0.0458 0.0000 0.0000 0.0000 0.0000 0.0000 |
| 75 | +1.0000 0.8923 0.0000 0.9118 0.0000 0.0000 0.0000 0.9092 0.0000 0.0000 |
| 76 | +1.0000 0.9318 0.0000 0.9369 0.0704 0.0000 0.0000 0.8478 0.8136 0.0000 |
| 77 | +1.0000 0.4489 0.3422 0.4024 0.1828 0.0270 0.0000 0.3257 0.3120 0.0350 |
| 78 | +1.0000 0.9753 0.0000 0.8931 0.0000 0.5100 0.0506 0.0682 0.0000 0.1435 |
| 79 | +1.0000 1.0000 0.0000 1.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 |
| 80 | +``` |
| 81 | + |
| 82 | +This frequency matrix is provided as an input file at `examples/sim_obs_frequency_matrix.txt`. |
| 83 | +It was generated using the command |
| 84 | +`python scripts/simulation.py --clones 20 --samples 10 --coverage 100 --seed 0 --mutations 100 --output examples/sim`, |
| 85 | +which simulates the evolution of a tumor with 10 mutation clusters (equivalently, clones) and 20 samples at a |
| 86 | +read depth of 100$\times$, with 100 mutations distributed across the 10 mutation clusters. |
| 87 | +Several other files, such as the mutation-to-clone mapping, the ground truth usage matrix $U$, clonal |
| 88 | +matrix $B$, and read count matrices are also provided in the `examples/` directory. |
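
Before passing a matrix to *fastbe*, it can be useful to check that the file follows the layout described in this section. The snippet below is a minimal sketch of such a check, not part of *fastBE*: the helper name `parse_frequency_matrix` is hypothetical, and the sample text is the top-left 3 x 4 corner of the example matrix above. It assumes only what this section states (newline-separated rows for samples, space-separated columns for mutation clusters) plus the assumption that every frequency lies in $[0, 1]$.

```python
# Hypothetical helper (not part of fastBE): parse and sanity-check a
# frequency matrix in the whitespace-separated text format described above.

def parse_frequency_matrix(text):
    """Return the matrix as a list of rows (one row per sample)."""
    rows = [
        [float(token) for token in line.split()]
        for line in text.splitlines()
        if line.strip()  # ignore blank lines
    ]
    if not rows:
        raise ValueError("frequency matrix is empty")

    n_clusters = len(rows[0])
    for i, row in enumerate(rows):
        # Every sample must report a frequency for every mutation cluster.
        if len(row) != n_clusters:
            raise ValueError(f"row {i} has {len(row)} columns, expected {n_clusters}")
        # Assumption: frequencies are proportions in [0, 1].
        for value in row:
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"row {i} contains out-of-range value {value}")
    return rows


# Top-left 3 x 4 corner of the example matrix shown above.
EXAMPLE = """\
1.0000 0.9801 0.0000 0.8265
1.0000 1.0000 0.0000 0.1257
1.0000 0.5202 0.0000 0.4053
"""

F = parse_frequency_matrix(EXAMPLE)
n_samples, n_clusters = len(F), len(F[0])
print(f"{n_samples} samples x {n_clusters} mutation clusters")
```

To check a file on disk, the same function can be applied to its contents, for example `parse_frequency_matrix(open("examples/sim_obs_frequency_matrix.txt").read())`.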
| 89 | + |
| 90 | +## Usage Example |
| 91 | + |