@@ -5,10 +5,10 @@ dataset. This will cover the steps of running from a subreads BAM file and
 generate a FASTQ of consensus reads.
 
 This covers the following stages:
-1. Running [pbccs] with the `--all` option to output all reads (it is possible
-   to use DeepConsensus from existing pbccs reads, but yield will be higher when
+1. Running *[ccs]* with the `--all` option to output all reads (it is possible
+   to use DeepConsensus from existing *ccs* reads, but yield will be higher when
    including all reads)
-2. Aligning subreads to the pbccs consensus with [actc]
+2. Aligning subreads to the *ccs* consensus with *[actc]*
 3. Running DeepConsensus using one of two options (with pip or using Docker)
 
 ## System configuration
@@ -57,17 +57,17 @@ bash install_nvidia_docker.sh
 
 to make sure our GPU is set up correctly.
 
-## Process the data with [pbccs] and [actc]
+## Process the data with *ccs* and *actc*
 
-You can install `ccs` and `actc` on your own. For convenience, we put them in
+You can install *[ccs]* and *[actc]* on your own. For convenience, we put them in
 a Docker image:
 
 ```
-DOCKER_IMAGE=google/deepconsensus:0.2.0rc-gpu
+DOCKER_IMAGE=google/deepconsensus:0.2.0rc1-gpu
 sudo docker pull ${DOCKER_IMAGE}
 ```
 
-DeepConsensus operates on subreads aligned to a draft consensus. We use [pbccs]
+DeepConsensus operates on subreads aligned to a draft consensus. We use *ccs*
 to generate this.
 
 ```bash
@@ -82,9 +82,9 @@ Note that the `--all` flag is a required setting for DeepConsensus to work
 optimally. This allows DeepConsensus to rescue reads previously below the
 quality threshold.
 If you want to split up the task for parallelization, we recommend using the
-`--chunk` option in `ccs`.
+`--chunk` option in *ccs*.
 
-Then, we create `subreads_to_ccs.bam` was created by running [actc]:
+Then, we create `subreads_to_ccs.bam` was created by running *actc*:
 
 ```bash
 sudo docker run -v "${DATA}":"/data" ${DOCKER_IMAGE} \
@@ -94,11 +94,13 @@ sudo docker run -v "${DATA}":"/data" ${DOCKER_IMAGE} \
   /data/subreads_to_ccs.bam
 ```
 
-DeepConsensus will take FASTA format of ccs, so we use samtools to generate.
+DeepConsensus will take FASTA format of *ccs*.
+
+*actc* already converted the BAM into FASTA. Rename and index it.
 
 ```bash
 sudo docker run -v "${DATA}":"/data" ${DOCKER_IMAGE} \
-  samtools fasta --threads "$(nproc)" /data/ccs.bam > ${DATA}/ccs.fasta
+  mv /data/subreads_to_ccs.fasta /data/ccs.fasta
 
 sudo docker run -v "${DATA}":"/data" ${DOCKER_IMAGE} \
   samtools faidx /data/ccs.fasta
@@ -111,7 +113,7 @@ sudo docker run -v "${DATA}":"/data" ${DOCKER_IMAGE} \
 You can install DeepConsensus using `pip`:
 
 ```bash
-pip install deepconsensus[gpu]==0.2.0rc0
+pip install deepconsensus[gpu]==0.2.0rc1
 ```
 
 NOTE: If you're using a CPU machine, install with `deepconsensus[cpu]` instead.
@@ -138,7 +140,7 @@ time deepconsensus run \
 
 At the end of your run, you should see:
 ```
-Processed 1000 ZMWs in 346.73112511634827 seconds
+Processed 1000 ZMWs in 341.3297851085663 seconds
 Outcome counts: OutcomeCounter(empty_sequence=0, only_gaps_and_padding=50, failed_quality_filter=424, failed_length_filter=0, success=526)
 ```
 the outputs can be found at the following paths:
@@ -169,7 +171,7 @@ time sudo docker run --gpus all \
 At the end of your run, you should see:
 
 ```
-Processed 1000 ZMWs in 433.63712906837463 seconds
+Processed 1000 ZMWs in 428.84565114974976 seconds
 Outcome counts: OutcomeCounter(empty_sequence=0, only_gaps_and_padding=50, failed_quality_filter=424, failed_length_filter=0, success=526)
 ```
 
@@ -184,6 +186,6 @@ You might be able to tweak parameters like `--batch_zmws` depending on your
 hardware limit. You can also see [runtime_metrics.md](runtime_metrics.md) for
 runtime on different CPU or GPU machines.
 
-[pbccs]: https://github.com/PacificBiosciences/ccs
+[ccs]: https://ccs.how
 [actc]: https://github.com/PacificBiosciences/align-clr-to-ccs
 [a GitHub issue]: https://github.com/google/deepconsensus/issues
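
The "At the end of your run" hunks only change the reported wall time; the outcome counts stay the same. To compare runs at a glance, the two final log lines can be reduced to a one-line summary. The following is a minimal sketch, not part of DeepConsensus: `summarize_run` is a hypothetical helper name, and it assumes the log lines have exactly the format shown in the example output in this diff.

```bash
# Hypothetical helper (not shipped with DeepConsensus): summarize the two
# final log lines of a run. Assumes the line format shown in the example
# output: "Processed N ZMWs in S seconds" and "... success=K)".
summarize_run() {
  awk '
    # "Processed 1000 ZMWs in 341.3 seconds" -> field 2 is N, field 5 is S
    /^Processed [0-9]+ ZMWs in/ { zmws = $2; secs = $5 }
    # Pull the integer after "success=" (the prefix is 8 characters long)
    match($0, /success=[0-9]+/) {
      ok = substr($0, RSTART + 8, RLENGTH - 8)
    }
    END {
      if (zmws > 0 && secs > 0)
        printf "zmws=%d success=%d yield=%.1f%% rate=%.2f ZMWs/s\n",
               zmws, ok, 100 * ok / zmws, zmws / secs
    }
  '
}

# Usage: feed it the final lines copied from the pip-based run.
summarize_run <<'EOF'
Processed 1000 ZMWs in 341.3297851085663 seconds
Outcome counts: OutcomeCounter(empty_sequence=0, only_gaps_and_padding=50, failed_quality_filter=424, failed_length_filter=0, success=526)
EOF
```

For the pip-based run this prints `zmws=1000 success=526 yield=52.6% rate=2.93 ZMWs/s`. In practice the function would read from wherever the run's log was captured, for example `summarize_run < run.log`, where `run.log` is an assumed file name.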