README.md (8 additions & 10 deletions)
--- a/README.md
+++ b/README.md
@@ -10,10 +10,11 @@
 
 </p>
 
-<h3 align="center"> ICOR: Improving Codon Optimization with Recurrent neural networks <h4>
+<h3 align="center"> ICOR: Improving Codon Optimization with Recurrent neural networks <h3>
 
 ---
 - [About](#About)
+- [Quickstart](#Quickstart)
 - [Assets](#Assets)
 - [Benchmark Results](#Benchmark-Results)
 - [Benchmark Sequences](#Benchmark-Sequences)
@@ -25,23 +26,20 @@
 - [Dependencies](#Dependencies)
 ---
 
+### About
+In protein sequences—as there are 61 sense codons but only 20 standard amino acids—most amino acids are encoded by more than one codon. Although such synonymous codons do not alter the encoded amino acid sequence, their selection can dramatically affect the production of the resulting protein. Codon optimization of synthetic DNA sequences for maximum expression is an important segment of heterologous expression. However, existing solutions are primarily based on choosing high-frequency codons only, neglecting the important effects of rare codons. In this paper, we propose a novel recurrent-neural-network (RNN) based codon optimization tool, ICOR, that aims to learn codon usage bias on a genomic dataset of Escherichia coli. We compile a dataset of over 42,000 non-redundant, robust genes that are used for deep learning. The model uses a bidirectional long short-term memory-based architecture, allowing for the sequential information of genes to be learnt. Our tool can predict synonymous codons for synthetic genes towards optimal expression in E. coli. We demonstrate that sequential context achieved via RNN may yield codon selection that is more similar to the host genome, therefore improving protein expression more than frequency-based approaches. On a benchmark set of over 40 select DNA sequences, ICOR tool improved the codon adaptation index by 41.69% compared to the original sequence. Our resulting algorithm is provided as an open-source software package along with the benchmark set of sequences.
+
 ### Quickstart
 I really like having a quickstart section that gives me a single command to install prereqs, a single command to run all tests (if any), and a single command to run the application. Something like:
 
 ```bash
 # Install prereqs
-pip install -r requriements.txt # or an install_prereqs.sh script if you have more diverse dependencies
-
-# run tests (if you decided to add tests in the future)
-pytest
+pip install -r requirements.txt
 
-#run models
-python ./tool/optimizers/brute_force_optimizer.py
+#Run ICOR optimizer
+python ./tool/optimizers/icor_optimizer.py
 ```
 
-### About
-In protein sequences—as there are 61 sense codons but only 20 standard amino acids—most amino acids are encoded by more than one codon. Although such synonymous codons do not alter the encoded amino acid sequence, their selection can dramatically affect the production of the resulting protein. Codon optimization of synthetic DNA sequences for maximum expression is an important segment of heterologous expression. However, existing solutions are primarily based on choosing high-frequency codons only, neglecting the important effects of rare codons. In this paper, we propose a novel recurrent-neural-network (RNN) based codon optimization tool, ICOR, that aims to learn codon usage bias on a genomic dataset of Escherichia coli. We compile a dataset of over 42,000 non-redundant, robust genes that are used for deep learning. The model uses a bidirectional long short-term memory-based architecture, allowing for the sequential information of genes to be learnt. Our tool can predict synonymous codons for synthetic genes towards optimal expression in E. coli. We demonstrate that sequential context achieved via RNN may yield codon selection that is more similar to the host genome, therefore improving protein expression more than frequency-based approaches. On a benchmark set of over 40 select DNA sequences, ICOR tool improved the codon adaptation index by 41.69% compared to the original sequence. Our resulting algorithm is provided as an open-source software package along with the benchmark set of sequences.
-
 ### Assets
 Assets including images and branding for the ICOR tool, hosted on the [biotools by Lattice Automation](https://tools.latticeautomation.com/) website.
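
Reviewer note on the About paragraph in the README diff: the "61 sense codons but only 20 standard amino acids" figure can be checked directly against the standard genetic code. The sketch below is illustrative only and is not part of ICOR; the names `BASES`, `AMINO_ACIDS`, `CODON_TABLE`, and `synonymous_codons` are made up for this example, and the table is the standard nuclear code written in the conventional TCAG order.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
# '*' marks the three stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 codons to its one-letter amino acid (or '*').
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codons():
    """Group sense codons by the amino acid they encode (stops excluded)."""
    groups = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            groups[aa].append(codon)
    return dict(groups)


groups = synonymous_codons()
sense_codon_count = sum(len(codons) for codons in groups.values())
single_codon_amino_acids = sorted(aa for aa, c in groups.items() if len(c) == 1)
```

Only methionine and tryptophan end up with a single codon, which is why a codon optimizer has a real choice to make at almost every other position.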
-type=input("Welcome to ICOR! Are you optimizing an amino acid sequence (enter in 'aa' below) or a dna/codon sequence (enter in 'dna' below)?\n\n").strip().upper()
+sequence_type=input("Welcome to ICOR! Are you optimizing an amino acid sequence (enter in 'aa' below) or a dna/codon sequence (enter in 'dna' below)?\n\n").strip().upper()
 input_seq=input(
     "Enter the coding sequence only.\nEnter in 'demo' to use demo sequence.\n\n").strip().upper()
-# 'type' is a builtin function in python - I'd recommend renaming the var to sequence_type to avoid reassigning it
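
To make the rename concrete, here is a small sketch of what can go wrong when a variable called `type` shadows the builtin. Both functions are hypothetical illustrations and do not appear in the ICOR source; they only show the scoping behaviour the review comment refers to.

```python
def classify_shadowed(value):
    # Rebinding `type` hides the builtin for the rest of this scope.
    type = "AA"
    try:
        # This now tries to call the string "AA", not the builtin type().
        return type(value).__name__
    except TypeError as err:
        return f"TypeError: {err}"


def classify_renamed(value):
    # With a descriptive name, the builtin type() stays usable.
    sequence_type = "AA"
    return sequence_type, type(value).__name__
```

In the script itself the shadowing is harmless until some later line needs the builtin `type()`, which is why the rename to `sequence_type` is a cheap preventive fix.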
sys.exit('Invalid amino acid sequence detected.\nThe sequence must start with M and end with * because ICOR only optimizes the codon-sequence region!\nPlease try again.\nRead more: http://www.hgvs.org/mutnomen/references.html#aalist')
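
The condition that triggers this error message is not visible in the diff. A minimal sketch of a check consistent with the message (must start with `M`, must end with `*`) might look like the following. The function name `is_valid_aa_sequence`, the `VALID_AA` alphabet, and the rejection of non-standard residue letters are assumptions made for illustration, not ICOR's actual implementation.

```python
# Assumed alphabet: the 20 standard amino acids in one-letter code.
VALID_AA = set("ACDEFGHIKLMNPQRSTVWY")


def is_valid_aa_sequence(seq: str) -> bool:
    """Return True if seq starts with M, ends with '*', and uses only standard residues."""
    seq = seq.strip().upper()
    # Need at least the start residue and the stop marker.
    if len(seq) < 2 or not seq.startswith("M") or not seq.endswith("*"):
        return False
    # Everything before the trailing '*' must be a standard residue.
    return all(residue in VALID_AA for residue in seq[:-1])
```

Returning a boolean rather than calling `sys.exit` inside the check keeps the validation testable; the caller can still exit with the message above when the function returns `False`.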