Commit 2941442

Merge branch 'main' into nishant-code-review
Author: RJain12
2 parents 2db1ba5 + 7f548c6 commit 2941442

3 files changed

Lines changed: 2 additions & 127 deletions

README.md

Lines changed: 1 addition & 7 deletions
@@ -68,8 +68,6 @@ Assets including images and branding for the ICOR tool, hosted on the [biotools
 ### Tool
 The ICOR tool has been divided into four directories: models, optimizers, resources, and scripts. In the `/tool/optimizers` directory sits the `icor_optimizer.py` file: an interactive script to optimize a sequence utilizing the trained ICOR model.
 
-> Note as of 8/24/2021, this ICOR optimizer Python script has a bug, although it works, it does not output the correct sequence. The other script "run_icor_from_mat" does work and outputs the correct sequence given an input of a .mat file. However, a user would be inputting either a FASTA file or pasting in a sequence. This script currently accepts the pasted sequence, but the optimizer portion is not working as expected. It outputs a sequence but it is not correct. Since the same model was being inferenced in the run_icor_from_mat script, I have isolated that this issue is not because of the model file. It is because of the encoding done in this script. I have 1-2 things that I still need to try which I believe will solve this issue.
-
 Supporting files were used to train, evaluate, and test the ICOR model. Descriptions for these can be found below:
 
 #### Models
@@ -83,7 +81,7 @@ The ICOR model was trained in the MATLAB environment. For more details on model
 
 #### Optimizers
 `brute_force_optimizer.py`
-> Naive optimizer creates a directory containing amino acid sequences in the FASTA format and saves these "optimized" / "generated" DNA sequences in a directory. It generates 10,000 sequences and chooses the one with the highest CAI.
+> Brute force optimizer creates a directory containing amino acid sequences in the FASTA format and saves these "optimized" / "generated" DNA sequences in a directory. It generates 10,000 sequences and chooses the one with the highest CAI.
 
 `icor_optimizer.py`
 > ICOR optimizer outputs a text file given a sequence input of amino acids or DNA. It is an interactive Python command-line script. It runs an inference through the ICOR model.
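
The `brute_force_optimizer.py` description in this hunk (generate 10,000 candidate sequences, keep the one with the highest CAI) could be sketched roughly as follows. This is not the repository's code: the function names, the three-residue codon table, and the weight values are all hypothetical and chosen only to keep the example small. The one fact relied on is that CAI is the geometric mean of per-codon relative adaptiveness weights.

```python
import math
import random

# Hypothetical toy tables for illustration only. A real run would use the
# full 64-codon table and host-specific relative adaptiveness weights.
SYNONYMOUS_CODONS = {
    "M": ["ATG"],
    "K": ["AAA", "AAG"],
    "F": ["TTT", "TTC"],
}
CODON_WEIGHTS = {
    "ATG": 1.0,
    "AAA": 1.0, "AAG": 0.3,
    "TTT": 1.0, "TTC": 0.7,
}


def cai(dna: str) -> float:
    """Codon Adaptation Index: geometric mean of the per-codon weights."""
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    log_sum = sum(math.log(CODON_WEIGHTS[c]) for c in codons)
    return math.exp(log_sum / len(codons))


def brute_force_optimize(protein: str, n_candidates: int = 10_000, seed: int = 0) -> str:
    """Sample random synonymous encodings and keep the highest-CAI one."""
    rng = random.Random(seed)
    best_seq, best_score = "", -1.0
    for _ in range(n_candidates):
        # Pick one synonymous codon at random for every amino acid.
        candidate = "".join(rng.choice(SYNONYMOUS_CODONS[aa]) for aa in protein)
        score = cai(candidate)
        if score > best_score:
            best_seq, best_score = candidate, score
    return best_seq


best = brute_force_optimize("MKF")
```

Random sampling like this only approaches the best CAI as the number of candidates grows relative to the number of possible encodings, which is presumably why the README originally called the optimizer "naive".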
@@ -114,10 +112,6 @@ The following is a description of the purpose for each script in the repository.
 - Negative CIS elements
 - Negative repeat elements
 
-`run_icor_from_mat.ipynb`
-> A notebook that accepts a `.mat` file that contains one variable called "XTrain" of the cell array type. Cell array used in experiments was of value/shape 42266x1.
-> Note: as of 8/24/2021 this script successfully outputs the ICOR optimized sequence and it does indeed match the correct ICOR optimization.
-
 #### Resources
 The following is a description of the purpose for each resource in the resources folder.
 

tool/optimizers/icor_optimizer.py

Lines changed: 1 addition & 1 deletion
@@ -87,7 +87,7 @@ def aa2int(seq: str) -> List[int]:
 i = 0
 # style nit: more pythonic to write for i in range(0, len(aa_placement)):
 while i < len(aa_placement):
-    oh_array[aa_placement[i]-1, i] = 1
+    oh_array[aa_placement[i], i] = 1
     i += 1
 
 oh_array = [oh_array]
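
To illustrate why dropping the `-1` matters, here is a minimal sketch of the same column-wise one-hot encoding. It assumes that `aa_placement` holds 0-based indices, which the diff itself does not show; `one_hot_columns` and its `subtract_one` flag are hypothetical names used only for this comparison.

```python
import numpy as np


def one_hot_columns(indices, num_classes, subtract_one=False):
    """Return a (num_classes, len(indices)) matrix with a single 1 per column."""
    oh = np.zeros((num_classes, len(indices)), dtype=int)
    for col, idx in enumerate(indices):
        # subtract_one=True mimics the old `aa_placement[i]-1` indexing.
        row = idx - 1 if subtract_one else idx
        oh[row, col] = 1
    return oh


zero_based = [0, 2, 1]  # assumed 0-based class indices
correct = one_hot_columns(zero_based, num_classes=3)
shifted = one_hot_columns(zero_based, num_classes=3, subtract_one=True)
```

Under that assumption, the old indexing raises no error: NumPy treats row `-1` as the last row, so index 0 silently lands in the wrong class and every other index shifts down by one. That would be consistent with the symptom described in the removed README note, where the script produced a sequence but not the correct one.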

tool/scripts/run_icor_from_mat.ipynb

Lines changed: 0 additions & 119 deletions
This file was deleted.
