Miniprot is a splice-aware protein alignment program. This CWL workflow takes protein sequences, aligns them to the target genome, and generates JBrowse tracks.
The workflow reads its inputs from a YAML parameters file such as miniprot_params.yml:
threads: 4
singularity_image: /project/nal_genomics/shared_programs/jbrowse_1.16.11--pl5321h9f5acd7_5.sif
gff3_rename_script:
  class: File
  path: tools/gff3_rename_ID_Parent_attributes.py
species_abbreviation: Apimel
protein_family: chemoreceptors
reference_genome:
  class: File
  path: /path/to/GCF_003254395.2_Amel_HAv3.1_genomic.fna
protein_sequences:
  class: File
  path: /path/to/Amel_chemoreceptors.fasta
protein_data_source: "Robertson, H. M., & Wanner, K. W. (2006). The chemoreceptor superfamily in the honey bee, Apis mellifera: expansion of the odorant, but not gustatory, receptor family. Genome research, 16(11), 1395-1403. Smith, C. R., Smith, C. D., Robertson, H. M., Helmkampf, M., Zimin, A., Yandell, M., ... & Gadau, J. (2011). Draft genome of the red harvester ant Pogonomyrmex barbatus. Proceedings of the National Academy of Sciences, 108(14), 5667-5672."
json_track_name: Apis mellifera chemoreceptors
json_directory_name: miniprot_Apimel_chemoreceptors_json
threads is the number of threads Miniprot uses.
singularity_image is the path to the JBrowse Singularity image; the path shown is for Ceres.
gff3_rename_script is the path to the GFF3 ID/Parent renaming script; do not change this path.
species_abbreviation and protein_family together form the trackLabel (the track folder name).
protein_data_source is the citation for the source of the protein sequences.
json_track_name is the track name displayed in JBrowse.
json_directory_name is the name of the output directory.
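Since several parameters are File entries, it may help to confirm that their paths exist before starting a run. The following is a minimal sketch of such a check; check_param_paths is a hypothetical helper and not part of the workflow. It assumes one "path: VALUE" entry per line, no spaces in paths, and that it is run from the directory containing the parameters file, so that relative paths such as tools/gff3_rename_ID_Parent_attributes.py resolve the same way.

```shell
# Hypothetical helper (not part of the workflow): report File paths in a
# params file that do not exist on disk.
# Assumptions: one "path: VALUE" entry per line, no spaces in paths, and
# relative paths are checked from the current directory.
check_param_paths() {
  status=0
  # Pull the value that follows "path:" on each matching line
  for p in $(awk '$1 == "path:" { print $2 }' "$1"); do
    if [ ! -e "$p" ]; then
      echo "missing: $p"
      status=1
    fi
  done
  # Non-zero return if at least one path was missing
  return "$status"
}
```

For example, check_param_paths miniprot_params.yml prints one "missing:" line per path that cannot be found, which would include any /path/to/ placeholders left in the file.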
Create the conda environment from the miniprot_env.yml file. This only needs to be done once; the environment does not have to be remade for later runs.
conda env create -f miniprot_env.yml
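To avoid recreating an environment that already exists, the creation step can be guarded by a lookup in the output of conda env list. This is only a sketch: ensure_miniprot_env is a hypothetical function name, and it assumes the environment is called miniprot_env, as in the activate step.

```shell
# Create the environment only if `conda env list` does not already show it.
# Assumption: the environment is named miniprot_env.
ensure_miniprot_env() {
  # awk exits 0 when the first column of some line equals the name
  if conda env list | awk -v name="miniprot_env" '$1 == name { found = 1 } END { exit !found }'; then
    echo "miniprot_env already exists"
  else
    conda env create -f miniprot_env.yml
  fi
}
```

Calling ensure_miniprot_env after loading miniconda either reports that the environment is present or creates it.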
To run on Ceres:
module load singularityCE
module load miniconda
conda activate miniprot_env
cwltool miniprot_workflow.cwl miniprot_params.yml
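As an optional extra step, the workflow document can be checked with cwltool --validate before the run is started, so that a syntax problem in the CWL file is reported before any alignment work begins. The sketch below wraps both commands; run_miniprot_workflow is a hypothetical name, and it assumes the modules are loaded and miniprot_env is active as shown above.

```shell
# Validate the workflow document first, then run it only if validation passes.
# Assumption: cwltool is on PATH (modules loaded, miniprot_env activated).
run_miniprot_workflow() {
  cwltool --validate miniprot_workflow.cwl &&
    cwltool miniprot_workflow.cwl miniprot_params.yml
}
```

If validation fails, the function returns a non-zero status and the workflow itself is not started.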