Tutorial 3 Marker Matching and Alignment
*** STILL A WORK IN PROGRESS
This tutorial will guide you through matching assembled contigs to the target markers and then aligning each marker.
Insert explanation here!!
##########################################################################################################
#Parameter setups. Only edit values here.
##########################################################################################################
#This script does the following:
#1. Matches the loci to the contigs, saves them to a new file
#2. Also finds the potential paralogs, removes them, and saves them to a separate file
#Set up directories
threads<-"6" #Number of threads to use; keep the quotes
contig.save<-"Project_Name" #Save name for the big contig match file
#CLUSTER directories
work.dir<-"/home/username/Main_Project_Directory" #Your main project directory
proc.dir<-"/home/username/Main_Project_Directory/Processed_Samples" #Processed samples directory
contig.dir<-"/home/username/Main_Project_Directory/Assembled_Contigs" #Assembled contigs directory
loci.file<-"/home/username/Main_Project_Directory/SELECT_YOUR_MARKER_FILE.fa" #Target loci file
When you have finished editing, run the script with the "Rscript" terminal command, which is installed alongside R. You can run it manually in a terminal window or add it as the command in a cluster job file.
> Rscript 01_Pre_Process_Reads.R
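Before launching the script, it can help to confirm that the paths set in the parameter block exist. The following is a minimal shell sketch under stated assumptions: the function name `check_paths` is hypothetical and not part of FrogCap, and the paths are the placeholder values from the parameter block, which you would replace with your own.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper; NOT part of the FrogCap scripts.
# Prints every path that does not exist and returns non-zero if any is missing.
check_paths() {
  local missing=0
  local p
  for p in "$@"; do
    if [ ! -e "$p" ]; then
      echo "Missing: $p" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Placeholder values copied from the parameter block; edit to match your project.
WORK_DIR="/home/username/Main_Project_Directory"

if check_paths \
  "$WORK_DIR/Processed_Samples" \
  "$WORK_DIR/Assembled_Contigs" \
  "$WORK_DIR/SELECT_YOUR_MARKER_FILE.fa"; then
  echo "All inputs found."
else
  echo "Fix the paths listed above before running Rscript."
fi
```

Checking paths in the shell first means a typo in a directory name shows up immediately rather than partway through a long cluster job.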
Show example of output files and explain them.
Insert explanation here for the next section
##########################################################################################################
#Parameter setups. Only edit values here.
##########################################################################################################
#Parameters
threads<-"8"
min.taxa<-3 #min number taxa to keep an alignment
#Cluster dirs
species.loci<-"CONTIG-FILE-FROM-PREVIOUS-STEP.fa" #The contig file output by the previous script
work.dir<-"/home/username/Main_Project_Directory"
loci.file<-"/home/username/Main_Project_Directory/SELECT_YOUR_MARKER_FILE.fa" #Target loci file
out.dir<-"/home/username/Main_Project_Directory/Alignments" #The name of the output directory
When you have finished editing, run the script with the "Rscript" terminal command, which is installed alongside R. You can run it manually in a terminal window or add it as the command in a cluster job file.
> Rscript 01_Pre_Process_Reads.R
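The `min.taxa` parameter is described above as the minimum number of taxa needed to keep an alignment. As a rough illustration of that idea only, and not the script's actual implementation, the sketch below counts FASTA header lines in each alignment file and lists the files that fall under the threshold. It assumes one sequence per taxon and a `.fa` extension for alignment files; the function names `count_seqs` and `list_below_min_taxa` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; NOT part of the FrogCap scripts.

# Count sequences in a FASTA file by counting header lines (lines starting with ">").
count_seqs() {
  grep -c '^>' "$1" || true
}

# List alignment files with fewer sequences than the threshold.
# $1 = directory of alignments, $2 = minimum number of taxa
list_below_min_taxa() {
  local f n
  for f in "$1"/*.fa; do
    [ -e "$f" ] || continue          # skip if the glob matched nothing
    n=$(count_seqs "$f")
    if [ "$n" -lt "$2" ]; then
      echo "$(basename "$f") $n"
    fi
  done
  return 0
}

# Demo on throwaway files so that no real data is touched.
demo=$(mktemp -d)
printf '>A\nACGT\n>B\nACGT\n>C\nACGT\n' > "$demo/locus1.fa"
printf '>A\nACGT\n>B\nACGT\n' > "$demo/locus2.fa"
list_below_min_taxa "$demo" 3   # prints: locus2.fa 2
rm -rf "$demo"
```

Running the same function on your own output directory after the alignment step gives a quick view of how many markers sit near the cutoff.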
Show example of output files and explain them.
Finally, before running the FrogCap pipeline scripts, the input files must be organized in a specific way. First, create a directory named after the project. Second, place three items inside it: the newly created "File_rename.csv", the downloaded and unzipped "FrogCap_Files" folder, and a folder named "raw_data" containing the demultiplexed raw reads. An example of the file structure is shown below.
/Project_Name
├── /raw_data
│ ├── Spinomantis_elegans_CRH111_R1.fastq.gz
│ ├── Spinomantis_elegans_CRH111_R2.fastq.gz
│ ├── Boophis_burgeri_CRH0481_R1.fastq.gz
│ ├── Boophis_burgeri_CRH0481_R2.fastq.gz
│ ├── Aglyptodactylus_securifer_CRH1644_R1.fastq.gz
│ ├── Aglyptodactylus_securifer_CRH1644_R2.fastq.gz
│ ├── Mantidactylus_femoralis_CRH2340_R1.fastq.gz
│ └── Mantidactylus_femoralis_CRH2340_R2.fastq.gz
├── File_rename.csv
└── /FrogCap_Files
* A leading / denotes a directory
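In the layout above, every sample in "raw_data" has an "_R1.fastq.gz" file and a matching "_R2.fastq.gz" file. A minimal shell sketch for spotting a sample whose R2 file is missing follows. It assumes the file naming shown in the example; the function name `find_unpaired_reads` is hypothetical and not part of FrogCap.

```shell
#!/usr/bin/env bash
# Hypothetical helper; NOT part of the FrogCap scripts.
# Lists every *_R1.fastq.gz file in a directory that has no matching *_R2.fastq.gz.
# $1 = path to the raw_data directory
find_unpaired_reads() {
  local r1 r2
  for r1 in "$1"/*_R1.fastq.gz; do
    [ -e "$r1" ] || continue                 # skip if the glob matched nothing
    r2="${r1%_R1.fastq.gz}_R2.fastq.gz"      # expected name of the mate file
    [ -e "$r2" ] || basename "$r1"
  done
  return 0
}

# Demo on a throwaway project directory so that no real data is touched.
base=$(mktemp -d)
mkdir -p "$base/Project_Name/raw_data"
touch "$base/Project_Name/raw_data/Boophis_burgeri_CRH0481_R1.fastq.gz"
touch "$base/Project_Name/raw_data/Boophis_burgeri_CRH0481_R2.fastq.gz"
touch "$base/Project_Name/raw_data/Spinomantis_elegans_CRH111_R1.fastq.gz"
find_unpaired_reads "$base/Project_Name/raw_data"
# prints: Spinomantis_elegans_CRH111_R1.fastq.gz
rm -rf "$base"
```

An empty result means every R1 file found has its R2 partner.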
The assembly run generated a statistics file in the "reports" folder and a new folder, "assembled-contigs". This folder contains two new files: the "_contigs.fa" file holds the SPAdes-assembled contigs, while the "_dipcontigs.fa" file holds the regular contigs with contigs from the same haplotype merged together into a "haplocontig".
These two scripts can be run on any number of samples in any combination. However, once you are done with Tutorial 1, you will need to select your definitive sample set for alignment in Tutorial 2.