
Tutorial 3 Marker Matching and Alignment

chutter edited this page Apr 23, 2019 · 5 revisions

*** STILL A WORK IN PROGRESS

This tutorial guides you through matching the assembled contigs to the target markers and then aligning each marker.

Target Marker Matching [R script 03_Probe_Matching.R]

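This script matches the target marker loci to the assembled contigs and flags potential paralogs, as summarized in the comments at the top of the parameter block. Edit only the values in the parameter block: the number of threads, the save name for the combined contig match file, and the paths to your main project directory, processed samples, assembled contigs, and target marker file.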


##########################################################################################################
#Parameter setups. Only edit values here. 
##########################################################################################################

#This script does the following:
#1. Matches the loci to the contigs, saves them to a new file
#2. Also finds the potential paralogs, removes them, and saves them to a separate file

#Set up parameters
threads<-"6" #threads, keep quotes
contig.save<-"Project_Name"  #This is your save name for the big contig match file

#CLUSTER directories
work.dir<-"/home/username/Main_Project_Directory" #Your main project directory 
proc.dir<-"/home/username/Main_Project_Directory/Processed_Samples"
contig.dir<-"/home/username/Main_Project_Directory/Assembled_Contigs"
loci.file<-"/home/username/Main_Project_Directory/SELECT_YOUR_MARKER_FILE.fa"

When you have finished editing, run the script with the "Rscript" terminal command, which is installed alongside R. You can run it manually in a terminal window or add it as the command in a cluster job file.


> Rscript 03_Probe_Matching.R
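If you run on a cluster, the same command can go into a job file. The snippet below is a minimal sketch that assumes a SLURM scheduler; the job name, the log file name, and setting "--cpus-per-task" to match threads<-"6" are illustrative assumptions, so adapt them to your cluster's conventions. It only writes the job file; nothing runs until you submit it with "sbatch".

```shell
#!/usr/bin/env bash
# Write a minimal SLURM job file for the probe-matching step.
# ASSUMPTION: your cluster uses SLURM; the directives below are illustrative.
# --cpus-per-task=6 mirrors threads<-"6" in the parameter block.
cat > probe_matching_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=probe_matching
#SBATCH --cpus-per-task=6
#SBATCH --output=probe_matching_%j.log
# SLURM starts the job in the directory you submitted from,
# so submit from the folder that contains 03_Probe_Matching.R
Rscript 03_Probe_Matching.R
EOF
echo "Wrote probe_matching_job.sh; submit it with: sbatch probe_matching_job.sh"
```

The "%j" in the output name is replaced by the SLURM job ID, so logs from repeated runs do not overwrite each other.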

Show example of output files and explain them.

Target Marker Alignment [R script 04_Marker_Alignment.R]

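This script aligns each target marker across your samples, using the contig file output by the previous script and the target marker file. Only alignments containing at least "min.taxa" taxa are kept, and they are saved to the output directory set in "out.dir".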


##########################################################################################################
#Parameter setups. Only edit values here. 
##########################################################################################################

#Parameters
threads<-"8" #threads, keep quotes
min.taxa<-3 #minimum number of taxa required to keep an alignment

#Cluster dirs
species.loci<-"CONTIG-FILE-FROM-PREVIOUS-STEP.fa"  #The contig file output by the previous script 
work.dir<-"/home/username/Main_Project_Directory"
loci.file<-"/home/username/Main_Project_Directory/SELECT_YOUR_MARKER_FILE.fa" #Target loci file
out.dir<-"/home/username/Main_Project_Directory/Alignments" #The name of the output directory
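The "min.taxa" parameter sets how many taxa an alignment must contain to be kept. To get a feel for what that threshold does to a set of files, the hypothetical helper below ("report_taxa_counts" is not part of FrogCap) counts the sequences in each FASTA file in a directory and reports which would pass the default of 3. It assumes FASTA input with exactly one header line (starting with ">") per taxon; it only reports counts and does not reproduce what 04_Marker_Alignment.R does internally.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which FASTA files in a directory contain
# at least MIN_TAXA sequences (assumes one ">" header line per taxon).
report_taxa_counts() {
  local aln_dir="$1"
  local min_taxa="${2:-3}"   # default mirrors min.taxa<-3
  local aln n
  for aln in "$aln_dir"/*.fa "$aln_dir"/*.fasta; do
    [ -e "$aln" ] || continue              # skip glob patterns that matched nothing
    n=$(grep -c '^>' "$aln" || true)       # grep -c prints 0 (and exits 1) if no headers
    if [ "$n" -ge "$min_taxa" ]; then
      printf 'KEEP\t%s\t%d\n' "$(basename "$aln")" "$n"
    else
      printf 'DROP\t%s\t%d\n' "$(basename "$aln")" "$n"
    fi
  done
}
```

For example, "report_taxa_counts /home/username/Main_Project_Directory/Alignments 3" prints one tab-separated line per file with its KEEP/DROP status and sequence count.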


When you have finished editing, run the script with the "Rscript" terminal command, which is installed alongside R. You can run it manually in a terminal window or add it as the command in a cluster job file.


> Rscript 04_Marker_Alignment.R

Show example of output files and explain them.

Before running the FrogCap pipeline scripts, the input files must be organized in a specific way. First, create a directory and name it after your project. Second, place the newly created "File_rename.csv" and the downloaded, unzipped "FrogCap_Files" folder in that directory, and put the demultiplexed raw reads in a subfolder named "raw_data". An example of the file structure is shown below.

     /Project_Name
      ├── /raw_data
      │   ├── Spinomantis_elegans_CRH111_R1.fastq.gz
      │   ├── Spinomantis_elegans_CRH111_R2.fastq.gz
      │   ├── Boophis_burgeri_CRH0481_R1.fastq.gz
      │   ├── Boophis_burgeri_CRH0481_R2.fastq.gz
      │   ├── Aglyptodactylus_securifer_CRH1644_R1.fastq.gz
      │   ├── Aglyptodactylus_securifer_CRH1644_R2.fastq.gz
      │   ├── Mantidactylus_femoralis_CRH2340_R1.fastq.gz
      │   └── Mantidactylus_femoralis_CRH2340_R2.fastq.gz
      ├── File_rename.csv
      └── /FrogCap_Files

Note: a leading / denotes a directory.
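The layout shown above can be created from the terminal. The sketch below defines a hypothetical helper ("setup_project" is not part of FrogCap) that builds the project folder, copies in "File_rename.csv", "FrogCap_Files", and the demultiplexed reads, and then checks that every "_R1.fastq.gz" file has a matching "_R2.fastq.gz" mate. It assumes the reads follow the Sample_Name_R1.fastq.gz / Sample_Name_R2.fastq.gz naming used in the example, and it copies rather than moves, so your original files stay untouched.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of FrogCap): build the expected project layout.
# Arguments: project dir, File_rename.csv, FrogCap_Files dir, folder of raw reads.
setup_project() {
  local project="$1" rename_csv="$2" frogcap_dir="$3" reads_dir="$4"
  mkdir -p "$project/raw_data"
  cp "$rename_csv" "$project/File_rename.csv"
  # Skip if already present, so re-running does not nest FrogCap_Files inside itself
  [ -e "$project/FrogCap_Files" ] || cp -R "$frogcap_dir" "$project/FrogCap_Files"
  cp "$reads_dir"/*.fastq.gz "$project/raw_data/"

  # Check pairing: every *_R1.fastq.gz should have a *_R2.fastq.gz mate.
  local r1 r2 missing=0
  for r1 in "$project"/raw_data/*_R1.fastq.gz; do
    [ -e "$r1" ] || continue
    r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
    if [ ! -e "$r2" ]; then
      echo "Missing mate for $(basename "$r1")" >&2
      missing=1
    fi
  done
  return "$missing"   # non-zero if any R1 file lacks its R2 mate
}
```

For example: "setup_project Project_Name ~/Downloads/File_rename.csv ~/Downloads/FrogCap_Files ~/demultiplexed_reads".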


The assembly run generated a statistics file in the "reports" folder and a new folder named "assembled-contigs". This folder contains two new files: the "_contigs.fa" file holds the SPAdes-assembled contigs, while the "_dipcontigs.fa" file holds the contigs after contigs from the same haplotype have been merged together into a "haplocontig".
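One quick way to compare the two outputs for each sample is to count the FASTA records in each file. The hypothetical helper below ("count_contigs" is not part of FrogCap) does this with "grep -c '^>'", assuming the folder is named "assembled-contigs" and the files end in "_contigs.fa" and "_dipcontigs.fa" as described in the paragraph above. It reports counts only and makes no judgment about assembly quality.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print per-sample record counts for the SPAdes contigs
# (*_contigs.fa) and the merged haplocontigs (*_dipcontigs.fa).
count_contigs() {
  local contig_dir="${1:-assembled-contigs}"
  local f sample n_contigs n_dip
  printf 'sample\tcontigs\tdipcontigs\n'
  # *_contigs.fa does not match *_dipcontigs.fa, so each sample is listed once
  for f in "$contig_dir"/*_contigs.fa; do
    [ -e "$f" ] || continue
    sample=$(basename "$f" _contigs.fa)
    n_contigs=$(grep -c '^>' "$f" || true)
    if [ -e "$contig_dir/${sample}_dipcontigs.fa" ]; then
      n_dip=$(grep -c '^>' "$contig_dir/${sample}_dipcontigs.fa" || true)
    else
      n_dip="NA"   # no matching dipcontigs file for this sample
    fi
    printf '%s\t%s\t%s\n' "$sample" "$n_contigs" "$n_dip"
  done
}
```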

These two scripts can be run on any number of samples, in any combination. However, once you are done with Tutorial 1, you will need to select your definitive sample set for alignment in Tutorial 2.

