Introduction to SNP-based variant calling
This pipeline contains a wrapper for vivaxGEN NGS-Pipeline, a SNP-based variant calling pipeline, which has been setup to process P. vivax sequence data. This pipeline will produce ordinary VCF file that can be used for further downstream analysis.
Currently, this pipeline was setup to use multi-step mode of vivaxGEN NGS-Pipeline in a single command line with GATK-based workflow to generate VCF file There is also Freebayes-based workflow, but it requires modification of the setting, which will not be covered in this documentation.
Quick Start
Running the pipeline is as easy executing the following command:
$ ngs-pl run-discovery-variant-caller -o output_dir path_to_fastq/*.fastq.gz
When the command finishes, examine the content of
output_dirdirectory:
The layout of the output directory is:
$ tree output_dir
output_dir/
metafile/
manifest.tsv
analysis/
SAMPLE-1/
gvcf/
SAMPLE-1-PvP01_all_v2.g.vcf.gz
SAMPLE-1-PvP01_all_v2.g.vcf.gz.tbi
maps/
mapped-final.bam
mapped-final.bam.bai
reads/
raw-0_R1.fastq.gz
raw-0_R2.fastq.gz
logs/
...
SAMPLE-2/
...
joint/
concatenated.vcf.gz
vcfs/
reports/
...
completed_samples/
SAMPLE-1/
...
SAMPLE-2/
...
failed_samples/
...
The primary output file of interest is the concatenated.vcf.gz which contains the SNPs and the genotype calls for each of the samples.