===VirGA HSV-KOSsyn Report Summary===
Date generated: Mon May 5 16:41:16 EDT 2014

Genome name: HSV-KOSsyn.fa
Genome annotation: HSV-KOSsyn.gff
Genome length: 135683 bp
Genome name (with TRL/TRS): full-length_HSV-KOSsyn.fa
Genome annotation (with TRL/TRS): full-length_HSV-KOSsyn.gff
Genome length (with TRL/TRS): 150970 bp
Number of bases with >=100 depth support: 133371 (98.29%)
Number of gaps: 3
Number of Ns: 721
Complete protein annotations: 71
Incomplete protein annotations: 3
Reference name: HSV-17_trimmed.fa
Reference annotation: HSV-17_trimmed.gff


[Figure: Coverage plot showing read depth across the new genome. All annotations from the .gff file are shown as separate tracks, along with low-coverage and gap regions.]

The following proteins were correctly assembled and annotated from the HSV-KOSsyn genome. For each protein, alignments to the reference genome are available at the gene, ORF, and amino acid (AA) level.
gene_RL1: Gene|ORF|AA   No errors     gene_RL2: Gene|ORF|AA   No errors     gene_UL1: Gene|ORF|AA   No errors     
gene_UL2: Gene|ORF|AA   No errors     gene_UL3: Gene|ORF|AA   No errors     gene_UL4: Gene|ORF|AA   No errors     
gene_UL5: Gene|ORF|AA   No errors     gene_UL6: Gene|ORF|AA   No errors     gene_UL7: Gene|ORF|AA   No errors     
gene_UL8: Gene|ORF|AA   No errors     gene_UL9: Gene|ORF|AA   No errors     gene_UL10: Gene|ORF|AA   No errors     
gene_UL11: Gene|ORF|AA   No errors     gene_UL12: Gene|ORF|AA   No errors     gene_UL13: Gene|ORF|AA   No errors     
gene_UL14: Gene|ORF|AA   No errors     gene_UL15: Gene|ORF|AA   No errors     gene_UL16: Gene|ORF|AA   No errors     
gene_UL17: Gene|ORF|AA   No errors     gene_UL18: Gene|ORF|AA   No errors     gene_UL19: Gene|ORF|AA   No errors     
gene_UL20: Gene|ORF|AA   No errors     gene_UL21: Gene|ORF|AA   No errors     gene_UL22: Gene|ORF|AA   No errors     
gene_UL23: Gene|ORF|AA   No errors     gene_UL24: Gene|ORF|AA   No errors     gene_UL25: Gene|ORF|AA   No errors     
gene_UL26: Gene|ORF|AA   No errors     gene_UL26.5: Gene|ORF|AA   No errors     gene_UL27: Gene|ORF|AA   No errors     
gene_UL28: Gene|ORF|AA   No errors     gene_UL29: Gene|ORF|AA   No errors     gene_UL30: Gene|ORF|AA   No errors     
gene_UL31: Gene|ORF|AA   No errors     gene_UL32: Gene|ORF|AA   No errors     gene_UL33: Gene|ORF|AA   No errors     
gene_UL34: Gene|ORF|AA   No errors     gene_UL35: Gene|ORF|AA   No errors     gene_UL36: Gene|ORF|AA   No errors     
gene_UL37: Gene|ORF|AA   No errors     gene_UL38: Gene|ORF|AA   No errors     gene_UL39: Gene|ORF|AA   No errors     
gene_UL40: Gene|ORF|AA   No errors     gene_UL41: Gene|ORF|AA   No errors     gene_UL42: Gene|ORF|AA   No errors     
gene_UL43: Gene|ORF|AA   No errors     gene_UL44: Gene|ORF|AA   No errors     gene_UL45: Gene|ORF|AA   No errors     
gene_UL46: Gene|ORF|AA   No errors     gene_UL47: Gene|ORF|AA   No errors     gene_UL48: Gene|ORF|AA   No errors     
gene_UL49: Gene|ORF|AA   No errors     gene_UL49A: Gene|ORF|AA   No errors     gene_UL50: Gene|ORF|AA   No errors     
gene_UL51: Gene|ORF|AA   No errors     gene_UL52: Gene|ORF|AA   No errors     gene_UL53: Gene|ORF|AA   No errors     
gene_UL54: Gene|ORF|AA   No errors     gene_UL55: Gene|ORF|AA   No errors     gene_UL56: Gene|ORF|AA   No errors     
gene_US1: Gene|ORF|AA   No errors     gene_US2: Gene|ORF|AA   No errors     gene_US3: Gene|ORF|AA   No errors     
gene_US4: Gene|ORF|AA   No errors     gene_US5: Gene|ORF|AA   No errors     gene_US6: Gene|ORF|AA   No errors     
gene_US7: Gene|ORF|AA   No errors     gene_US8: Gene|ORF|AA   No errors     gene_US10: Gene|ORF|AA   No errors     
gene_US11: Gene|ORF|AA   No errors     gene_US12: Gene|ORF|AA   No errors     

Proteins requiring review due to potential assembly defects or substantial biological variation:
gene_RS1: Gene|ORF|AA      (Gaps found)
gene_US8A: Gene|ORF|AA      (Stop codon missing)
gene_US9: Gene|ORF|AA      (Sample is more than 10% shorter than the reference) (Early stop codon found)

The following genome features are non-protein-coding and were aligned to the reference for visual comparison. Only features <= 5 kb were aligned; features listed as None were not aligned.
gene_LAT: None     gene_US12_start_in_IRS: DNA     hsv1-mir-H1: DNA
hsv1-mir-H2: DNA     hsv1-mir-H3: DNA     hsv1-mir-H4: DNA
hsv1-mir-H5: DNA     hsv1-mir-H6: DNA     hsv1-mir-H7: DNA
hsv1-mir-H8: DNA     hsv1-mir-H11: DNA     hsv1-mir-H12: DNA
hsv1-mir-H13: DNA     hsv1-mir-H14: DNA     hsv1-mir-H16: DNA
hsv1-mir-H18: DNA     hsv1-mir-H26: DNA     hsv1-mir-H27: DNA
oriL: DNA     oriS: DNA     UL: None
US: None     IRL: None     IRS: None
a_prime: DNA


===VirGA Detailed Report===

VirGA steps completed:

   - STEP_1:Preprocessing
   - STEP_2:Multi_SSAKE_Assembly
   - STEP_3:Annotation
   - STEP_4:Assembly_Assessment

STEP_1:Preprocessing Output


Raw reads within sample 1 (HSV-KOSsyn_1.fastq): 22860146
Raw reads within sample 2 (HSV-KOSsyn_2.fastq): 22860146

Procedure: 'before_stats'
Purpose: Create histograms of raw read quality and other metrics
[Figure: Raw_reads_1 (raw read quality histograms)]
FastQC results for HSV-KOSsyn_1.fastq: FastQC-Report
FastQC results for HSV-KOSsyn_2.fastq: FastQC-Report

Procedure: 'fastx_clip_adapters'
Purpose: Clip leftover adapters from the raw sequencing reads


Sample 1 (HSV-KOSsyn_1.fastq):
Clipping Adapter: AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
Min. Length: 30
Input: 22860146 reads.
Output: 21860165 reads.
discarded 892615 too-short reads.
discarded 49631 adapter-only reads.
discarded 57735 N reads.
Sample 2 (HSV-KOSsyn_2.fastq):
Clipping Adapter: AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
Min. Length: 30
Input: 22860146 reads.
Output: 21761541 reads.
discarded 878536 too-short reads.
discarded 48072 adapter-only reads.
discarded 171997 N reads.
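The clipping counts above are internally consistent: for each file, the output equals the input minus the three discard categories (too-short, adapter-only, and N reads). A minimal Python check, assuming these three categories are the only reasons the clipping step drops a read; the helper name `clipped_output` is hypothetical:

```python
def clipped_output(n_input, too_short, adapter_only, n_reads):
    """Reads kept after adapter clipping: input minus every discarded category."""
    return n_input - too_short - adapter_only - n_reads


# Counts copied from the two clipping runs reported above
sample1_kept = clipped_output(22860146, 892615, 49631, 57735)
sample2_kept = clipped_output(22860146, 878536, 48072, 171997)
print(sample1_kept, sample2_kept)  # 21860165 21761541
```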



Procedure: 'fastx_trim_qual'
Purpose: Trim low quality bases from sequencing reads


TrimmomaticSE: Started with arguments: -phred33 /gpfs/home/jus57/biostar/1--Raw_Reads_Per_Genome/HSV-KOSsyn/VirGA_Pipeline_Directory/STEP_1--Preprocessing/clipped_reads/clip_HSV-KOSsyn_1.fastq /gpfs/home/jus57/biostar/1--Raw_Reads_Per_Genome/HSV-KOSsyn/VirGA_Pipeline_Directory/STEP_1--Preprocessing/trimmed_reads/trim_clip_HSV-KOSsyn_1.fastq SLIDINGWINDOW:15:30 MINLEN:30
Automatically using 16 threads
Input Reads: 21860165 Surviving: 20401833 (93.33%) Dropped: 1458332 (6.67%)
TrimmomaticSE: Completed successfully
TrimmomaticSE: Started with arguments: -phred33 /gpfs/home/jus57/biostar/1--Raw_Reads_Per_Genome/HSV-KOSsyn/VirGA_Pipeline_Directory/STEP_1--Preprocessing/clipped_reads/clip_HSV-KOSsyn_2.fastq /gpfs/home/jus57/biostar/1--Raw_Reads_Per_Genome/HSV-KOSsyn/VirGA_Pipeline_Directory/STEP_1--Preprocessing/trimmed_reads/trim_clip_HSV-KOSsyn_2.fastq SLIDINGWINDOW:15:30 MINLEN:30
Automatically using 16 threads
Input Reads: 21761541 Surviving: 19466503 (89.45%) Dropped: 2295038 (10.55%)
TrimmomaticSE: Completed successfully
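SLIDINGWINDOW:15:30 MINLEN:30 asks Trimmomatic to scan each read with a 15-base window, cut the read once the average quality within the window falls below Q30, and then drop any read left shorter than 30 bases. The sketch below is a simplified reimplementation for illustration only, assuming Phred+33 qualities (as selected by -phred33). Trimmomatic itself places the cut inside the failing window somewhat differently, so trimmed lengths can differ; the function names are hypothetical:

```python
def phred33_scores(qual):
    """Decode a Phred+33 quality string into integer quality scores."""
    return [ord(c) - 33 for c in qual]


def sliding_window_trim(seq, qual, window=15, required=30, min_len=30):
    """Simplified SLIDINGWINDOW:window:required followed by MINLEN:min_len.

    Scans from the 5' end and cuts the read at the start of the first window
    whose mean quality is below `required`. Returns the trimmed (seq, qual),
    or None when the trimmed read is shorter than `min_len` (read dropped).
    """
    scores = phred33_scores(qual)
    cut = len(seq)
    for start in range(len(seq) - window + 1):
        if sum(scores[start:start + window]) / window < required:
            cut = start
            break
    if cut < min_len:
        return None
    return seq[:cut], qual[:cut]


# 'I' encodes Q40 and '#' encodes Q2 in Phred+33
kept = sliding_window_trim("A" * 60, "I" * 50 + "#" * 10)
dropped = sliding_window_trim("A" * 60, "I" * 40 + "#" * 20)
print(len(kept[0]), dropped)  # 39 None
```

The second read is trimmed to 29 bases, which falls below MINLEN:30, so it is dropped entirely; this is how a read can be lost even though most of its bases are high quality.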



Procedure: 'fastx_remove_artifacts'
Purpose: Remove sequencing artifacts from the read pool


Sample 1 (HSV-KOSsyn_1.fastq):
Input: 20401833 reads.
Output: 20401358 reads.
discarded 475 (0%) artifact reads.
Sample 2 (HSV-KOSsyn_2.fastq):
Input: 19466503 reads.
Output: 19466110 reads.
discarded 393 (0%) artifact reads.



Procedure: 'filter_contaminants'
Purpose: Remove reads that are derived from a contaminating source, e.g. host cells


Sample 1 (HSV-KOSsyn_1.fastq):
20401358 reads; of these:
20401358 (100.00%) were unpaired; of these:
20266145 (99.34%) aligned 0 times
100064 (0.49%) aligned exactly 1 time
35149 (0.17%) aligned >1 times
0.66% overall alignment rate
Sample 2 (HSV-KOSsyn_2.fastq):
19466110 reads; of these:
19466110 (100.00%) were unpaired; of these:
19333144 (99.32%) aligned 0 times
98691 (0.51%) aligned exactly 1 time
34275 (0.18%) aligned >1 times
0.68% overall alignment rate

Removed 135213 contaminant reads (HSV-KOSsyn_1.fastq)

Removed 132966 contaminant reads (HSV-KOSsyn_2.fastq)
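The removed counts equal the reads that aligned at least once to the contaminant index (exactly once plus more than once), and the reads kept are those that aligned 0 times. A small Python sketch reproducing those figures from the Bowtie2 summaries above; the helper name is hypothetical:

```python
def contaminant_summary(total, aligned_once, aligned_multi):
    """Split reads into removed (aligned to the contaminant index) and kept."""
    removed = aligned_once + aligned_multi
    kept = total - removed
    rate = round(100 * removed / total, 2)  # corresponds to the "overall alignment rate"
    return removed, kept, rate


print(contaminant_summary(20401358, 100064, 35149))  # (135213, 20266145, 0.66)
print(contaminant_summary(19466110, 98691, 34275))   # (132966, 19333144, 0.68)
```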



Procedure: 'only_save_paired'
Purpose: Only save reads that are properly paired


R1 read count: 20266145
R2 read count: 19333144
Total read count: 39599289
Singletons: 3898177
Properly paired reads: 17850556 pairs (35701112 reads)
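The 'only_save_paired' counts follow from simple arithmetic: the total is R1 + R2, singletons are reads whose mate was removed in an earlier step, and the remaining reads form mate pairs. Dividing by two gives the number of pairs, which matches the count Bowtie2 reports in STEP_4. A minimal sketch; the helper name is hypothetical:

```python
def pairing_summary(r1_count, r2_count, singletons):
    """Return (total reads, reads in pairs, number of pairs)."""
    total = r1_count + r2_count
    paired_reads = total - singletons
    if paired_reads % 2:
        raise ValueError("reads in pairs must be an even number")
    return total, paired_reads, paired_reads // 2


print(pairing_summary(20266145, 19333144, 3898177))  # (39599289, 35701112, 17850556)
```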


Procedure: 'after_stats'
Purpose: Create histograms of read quality and other metrics for the preprocessed reads
[Figure: Raw_reads_1 (read quality histograms)]
FastQC results for HSV-KOSsyn_1.fastq: FastQC-Report
FastQC results for HSV-KOSsyn_2.fastq: FastQC-Report

STEP_2:Multi_SSAKE_Assembly


Procedure: 'multi_ssake'
Purpose: Iteratively run SSAKE, a greedy k-mer search and 3' read-extension assembler, to assemble rough contigs

Total # of contigs: 5442
Average contig length: 439
Longest contig: 34782
Shortest contig: 63
N20: 10970
N50: 934
N80: 198
Number of contigs >= N50: 248
Total # of bases: 2391750


Procedure: 'use_celera'
Purpose: Use the Celera Assembler to assemble the rough contigs into high-quality contigs

Total # of contigs: 32
Average contig length: 4735
Longest contig: 127984
Shortest contig: 101
N20: 127984
N50: 127984
N80: 127984
Number of contigs >= N50: 1
Total # of bases: 151530
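Nx is the contig length L at which contigs of length >= L, taken longest first, together contain at least x% of all assembled bases. A minimal Python sketch of that definition, using a hypothetical toy set of contig lengths rather than the actual SSAKE or Celera output:

```python
def n_stat(lengths, fraction):
    """Return Nx: walking contigs from longest to shortest, the length of the
    contig at which the running total first reaches `fraction` of all bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contigs")
    target = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= target:
            return length
    return ordered[-1]  # only reached if fraction > 1


def count_at_least(lengths, threshold):
    """Number of contigs whose length is >= threshold."""
    return sum(1 for length in lengths if length >= threshold)


toy = [100, 80, 50, 30, 20, 10]  # hypothetical contig lengths, 290 bases total
print(n_stat(toy, 0.2), n_stat(toy, 0.5), n_stat(toy, 0.8))  # 100 80 30
print(count_at_least(toy, n_stat(toy, 0.5)))                 # 2
```

The same definition explains the 'use_celera' statistics: the longest contig (127984 bp) alone holds about 84.5% of the 151530 assembled bases, so N20, N50, and N80 all equal its length and only one contig is >= N50.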

STEP_3:Annotation


Procedure: 'mugsy_maf-net_compare_genome'
Purpose: Align the high-quality contigs to the reference to produce a linearized genome assembly

Assembled_genome contains 918 Ns
Assembled_genome contains 135798 total bases

Procedure: 'use_gapfiller'
Purpose: Fill gaps in the assembled genome using paired-end read data

Pass 1:
Closed 2 out of 3 gaps
Closed 261 out of 974 nucleotides
Pass 2:
Closed 0 out of 1 gap
Closed 0 out of 713 nucleotides


Assembled_genome after GapFiller contains 717 Ns
Assembled_genome after GapFiller contains 135680 total bases

STEP_4:Assembly_Assessment


Procedure: 'Bowtie2_map'
Purpose: Align reads back to the new genome assembly using Bowtie2


17850556 reads; of these:
17850556 (100.00%) were paired; of these:
103362 (0.58%) aligned concordantly 0 times
17710588 (99.22%) aligned concordantly exactly 1 time
36606 (0.21%) aligned concordantly >1 times
----
103362 pairs aligned concordantly 0 times; of these:
37330 (36.12%) aligned discordantly 1 time
----
66032 pairs aligned 0 times concordantly or discordantly; of these:
132064 mates make up the pairs; of these:
95783 (72.53%) aligned 0 times
35502 (26.88%) aligned exactly 1 time
779 (0.59%) aligned >1 times
99.73% overall alignment rate
[samopen] SAM header is present: 1 sequences.
[bam_sort_core] merging from 15 files...
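For paired-end input, Bowtie2's overall alignment rate counts individual mates: every mate counts as aligned except those listed as "aligned 0 times" in the final breakdown. A short Python check reproducing the 99.73% figure from the counts above; the variable names are illustrative:

```python
pairs = 17850556
not_concordant = 103362   # pairs aligned concordantly 0 times
discordant = 37330        # of those, pairs aligned discordantly 1 time
mates_unaligned = 95783   # mates aligned 0 times

unaligned_pairs = not_concordant - discordant  # pairs aligned neither way
total_mates = 2 * pairs
overall_rate = 100 * (total_mates - mates_unaligned) / total_mates

print(unaligned_pairs, 2 * unaligned_pairs)  # 66032 132064
print(f"{overall_rate:.2f}%")                # 99.73%
```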

Procedure: 'call_variants'
Purpose: Find variants (indels and SNPs) in the newly assembled genome

SAMTools indels called: 0
SAMTools SNPs called: 2
Freebayes indels called: 34
Freebayes SNPs called: 51

Procedure: 'filter_false_SNPs'
Purpose: Correct misassembled bases in the genome that are detected as false SNPs

Number of false SNPs corrected in the genome: 26
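The report does not state the exact criterion used to decide that a called SNP reflects a misassembly rather than genuine variation. The sketch below assumes one plausible rule: if most mapped reads (alternate-allele frequency above 0.5) disagree with the assembled base, the assembled base is replaced with the read-supported base. `SnpCall`, `correct_false_snps`, and the 0.5 cutoff are all hypothetical:

```python
from typing import NamedTuple


class SnpCall(NamedTuple):
    pos: int         # 0-based position in the assembled genome
    ref: str         # base currently in the assembly
    alt: str         # base supported by the mapped reads
    alt_freq: float  # fraction of reads at this position supporting `alt`


def correct_false_snps(genome, calls, min_alt_freq=0.5):
    """Replace assembly bases that most mapped reads contradict (assumed rule).

    Returns the corrected sequence and the number of bases changed.
    """
    bases = list(genome)
    corrected = 0
    for call in calls:
        if bases[call.pos] != call.ref:
            raise ValueError(f"call at {call.pos} does not match the assembly")
        if call.alt_freq > min_alt_freq:
            bases[call.pos] = call.alt
            corrected += 1
    return "".join(bases), corrected


calls = [SnpCall(2, "G", "T", 0.95), SnpCall(5, "C", "A", 0.20)]
print(correct_false_snps("ACGTACGT", calls))  # ('ACTTACGT', 1)
```

Calls below the cutoff, like the second one, are left untouched, since a minority allele may be genuine variation within the sample rather than an assembly error.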


Procedure: 'create_low_coverage_gff'
Purpose: Generate a .gff file that reports areas of the genome with coverage lower than a threshold
low_coverage.gff

Threshold for determining low coverage: read depth of 15
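A low-coverage .gff can be derived from per-base read depth (for example, the output of `samtools depth -a`, which also reports zero-depth positions) by merging consecutive positions whose depth is below the threshold. A minimal sketch, assuming depths are supplied as a list indexed from genome position 1; the function names, the "sketch" source column, and the `low_coverage` feature type are hypothetical:

```python
def low_coverage_regions(depths, threshold=15):
    """Yield 1-based, inclusive (start, end) runs where depth < threshold."""
    start = None
    for pos, depth in enumerate(depths, start=1):
        if depth < threshold:
            if start is None:
                start = pos          # open a new low-coverage run
        elif start is not None:
            yield start, pos - 1     # close the run at the previous base
            start = None
    if start is not None:
        yield start, len(depths)     # run extends to the end of the genome


def regions_to_gff(seqid, regions, feature="low_coverage"):
    """Format (start, end) intervals as tab-separated GFF3 lines."""
    return [
        "\t".join([seqid, "sketch", feature, str(s), str(e), ".", ".", ".",
                   f"Name={feature}_{i}"])
        for i, (s, e) in enumerate(regions, start=1)
    ]


depths = [20, 20, 10, 5, 30, 14, 14]
regions = list(low_coverage_regions(depths))
print(regions)  # [(3, 4), (6, 7)]
print("\n".join(regions_to_gff("HSV-KOSsyn", regions)))
```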


Procedure: 'create_no_coverage_gff'
Purpose: Generate a .gff file that reports areas of the genome that contain gaps
no_coverage.gff
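Assuming the gaps are the runs of N left in the assembly (the summary reports both a gap count and an N count), their coordinates can be located with a regular expression and written as GFF3 lines. A minimal sketch; the function names, the "sketch" source column, and the `gap` feature label are assumptions, not VirGA's own output format:

```python
import re


def gap_regions(sequence):
    """Return 1-based, inclusive (start, end) intervals for each run of N/n."""
    return [(m.start() + 1, m.end()) for m in re.finditer(r"[Nn]+", sequence)]


def gap_gff(seqid, sequence):
    """Format each gap as a tab-separated GFF3 feature line."""
    return [
        "\t".join([seqid, "sketch", "gap", str(start), str(end), ".", ".", ".",
                   f"Name=gap_{i}"])
        for i, (start, end) in enumerate(gap_regions(sequence), start=1)
    ]


seq = "ACGTNNNNACGTNA"
print(gap_regions(seq))  # [(5, 8), (13, 13)]
for line in gap_gff("HSV-KOSsyn", seq):
    print(line)
```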

===End Report===