Setup

Before running any of the workflows, you must set up the Terra table and link the reference files and custom Python scripts to your workspace data. The table below details the workspace data you will need to set up.

Workspace Data

The reference files can be found in the workspace_data directory of this repository. The Python scripts can be found in the repository directory listed for each script in the table below.

| workspace variable name | workflow | file name | description |
|---|---|---|---|
| adapters_and_contaminants_fa | SC2_illumina_pe_assembly | Adapters_plus_PhiX_174.fasta | Adapter and contaminant sequences removed during FASTQ cleaning and filtering with SeqyClean. Thanks to Erin Young at Utah Public Health Laboratory for providing this file! |
| covid_genome_gff | SC2_illumina_pe_assembly, SC2_illumina_se_assembly, SC2_ont_assembly | NC_045512-2_reference.gff | SARS-CoV-2 whole-genome reference annotation file in GFF format (we use NCBI GenBank ID MN908947.3) |
| covid_genome_fa | SC2_illumina_pe_assembly, SC2_illumina_se_assembly, SC2_ont_assembly | MN908947-2_reference.fasta | SARS-CoV-2 whole-genome reference sequence in FASTA format (we use NCBI GenBank ID MN908947.3) |
| artic_v3_bed | SC2_illumina_pe_assembly, SC2_illumina_se_assembly, SC2_ont_assembly | artic_V3_nCoV-2019.primer.bed | Primer BED file for the ARTIC V3 tiled amplicon primer set. Thanks to Theiagen Genomics for providing this file! |
| artic_v4_bed | SC2_illumina_pe_assembly, SC2_illumina_se_assembly, SC2_ont_assembly | artic_V4_nCoV-2019.primer.bed | Primer BED file for the ARTIC V4 tiled amplicon primer set. Thanks to Theiagen Genomics for providing this file! |
| artic_v4-1_bed | SC2_illumina_pe_assembly, SC2_illumina_se_assembly, SC2_ont_assembly | artic_V4-1_nCoV-2019.primer.bed | Primer BED file for the ARTIC V4.1 tiled amplicon primer set. Thanks to Theiagen Genomics for providing this file! |
| artic_v4-1_s_gene_amplicons | SC2_illumina_pe_assembly, SC2_ont_assembly | artic_v4_1_s_gene_amplicons.tsv | |
| artic_v4-1_s_gene_primer_bed | SC2_illumina_pe_assembly, SC2_ont_assembly | S_gene_V4-1_nCoV-2021.primer.bed | |
| midnight_bed | SC2_ont_assembly | Midnight_Primers_SARS-CoV-2.scheme.bed | Primer BED file for the Midnight tiled amplicon primer set. Thanks to Theiagen Genomics for providing this file! |
| covid_voc_annotations_tsv | SC2_wastewater_variant_calling | SC2_voc_annotations_20220711.tsv | For wastewater only. List of amino acid (AA) substitutions and the lineages that contain them; for a lineage to be associated with a given AA substitution, 90% of its publicly available sequences must contain that substitution (the 90% cutoff was determined using outbreak.info). |
| covid_voc_bed_tsv | SC2_wastewater_variant_calling | SC2_voc_mutations_20220711.tsv | For wastewater only. List of nucleotide positions of known mutations, relative to the MN908947.3 reference genome. |
| covid_calc_per_cov_py | SC2_illumina_pe_assembly, SC2_illumina_se_assembly, SC2_ont_assembly | calc_percent_coverage.py | See the detailed description in the README file in the ./python_scripts/ repository directory. |
| covid_nextclade_json_parser_py | SC2_lineage_calling_and_results | nextclade_json_parser.py | See the detailed description in the README file in the ./python_scripts/ repository directory. |
| covid_concat_results_py | SC2_lineage_calling_and_results | concat_seq_metrics_and_lineages_results.py | See the detailed description in the README file in the ./python_scripts/ repository directory. |
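To catch a missing or misspelled file before launching a workflow, the workspace data table can be mirrored as a small Python mapping. This is a hypothetical helper, not part of the repository. It assumes that reference files sit directly in workspace_data/ and scripts directly in python_scripts/ of a local clone, and that you upload every file flat under one cloud bucket prefix of your choosing. The names `WORKSPACE_FILES`, `missing_files`, `workspace_values`, and `to_tsv` are illustrative, and the tab-separated output is a plain listing for reference, not necessarily Terra's import format.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical mapping built from the workspace data table:
# variable name -> (repository subdirectory, file name).
# Assumes reference files sit directly in workspace_data/ and
# scripts directly in python_scripts/.
WORKSPACE_FILES: dict[str, tuple[str, str]] = {
    "adapters_and_contaminants_fa": ("workspace_data", "Adapters_plus_PhiX_174.fasta"),
    "covid_genome_gff": ("workspace_data", "NC_045512-2_reference.gff"),
    "covid_genome_fa": ("workspace_data", "MN908947-2_reference.fasta"),
    "artic_v3_bed": ("workspace_data", "artic_V3_nCoV-2019.primer.bed"),
    "artic_v4_bed": ("workspace_data", "artic_V4_nCoV-2019.primer.bed"),
    "artic_v4-1_bed": ("workspace_data", "artic_V4-1_nCoV-2019.primer.bed"),
    "artic_v4-1_s_gene_amplicons": ("workspace_data", "artic_v4_1_s_gene_amplicons.tsv"),
    "artic_v4-1_s_gene_primer_bed": ("workspace_data", "S_gene_V4-1_nCoV-2021.primer.bed"),
    "midnight_bed": ("workspace_data", "Midnight_Primers_SARS-CoV-2.scheme.bed"),
    "covid_voc_annotations_tsv": ("workspace_data", "SC2_voc_annotations_20220711.tsv"),
    "covid_voc_bed_tsv": ("workspace_data", "SC2_voc_mutations_20220711.tsv"),
    "covid_calc_per_cov_py": ("python_scripts", "calc_percent_coverage.py"),
    "covid_nextclade_json_parser_py": ("python_scripts", "nextclade_json_parser.py"),
    "covid_concat_results_py": ("python_scripts", "concat_seq_metrics_and_lineages_results.py"),
}


def missing_files(repo_root: Path) -> list[str]:
    """Return the variable names whose file is not found in a local clone."""
    return [
        name
        for name, (subdir, file_name) in WORKSPACE_FILES.items()
        if not (repo_root / subdir / file_name).is_file()
    ]


def workspace_values(bucket_prefix: str) -> dict[str, str]:
    """Build the cloud path to enter as each workspace variable's value.

    Assumes every file was uploaded flat under bucket_prefix (hypothetical layout).
    """
    prefix = bucket_prefix.rstrip("/")
    return {name: f"{prefix}/{file_name}" for name, (_, file_name) in WORKSPACE_FILES.items()}


def to_tsv(values: dict[str, str]) -> str:
    """Render name/value pairs as two tab-separated columns, one pair per line."""
    return "".join(f"{name}\t{value}\n" for name, value in values.items())
```

For example, `workspace_values("gs://your-bucket/sc2")["covid_genome_fa"]` returns `"gs://your-bucket/sc2/MN908947-2_reference.fasta"`, where the bucket path is a placeholder. Keeping the table in one dictionary means a file rename only has to be updated in one place.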