Setup

Before running any of the workflows, you must set up the Terra data table and link the reference files and custom Python scripts to your workspace data. The table below lists the workspace data you will need to set up.

Workspace Data

The reference files can be found in this repository in the `workspace_data` directory. The Python scripts can be found in the repository directory listed in the table.

| workspace variable name | workflow | file name | description |
| --- | --- | --- | --- |
| `adapters_and_contaminants_fa` | `SC2_illumina_pe_assembly` | Adapters_plus_PhiX_174.fasta | adapter sequences and contaminant sequences removed during fastq cleaning and filtering using SeqyClean. Thanks to Erin Young at Utah Public Health Laboratory for providing this file! |
| `covid_genome_gff` | `SC2_illumina_pe_assembly`, `SC2_illumina_se_assembly`, `SC2_ont_assembly` | NC_045512-2_reference.gff | whole genome reference sequence annotation file in GFF format (we use NCBI GenBank ID MN908947.3) |
| `covid_genome_fa` | `SC2_illumina_pe_assembly`, `SC2_illumina_se_assembly`, `SC2_ont_assembly` | MN908947-2_reference.fasta | SARS-CoV-2 whole genome reference sequence in fasta format (we use NCBI GenBank ID MN908947.3) |
| `artic_v3_bed` | `SC2_illumina_pe_assembly`, `SC2_illumina_se_assembly`, `SC2_ont_assembly` | artic_V3_nCoV-2019.primer.bed | primer bed file for the Artic V3 tiled amplicon primer set. Thanks to Theiagen Genomics for providing this file! |
| `artic_v4_bed` | `SC2_illumina_pe_assembly`, `SC2_illumina_se_assembly`, `SC2_ont_assembly` | artic_V4_nCoV-2019.primer.bed | primer bed file for the Artic V4 tiled amplicon primer set. Thanks to Theiagen Genomics for providing this file! |
| `artic_v4-1_bed` | `SC2_illumina_pe_assembly`, `SC2_illumina_se_assembly`, `SC2_ont_assembly` | artic_V4-1_nCoV-2019.primer.bed | primer bed file for the Artic V4.1 tiled amplicon primer set. Thanks to Theiagen Genomics for providing this file! |
| `artic_v4-1_s_gene_amplicons` | `SC2_illumina_pe_assembly`, `SC2_ont_assembly` | artic_v4_1_s_gene_amplicons.tsv | |
| `artic_v4-1_s_gene_primer_bed` | `SC2_illumina_pe_assembly`, `SC2_ont_assembly` | S_gene_V4-1_nCoV-2021.primer.bed | |
| `midnight_bed` | `SC2_ont_assembly` | Midnight_Primers_SARS-CoV-2.scheme.bed | primer bed file for the Midnight tiled amplicon primer set. Thanks to Theiagen Genomics for providing this file! |
| `covid_voc_annotations_tsv` | `SC2_wastewater_variant_calling` | SC2_voc_annotations_20220711.tsv | For wastewater only. List of amino acid (AA) substitutions and the lineages containing those AA substitutions; for a lineage to be associated with a given AA substitution, 90% of publicly available sequences must contain the AA substitution (the 90% cutoff was determined using outbreak.info) |
| `covid_voc_bed_tsv` | `SC2_wastewater_variant_calling` | SC2_voc_mutations_20220711.tsv | For wastewater only. List of nucleotide genome positions of known mutations, relative to the MN908947.3 reference genome |
| `covid_calc_per_cov_py` | `SC2_illumina_pe_assembly`, `SC2_illumina_se_assembly`, `SC2_ont_assembly` | calc_percent_coverage.py | see detailed description in the readme file found in the `./python_scripts/` repo directory |
| `covid_nextclade_json_parser_py` | `SC2_lineage_calling_and_results` | nextclade_json_parser.py | see detailed description in the readme file found in the `./python_scripts/` repo directory |
| `covid_concat_results_py` | `SC2_lineage_calling_and_results` | concat_seq_metrics_and_lineages_results.py | see detailed description in the readme file found in the `./python_scripts/` repo directory |
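
If you prefer to prepare the workspace data in bulk rather than entering each variable by hand, the mapping in the table can be sketched in Python as below. This is a minimal sketch, not part of the workflows: the bucket name `gs://my-workspace-bucket` is hypothetical, the helper names `build_workspace_paths` and `to_workspace_tsv` are made up for illustration, and it assumes you copied the `workspace_data` and `python_scripts` directories to your bucket with the same layout as this repository. The two-row, `workspace:`-prefixed TSV layout is our understanding of the format Terra accepts for workspace data uploads; please check it against the Terra documentation before relying on it.

```python
# Workspace variable name -> (repo directory, file name), copied from the table above.
WORKSPACE_FILES = {
    "adapters_and_contaminants_fa": ("workspace_data", "Adapters_plus_PhiX_174.fasta"),
    "covid_genome_gff": ("workspace_data", "NC_045512-2_reference.gff"),
    "covid_genome_fa": ("workspace_data", "MN908947-2_reference.fasta"),
    "artic_v3_bed": ("workspace_data", "artic_V3_nCoV-2019.primer.bed"),
    "artic_v4_bed": ("workspace_data", "artic_V4_nCoV-2019.primer.bed"),
    "artic_v4-1_bed": ("workspace_data", "artic_V4-1_nCoV-2019.primer.bed"),
    "artic_v4-1_s_gene_amplicons": ("workspace_data", "artic_v4_1_s_gene_amplicons.tsv"),
    "artic_v4-1_s_gene_primer_bed": ("workspace_data", "S_gene_V4-1_nCoV-2021.primer.bed"),
    "midnight_bed": ("workspace_data", "Midnight_Primers_SARS-CoV-2.scheme.bed"),
    "covid_voc_annotations_tsv": ("workspace_data", "SC2_voc_annotations_20220711.tsv"),
    "covid_voc_bed_tsv": ("workspace_data", "SC2_voc_mutations_20220711.tsv"),
    "covid_calc_per_cov_py": ("python_scripts", "calc_percent_coverage.py"),
    "covid_nextclade_json_parser_py": ("python_scripts", "nextclade_json_parser.py"),
    "covid_concat_results_py": ("python_scripts", "concat_seq_metrics_and_lineages_results.py"),
}


def build_workspace_paths(bucket_prefix):
    """Map each workspace variable name to a path under bucket_prefix.

    Assumes the bucket mirrors the repository layout (hypothetical).
    """
    prefix = bucket_prefix.rstrip("/")
    return {
        name: f"{prefix}/{directory}/{file_name}"
        for name, (directory, file_name) in WORKSPACE_FILES.items()
    }


def to_workspace_tsv(paths):
    """Render the mapping as two tab-separated rows: names, then values.

    The 'workspace:' prefix on the first column is an assumed Terra convention.
    """
    names = list(paths)
    header = "workspace:" + "\t".join(names)
    values = "\t".join(paths[name] for name in names)
    return header + "\n" + values + "\n"


if __name__ == "__main__":
    # Hypothetical bucket; replace with your own workspace bucket.
    workspace_paths = build_workspace_paths("gs://my-workspace-bucket")
    print(to_workspace_tsv(workspace_paths), end="")
```

Keeping the names and file names in a single dictionary makes it easy to compare against the table when a reference file is updated, for example when a new dated version of the wastewater annotation files is released.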