Setup
Prior to running any of the workflows, you must set up the Terra table and link the reference files and custom Python scripts to your workspace data. The table below lists the workspace data you will need to set up.
Workspace Data
The reference files can be found in this repository in the workspace_data directory. The Python scripts can be found in the repository directory listed in the table.
| workspace variable name | workflow | file name | description | 
|---|---|---|---|
| adapters_and_contaminants_fa | SC2_illumina_pe_assembly | Adapters_plus_PhiX_174.fasta | adapter sequences and contaminant sequences removed during fastq cleaning and filtering using SeqyClean. Thanks to Erin Young at Utah Public Health Laboratory for providing this file! | 
| covid_genome_gff | SC2_illumina_pe_assembly,SC2_illumina_se_assembly,SC2_ont_assembly | NC_045512-2_reference.gff | whole genome reference sequence annotation file in gff format (we use NCBI GenBank ID MN908947.3) | 
| covid_genome_fa | SC2_illumina_pe_assembly,SC2_illumina_se_assembly,SC2_ont_assembly | MN908947-2_reference.fasta | SARS-CoV-2 whole genome reference sequence in fasta format (we use NCBI GenBank ID MN908947.3) | 
| artic_v3_bed | SC2_illumina_pe_assembly,SC2_illumina_se_assembly,SC2_ont_assembly | artic_V3_nCoV-2019.primer.bed | primer bed file for the Artic V3 tiled amplicon primer set. Thanks to Theiagen Genomics for providing this file! | 
| artic_v4_bed | SC2_illumina_pe_assembly,SC2_illumina_se_assembly,SC2_ont_assembly | artic_V4_nCoV-2019.primer.bed | primer bed file for the Artic V4 tiled amplicon primer set. Thanks to Theiagen Genomics for providing this file! | 
| artic_v4-1_bed | SC2_illumina_pe_assembly,SC2_illumina_se_assembly,SC2_ont_assembly | artic_V4-1_nCoV-2019.primer.bed | primer bed file for the Artic V4.1 tiled amplicon primer set. Thanks to Theiagen Genomics for providing this file! | 
| artic_v4-1_s_gene_amplicons | SC2_illumina_pe_assembly,SC2_ont_assembly | artic_v4_1_s_gene_amplicons.tsv | |
| artic_v4-1_s_gene_primer_bed | SC2_illumina_pe_assembly,SC2_ont_assembly | S_gene_V4-1_nCoV-2021.primer.bed | |
| midnight_bed | SC2_ont_assembly | Midnight_Primers_SARS-CoV-2.scheme.bed | primer bed file for the Midnight tiled amplicon primer set. Thanks to Theiagen Genomics for providing this file! | 
| covid_voc_annotations_tsv | SC2_wastewater_variant_calling | SC2_voc_annotations_20220711.tsv | For wastewater only. List of amino acid (AA) substitutions and the lineages containing those AA substitutions; for a lineage to be associated with a given AA substitution, 90% of publicly available sequences must contain the AA substitution (the 90% cutoff was determined using outbreak.info) | 
| covid_voc_bed_tsv | SC2_wastewater_variant_calling | SC2_voc_mutations_20220711.tsv | For wastewater only. List of nucleotide genome positions of known mutations, relative to the MN908947.3 reference genome | 
| covid_calc_per_cov_py | SC2_illumina_pe_assembly,SC2_illumina_se_assembly,SC2_ont_assembly | calc_percent_coverage.py | see detailed description in the readme file found in ./python_scripts/repo directory | 
| covid_nextclade_json_parser_py | SC2_lineage_calling_and_results | nextclade_json_parser.py | see detailed description in the readme file found in ./python_scripts/repo directory | 
| covid_concat_results_py | SC2_lineage_calling_and_results | concat_seq_metrics_and_lineages_results.py | see detailed description in the readme file found in ./python_scripts/repo directory |
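
If you would rather not enter each workspace variable by hand, the mapping in the table can be turned into a small tab-separated file. The sketch below is only an illustration under several assumptions: the bucket path `gs://my-terra-bucket/workspace_data` is hypothetical, the helper name `build_workspace_tsv` is ours, only a subset of the table is included, and the two-row layout (a header row starting with `workspace:` followed by a row of values) is our understanding of the workspace data upload format, which you should confirm against the Terra documentation before relying on it.

```python
import csv
import io

# Workspace variable name -> file name, copied from the table above (subset only).
WORKSPACE_FILES = {
    "adapters_and_contaminants_fa": "Adapters_plus_PhiX_174.fasta",
    "covid_genome_gff": "NC_045512-2_reference.gff",
    "covid_genome_fa": "MN908947-2_reference.fasta",
    "artic_v3_bed": "artic_V3_nCoV-2019.primer.bed",
    "covid_calc_per_cov_py": "calc_percent_coverage.py",
}


def build_workspace_tsv(files, bucket_prefix):
    """Return a two-row TSV: variable names in row one, file paths in row two.

    bucket_prefix is a hypothetical location where the files were uploaded.
    The "workspace:" header prefix is an assumption about the upload format.
    """
    prefix = bucket_prefix.rstrip("/")
    names = list(files)
    header = ["workspace:" + names[0]] + names[1:]
    values = [f"{prefix}/{files[name]}" for name in names]

    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerow(values)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical bucket path; replace it with your own workspace bucket.
    print(build_workspace_tsv(WORKSPACE_FILES, "gs://my-terra-bucket/workspace_data"))
```

Keeping the variable names in one dictionary makes it easier to check them against the table, since the workflows expect the workspace variable names exactly as listed.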