Setup
Before running any of the workflows, you must set up the Terra table and link the reference files and custom Python scripts to your workspace data. The table below lists the workspace data you will need to set up.
Workspace Data
The reference files can be found in this repository in the workspace_data
directory. The Python scripts can be found in the repository directory listed in each script's description.
workspace variable name | workflow | file name | description |
---|---|---|---|
adapters_and_contaminants_fa | SC2_illumina_pe_assembly | Adapters_plus_PhiX_174.fasta | adapter sequences and contaminant sequences removed during fastq cleaning and filtering using SeqyClean. Thanks to Erin Young at Utah Public Health Laboratory for providing this file! |
covid_genome_gff | SC2_illumina_pe_assembly, SC2_illumina_se_assembly, SC2_ont_assembly | NC_045512-2_reference.gff | whole genome reference sequence annotation file in gff format (we use NCBI genbank ID MN908947.3) |
covid_genome_fa | SC2_illumina_pe_assembly, SC2_illumina_se_assembly, SC2_ont_assembly | MN908947-2_reference.fasta | SARS-CoV-2 whole genome reference sequence in fasta format (we use NCBI genbank ID MN908947.3) |
artic_v3_bed | SC2_illumina_pe_assembly, SC2_illumina_se_assembly, SC2_ont_assembly | artic_V3_nCoV-2019.primer.bed | primer bed file for the Artic V3 tiled amplicon primer set. Thanks to Theiagen Genomics for providing this file! |
artic_v4_bed | SC2_illumina_pe_assembly, SC2_illumina_se_assembly, SC2_ont_assembly | artic_V4_nCoV-2019.primer.bed | primer bed file for the Artic V4 tiled amplicon primer set. Thanks to Theiagen Genomics for providing this file! |
artic_v4-1_bed | SC2_illumina_pe_assembly, SC2_illumina_se_assembly, SC2_ont_assembly | artic_V4-1_nCoV-2019.primer.bed | primer bed file for the Artic V4.1 tiled amplicon primer set. Thanks to Theiagen Genomics for providing this file! |
artic_v4-1_s_gene_amplicons | SC2_illumina_pe_assembly, SC2_ont_assembly | artic_v4_1_s_gene_amplicons.tsv | |
artic_v4-1_s_gene_primer_bed | SC2_illumina_pe_assembly, SC2_ont_assembly | S_gene_V4-1_nCoV-2021.primer.bed | |
midnight_bed | SC2_ont_assembly | Midnight_Primers_SARS-CoV-2.scheme.bed | primer bed file for the Midnight tiled amplicon primer set. Thanks to Theiagen Genomics for providing this file! |
covid_voc_annotations_tsv | SC2_wastewater_variant_calling | SC2_voc_annotations_20220711.tsv | For wastewater only. List of amino acid (AA) substitutions and the lineages containing those AA substitutions. For a lineage to be associated with a given AA substitution, 90% of publicly available sequences must contain the AA substitution (the 90% cutoff was determined using outbreak.info) |
covid_voc_bed_tsv | SC2_wastewater_variant_calling | SC2_voc_mutations_20220711.tsv | For wastewater only. List of nucleotide genome positions of known mutations, relative to the MN908947.3 reference genome |
covid_calc_per_cov_py | SC2_illumina_pe_assembly, SC2_illumina_se_assembly, SC2_ont_assembly | calc_percent_coverage.py | see the detailed description in the readme file found in the ./python_scripts/ repo directory |
covid_nextclade_json_parser_py | SC2_lineage_calling_and_results | nextclade_json_parser.py | see the detailed description in the readme file found in the ./python_scripts/ repo directory |
covid_concat_results_py | SC2_lineage_calling_and_results | concat_seq_metrics_and_lineages_results.py | see the detailed description in the readme file found in the ./python_scripts/ repo directory |
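
Before linking these files to your workspace, it can help to confirm that a local clone of the repository actually contains every file named in the table. The following is a minimal sketch, not part of the workflows themselves. It assumes the reference files sit directly in `workspace_data/` and the scripts directly in `python_scripts/` under the repository root; the helper name `find_missing` is hypothetical.

```python
from pathlib import Path

# File names copied from the workspace data table.
# Assumption: reference files live directly in workspace_data/.
REFERENCE_FILES = [
    "Adapters_plus_PhiX_174.fasta",
    "NC_045512-2_reference.gff",
    "MN908947-2_reference.fasta",
    "artic_V3_nCoV-2019.primer.bed",
    "artic_V4_nCoV-2019.primer.bed",
    "artic_V4-1_nCoV-2019.primer.bed",
    "artic_v4_1_s_gene_amplicons.tsv",
    "S_gene_V4-1_nCoV-2021.primer.bed",
    "Midnight_Primers_SARS-CoV-2.scheme.bed",
    "SC2_voc_annotations_20220711.tsv",
    "SC2_voc_mutations_20220711.tsv",
]

# Assumption: custom scripts live directly in python_scripts/.
PYTHON_SCRIPTS = [
    "calc_percent_coverage.py",
    "nextclade_json_parser.py",
    "concat_seq_metrics_and_lineages_results.py",
]


def find_missing(repo_root):
    """Return the expected files (relative to repo_root) that do not exist."""
    root = Path(repo_root)
    expected = [root / "workspace_data" / name for name in REFERENCE_FILES]
    expected += [root / "python_scripts" / name for name in PYTHON_SCRIPTS]
    # Report paths relative to the repo root, with forward slashes on any OS.
    return [p.relative_to(root).as_posix() for p in expected if not p.is_file()]
```

Calling `find_missing(".")` from the root of the clone returns an empty list when every file is present. Otherwise it returns the relative paths that still need to be located before you set up the workspace data.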