Results Workflows

The following workflows perform clade and lineage assignment using Nextclade and Pangolin, respectively. SC2_lineage_calling_and_results.wdl should be run after assembly with one of the three reference-based assembly workflows above. SC2_multifasta_lineage_calling.wdl can be run on any multifasta file and does not depend on any of the above workflows.

Lineage Calling and Results Workflow

File: SC2_lineage_calling_and_results.wdl

This workflow should be run following assembly with one of the three reference-based assembly workflows. The workflow accepts "sample_set" as the root entity type and uses the data table from any of the three assembly workflows (illumina pe, illumina se, and ont); all three are compatible with this workflow. Briefly, the workflow performs the following:

1. Concatenates consensus sequences from the sequencing run into a single fasta file.
2. Runs pangolin on the concatenated fasta file.
3. Runs nextclade on the concatenated fasta file.
4. Parses the nextclade json output using the nextclade_json_parser.py script, which pulls out clade, nucleotide change, and amino acid change information and converts it to a tabular format.
5. Concatenates sequencing assembly metrics (e.g. percent coverage, assembler version), lineage and clade information, and sequence metadata (e.g. plate name, sample well location) into a single csv file.
6. Generates a csv file with sequencing assembly metrics and lineage information that can be used to parse sequencing data into our LIMS.
7. Transfers intermediate and summary files to a user-defined google bucket.
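
The first step above (concatenating consensus sequences) can be sketched in Python as follows. This is a minimal illustration and not the code used by the workflow; the function name `concatenate_fastas` and its arguments are hypothetical.

```python
from pathlib import Path
from typing import Iterable


def concatenate_fastas(fasta_paths: Iterable[str], out_path: str) -> int:
    """Write every record from fasta_paths into out_path; return the record count.

    Hypothetical helper for illustration only; the workflow's own
    implementation may differ.
    """
    n_records = 0
    with open(out_path, "w") as out:
        for path in fasta_paths:
            text = Path(path).read_text()
            if not text:
                continue  # skip empty consensus files
            # Make sure the next header starts on its own line.
            if not text.endswith("\n"):
                text += "\n"
            n_records += sum(1 for line in text.splitlines() if line.startswith(">"))
            out.write(text)
    return n_records
```

The trailing-newline check matters because a consensus file that ends without a newline would otherwise have the next sequence header appended to its last sequence line.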

SC2_lineage_calling_and_results.wdl workflow diagram

Inputs

Below is a summary of the workflow input variables along with the syntax used for the attribute column when setting up the workflow to run on Terra.bio. For the attributes, the "this.sample{terra_datatable_name}s." syntax tells Terra to pull the variable from the Terra data table, as used for sample sets. These variables were either in the original Terra data table as inputs for the assembly workflow (see the reference-based assembly workflow inputs sections above for more details) or added as outputs during the assembly workflow (see the reference-based assembly workflow outputs sections for more details). The "workspace." syntax tells Terra to pull the variable from the Terra workspace data. Workspace data is described in the Getting Started drop-down menu above.

workflow variable	attribute (input syntax into workflow)
`assembler_version_array`	this.sample{terra_datatable_name}s.assembler_version
`concat_seq_results_py`	workspace.covid_concat_results_py
`cov_out`	this.sample{terra_datatable_name}s.cov_out
`nextclade_json_parser_py`	workspace.covid_nextclade_json_parser_py
`out_dir_array`	this.sample{terra_datatable_name}s.out_dir
`percent_cvg_csv`	this.sample{terra_datatable_name}s.percent_cvg_csv
`project_name_array`	this.sample{terra_datatable_name}s.project_name
`renamed_consensus`	this.sample{terra_datatable_name}s.renamed_consensus
`sample_name`	this.sample{terra_datatable_name}s.sample{terra_datatable_name}_id
`workbook_path_array`	this.sample{terra_datatable_name}s.workbook_path
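
To make the attribute syntax concrete, the sketch below substitutes a value for the `{terra_datatable_name}` placeholder in a few of the attributes from the table. The value `_illumina_pe` is a hypothetical example, not a name confirmed by this readme; use whatever matches the data table names in your own workspace.

```python
PLACEHOLDER = "{terra_datatable_name}"

# A few attribute templates copied from the inputs table.
ATTRIBUTE_TEMPLATES = {
    "out_dir_array": "this.sample{terra_datatable_name}s.out_dir",
    "percent_cvg_csv": "this.sample{terra_datatable_name}s.percent_cvg_csv",
    "sample_name": "this.sample{terra_datatable_name}s.sample{terra_datatable_name}_id",
}


def expand_attributes(datatable_name: str) -> dict:
    """Replace the placeholder in every template with datatable_name."""
    return {
        variable: template.replace(PLACEHOLDER, datatable_name)
        for variable, template in ATTRIBUTE_TEMPLATES.items()
    }


# "_illumina_pe" is a hypothetical value chosen only for illustration.
for variable, attribute in expand_attributes("_illumina_pe").items():
    print(f"{variable}\t{attribute}")
```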

Outputs

This workflow generates several output files, which are transferred to the user-defined google bucket specified by this.sample{terra_datatable_name}s.out_dir. The table below details each output. For more detail regarding the values in each column of the outputs, see either the software readmes or the readme for the specific python script listed in the description.

output variable name	file_name	description	google bucket path
`cat_fastas`	`concatenate_assemblies.fasta`	all consensus sequences from assembly in a single fasta file	`gs://{user_defined_gcp_bucket}/multifasta/`
`nextclade_clades_csv`	`{seq_run}_nextclade_results.csv`	csv file generated from the `nextclade_json_parser.py` script detailing the clade for each sequence	`gs://{user_defined_gcp_bucket}/nextclade_out/`
`nextclade_csv`	`nextclade.csv`	csv file generated from nextclade	`gs://{user_defined_gcp_bucket}/nextclade_out/`
`nextclade_json`	`nextclade.json`	json file generated from nextclade; this json file is parsed using the `nextclade_json_parser.py` script, and key info is pulled out and converted into a tabular format in the `nextclade_clades_csv`, `nextclade_variants_csv` and `sequencing_results.csv` files (see the readme for the `nextclade_json_parser.py` script for more details)	`gs://{user_defined_gcp_bucket}/nextclade_out/`
`nextclade_variants_csv`	`{seq_run}_nextclade_variant_summary.csv`	csv file generated from the `nextclade_json_parser.py` script detailing the nucleotide and amino acid changes for each sequence	`gs://{user_defined_gcp_bucket}/summary_results/`
`nextclade_version`	N/A	version of nextclade	N/A
`pangolin_lineage`	`pangolin_lineage_report.csv`	lineage report generated from pangolin	`gs://{user_defined_gcp_bucket}/pangolin/`
`pangolin_version`	N/A	version of pangolin	N/A
`sequencing_results_csv`	`{seq_run}_sequencing_results.csv`	summary of the sequencing metrics and lineage/clade assignments for each sequence, generated from the `concat_seq_metrics_and_lineage_results.py` script. See the `concat_seq_metrics_and_lineage_results.py` readme for more details.	`gs://{user_defined_gcp_bucket}/summary_results/`
`wgs_horizon_report_csv`	`{seq_run}_wgs_horizon_report.csv`	results csv used for parsing results into our LIMS. This file is generated from the `concat_seq_metrics_and_lineage_results.py` script. See the `concat_seq_metrics_and_lineage_results.py` readme for more details.	`gs://{user_defined_gcp_bucket}/summary_results/`
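
As a rough illustration of how a nextclade json file can be flattened into tabular rows, consider the sketch below. It is not the `nextclade_json_parser.py` script. The field names (`results`, `seqName`, `clade`, `substitutions`, `aaSubstitutions`, `refNuc`, `pos`, `queryNuc`, `gene`, `refAA`, `codon`, `queryAA`) and the 0-based positions are assumptions about the nextclade json layout, which varies between nextclade versions; check them against your own output.

```python
import json
from typing import Dict, List


def summarize_nextclade(json_text: str) -> List[Dict[str, str]]:
    """Return one row per sequence with its clade and a summary of its changes.

    Hypothetical helper; field names are assumptions about the json layout.
    """
    data = json.loads(json_text)
    # Assumption: records sit under "results"; a bare top-level list is also accepted.
    records = data["results"] if isinstance(data, dict) else data
    rows = []
    for rec in records:
        # Assumption: json positions are 0-based, so add 1 for conventional notation.
        nuc = [
            f'{s["refNuc"]}{s["pos"] + 1}{s["queryNuc"]}'
            for s in rec.get("substitutions", [])
        ]
        aa = [
            f'{s["gene"]}:{s["refAA"]}{s["codon"] + 1}{s["queryAA"]}'
            for s in rec.get("aaSubstitutions", [])
        ]
        rows.append(
            {
                "sequence": rec["seqName"],
                "clade": rec["clade"],
                "nucleotide_changes": ",".join(nuc),
                "amino_acid_changes": ",".join(aa),
            }
        )
    return rows
```

The returned list of dictionaries can be written out with `csv.DictWriter` to produce a table with one row per sequence.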

Multifasta Lineage Calling and Results Workflow

File: SC2_multifasta_lineage_calling.wdl

This workflow performs lineage and clade assignment using a concatenated fasta file as input. It is a standalone workflow and does not depend on any of the previous reference-based assembly workflows. The workflow accepts "sample" as the root entity type and uses a simple two-column data table (see inputs below).

Briefly, this workflow will perform the following:

1. Runs nextclade.
2. Runs pangolin.
3. Transfers outputs to a user-defined google bucket.

SC2_multifasta_lineage_calling.wdl workflow diagram

Inputs

Terra data table. You will need to create a Terra data table that contains the following two columns:
1. entity:sample_id: the prefix that you would like for your output files
2. multifasta: google bucket path to the multi-sequence fasta file
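
One way to produce this two-column table is to write a tab-separated file and upload it to Terra. The sketch below builds the file contents in Python; the function name, sample prefix, and bucket path are hypothetical placeholders.

```python
import csv
import io
from typing import Dict


def build_terra_tsv(samples: Dict[str, str]) -> str:
    """Return tab-separated text with one row per sample_id -> multifasta path."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    # The first column header names the root entity type ("sample").
    writer.writerow(["entity:sample_id", "multifasta"])
    for sample_id, gs_path in samples.items():
        writer.writerow([sample_id, gs_path])
    return buffer.getvalue()


# Hypothetical prefix and bucket path, for illustration only.
tsv_text = build_terra_tsv({"run1": "gs://example-bucket/run1/run1.fasta"})
with open("sample.tsv", "w") as handle:
    handle.write(tsv_text)
```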

Below is a summary of the workflow input variables along with the syntax used for the attribute column when setting up the workflow to run on Terra.bio. For the attributes, the "this." syntax tells Terra to pull the variable from the Terra data table described above.

workflow variable	attribute (input syntax into workflow)
`multifasta`	this.multifasta
`sample_id`	this.sample{terra_datatable_name}_id
`out_dir`	"gs://{path to user defined google bucket}"

Outputs

This workflow generates several output files, which are transferred to the user-defined google bucket specified by the out_dir variable. The table below details each output.

output variable name	file_name	description	google bucket path
`nextclade_version`	N/A	nextclade version	N/A
`nextclade_json`	`{sample_id}_nextclade_json`	json file generated from nextclade	`gs://{user_defined_gcp_bucket}/nextclade_out`
`auspice_json`	`{sample_id}_nextclade.auspice.json`	auspice json file generated from nextclade	`gs://{user_defined_gcp_bucket}/nextclade_out`
`nextclade_csv`	`{sample_id}_nextclade.csv`	csv file generated from nextclade	`gs://{user_defined_gcp_bucket}/nextclade_out`
`pangolin_version`	N/A	pangolin version	N/A
`pangolin_lineage`	`{sample_id}_pangolin_lineage_report.csv`	lineage report generated from pangolin	`gs://{user_defined_gcp_bucket}/pangolin_out`
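
Because this workflow does not produce a combined summary file, you may want to join the pangolin and nextclade outputs yourself. The sketch below is one possible approach and is not part of the workflow. It assumes the pangolin report is comma-separated with `taxon` and `lineage` columns, and that the nextclade csv is semicolon-separated with `seqName` and `clade` columns; verify both against your own files.

```python
import csv
import io
from typing import Dict, List


def merge_lineage_and_clade(pangolin_csv_text: str, nextclade_csv_text: str) -> List[Dict[str, str]]:
    """Join pangolin lineages onto nextclade clades by sequence name.

    Hypothetical helper; column names and delimiters are assumptions.
    """
    lineages = {
        row["taxon"]: row["lineage"]
        for row in csv.DictReader(io.StringIO(pangolin_csv_text))
    }
    merged = []
    # Assumption: the nextclade csv uses ";" as its delimiter.
    for row in csv.DictReader(io.StringIO(nextclade_csv_text), delimiter=";"):
        name = row["seqName"]
        merged.append(
            {
                "sequence": name,
                "clade": row["clade"],
                # Leave the lineage blank when pangolin has no entry for the sequence.
                "lineage": lineages.get(name, ""),
            }
        )
    return merged
```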