Assembly
Use SPAdes, a widely used assembler, to assemble a bacterial genome.
Installation
conda install spades
Assembly
When using SPAdes, it is typical to choose at least three k-mer sizes: one low, one medium and one high. SPAdes requires every k-mer size to be odd, smaller than 128, and listed in ascending order. We will use 33, 55 and 91.
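Before launching a long run, it can help to sanity-check the k-mer list. The snippet below is only a minimal sketch: check_kmers is a hypothetical helper (not part of SPAdes) that tests the constraints on -k values, namely odd, below 128, and ascending.

```shell
# Hypothetical helper, not part of SPAdes: validate a comma-separated k-mer list.
check_kmers() {
    prev=0
    for k in $(printf '%s' "$1" | tr ',' ' '); do
        # Reject anything that is not a plain positive integer.
        case "$k" in
            ''|*[!0-9]*) echo "not a number: $k" >&2; return 1 ;;
        esac
        if [ $((k % 2)) -eq 0 ]; then
            echo "k=$k is even" >&2; return 1
        fi
        if [ "$k" -ge 128 ]; then
            echo "k=$k is not below 128" >&2; return 1
        fi
        if [ "$k" -le "$prev" ]; then
            echo "k=$k is not in ascending order" >&2; return 1
        fi
        prev=$k
    done
    echo "ok: $1"
}

check_kmers 33,55,91                      # the list used in this tutorial
check_kmers 33,56,91 || echo "rejected"   # 56 is even, so this list fails
```

The first call should accept the list used here, while the second should be rejected because 56 is even.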
First, merge the two unpaired read files into a single file:
cd seqtk
cat SRR10561173_2U_seqtk.fastq.gz >> SRR10561173_1U_seqtk.fastq.gz
rm SRR10561173_2U_seqtk.fastq.gz
mv SRR10561173_1U_seqtk.fastq.gz SRR10561173_U_seqtk.fastq.gz
cd ..
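The merge relies on the fact that gzip files can be concatenated directly: the result is still a valid gzip stream that decompresses to the contents of both inputs. The sketch below illustrates this on two tiny made-up FASTQ files in a temporary directory (the file names and reads are toy data, not the tutorial's), and counts reads by dividing the line count by four.

```shell
# Toy demonstration: concatenated gzip files decompress as one stream.
tmp=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n' | gzip > "$tmp/a.fastq.gz"
printf '@r2\nTTGA\n+\nIIII\n' | gzip > "$tmp/b.fastq.gz"

# Same pattern as the merge step: append one archive to the other.
cat "$tmp/b.fastq.gz" >> "$tmp/a.fastq.gz"

# Check integrity, then count reads (4 lines per FASTQ record).
gzip -t "$tmp/a.fastq.gz"
merged_lines=$(gzip -dc "$tmp/a.fastq.gz" | wc -l)
echo "reads in merged file: $((merged_lines / 4))"
```

The same two checks (gzip -t and the line count) can be run on the merged unpaired file before handing it to SPAdes.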
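Then run SPAdes with the paired reads (-1, -2), the merged unpaired reads (-s), the k-mer sizes (-k) and the output directory (-o):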
spades.py -1 ~/seqtk/SRR10561173_1P_seqtk.fastq.gz -2 ~/seqtk/SRR10561173_2P_seqtk.fastq.gz -s ~/seqtk/SRR10561173_U_seqtk.fastq.gz -k 33,55,91 -o ~/assembly
Check the result.
File | Content |
---|---|
contigs.fasta | assembled contigs |
scaffolds.fasta | assembled scaffolds |
assembly_graph.fastg | the assembly graph of contigs, in FASTG format |
assembly_graph_with_scaffolds.gfa | the assembly graph with scaffolds, in GFA format |
spades.log | the log file of the SPAdes run |
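A quick first look at contigs.fasta is to count the contigs and sum their lengths. The following is a minimal sketch on a toy FASTA file written to a temporary directory; the sequences are made up, and the headers only mimic the NODE_..._length_..._cov_... naming style that SPAdes uses. To run it on a real assembly, point the two commands at ~/assembly/contigs.fasta instead.

```shell
# Toy FASTA standing in for contigs.fasta (made-up sequences).
tmp=$(mktemp -d)
cat > "$tmp/contigs.fasta" <<'EOF'
>NODE_1_length_12_cov_10.0
ACGTACGT
ACGT
>NODE_2_length_6_cov_8.5
ACGTAC
>NODE_3_length_4_cov_3.2
ACGT
EOF

# Each header line starts with '>', so counting them counts contigs.
n_contigs=$(grep -c '^>' "$tmp/contigs.fasta")

# Sum the length of every non-header line to get the total assembly size.
total_len=$(awk '!/^>/ {n += length($0)} END {print n + 0}' "$tmp/contigs.fasta")

echo "contigs: $n_contigs, total length: $total_len bp"
```

For a bacterial genome, a total length far from the expected genome size, or a very large number of short contigs, is usually worth investigating before moving on.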
Quality assessment of the assembly
- Install quast and bandage
  quast has some compatibility problems with the bioinfo environment. The solution is to install it in a separate environment:
  conda create -n quast
  conda activate quast
  conda install quast
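- Run quast
  quast uses a reference to assess the quality of the assembly. Upload the reference genome salmonella_typhimurium_lt2.fasta and the annotation file salmonella_typhimurium_lt2.gff to the server, then run:
  quast ~/assembly/SRR10561173.fasta -R ~/seq/salmonella_typhimurium_lt2.fasta -G ~/seq/salmonella_typhimurium_lt2.gff -o ~/quast/
  Note that SPAdes names its output contigs.fasta and scaffolds.fasta, so ~/assembly/SRR10561173.fasta is presumably a renamed copy of one of them. Adjust the path if your file is named differently.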
Check the result.
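One of the statistics quast reports is N50. As an independent sanity check, it can be recomputed by hand. The sketch below uses one common definition (the contig length at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length) and runs on a made-up toy FASTA, not on the tutorial's data.

```shell
# Toy FASTA with contig lengths 8, 6, 4 and 2 (made-up sequences).
tmp=$(mktemp -d)
cat > "$tmp/toy.fasta" <<'EOF'
>c1
ACGT
ACGT
>c2
ACGTAC
>c3
ACGT
>c4
AC
EOF

# Step 1: print one length per contig (sequences may span several lines).
# Step 2: sort lengths from longest to shortest.
# Step 3: walk down the list until half of the total length is covered.
n50=$(awk '/^>/ {if (len) print len; len = 0; next}
           {len += length($0)}
           END {if (len) print len}' "$tmp/toy.fasta" \
    | sort -rn \
    | awk '{lens[NR] = $1; total += $1}
           END {
               half = total / 2
               for (i = 1; i <= NR; i++) {
                   cum += lens[i]
                   if (cum >= half) { print lens[i]; exit }
               }
           }')

echo "N50: $n50"
```

Here the total length is 20, so half is 10: the longest contig (8) is not enough, and adding the second (6) crosses the threshold, giving an N50 of 6.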
- Run bandage
  Download and install bandage from the Bandage website, or copy it from my folder. Download the FASTG graph file produced by SPAdes (assembly_graph.fastg) to your laptop and open it with bandage.