Database practice
Explore a gene
SopD is an effector protein of Salmonella Typhimurium that alters host cell physiology and promotes bacterial survival in host tissues. It contributes to bacterial replication in macrophages. We will explore the sopD gene in several databases to learn what each of them can offer in a bioinformatics study.
NCBI GenBank
- Go to http://www.ncbi.nlm.nih.gov/, enter `salmonella typhimurium[orgn] AND sopd` and click `Search`. By default you search ALL the databases, so the results include many records that mention this topic but are not specifically about sopD. How many nucleotide sequences are returned? ___ Click `Nucleotide` to see the result page. Examine the accession number of each entry on the new page.
- Click on the sequence “Accession: AF234265.1”.
  - Read the GenBank file. When was the sequence uploaded? __ How many publications are there about this sequence? __
  - Click `Graphics` in the top left. Explore the graph: try zooming in, zooming out, and moving left or right. What are the start and end positions of the gene on the Salmonella Typhimurium genome? __ to __
  - Click `FASTA` in the top left, then click `Send to` in the top right, choose `File`, and save the file to a local folder as `sopD_gene.fasta`.
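Once `sopD_gene.fasta` is saved, it can be inspected outside the browser. The following is a minimal Python sketch, not part of the original practical, that reads FASTA-formatted lines and reports the length and GC content of each record. The function names `read_fasta` and `gc_content` and the toy record are made up for illustration.

```python
def read_fasta(lines):
    """Return {record_id: sequence} from an iterable of FASTA lines."""
    parts = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The record ID is the first word after '>'.
            name = line[1:].split()[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {rec_id: "".join(chunks) for rec_id, chunks in parts.items()}


def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence (0.0 if empty)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# Toy record for illustration only; this is not the sopD sequence.
toy_lines = [">toy_record made-up example", "ATGCGG", "CCTA"]
for rec_id, seq in read_fasta(toy_lines).items():
    print(rec_id, len(seq), round(gc_content(seq), 2))
```

To run it on the downloaded file, pass an open file handle instead of the toy list, for example `read_fasta(open("sopD_gene.fasta"))`.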
UniProt
- Go to https://www.uniprot.org/, enter `salmonella typhimurium sopd` and click `Search`. How many manually reviewed results and how many unreviewed results are returned? ___
- Click on the entry `P40722`. Read the results and try to find the following information:
  - What is the subcellular location of the protein?
  - What is the resolution of the protein structure?
  - Download the FASTA file and save it as `sopD_protein.fasta`.
  - Find the KEGG link under `Genome annotation databases` and follow it to KEGG.
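The header line of a UniProt FASTA file packs several annotations into one string, in the form `>db|accession|entry_name protein name OS=... OX=... GN=... PE=... SV=...`. As a rough sketch, assuming the header follows that layout, it can be split into fields as below. The function name `parse_uniprot_header` and the toy header are hypothetical and do not reproduce the real `P40722` record.

```python
import re


def parse_uniprot_header(header):
    """Split a UniProt-style FASTA header into a dict of fields."""
    header = header.lstrip(">").strip()
    ids, _, rest = header.partition(" ")
    # 'sp' marks reviewed (Swiss-Prot) entries, 'tr' unreviewed (TrEMBL) ones.
    db, accession, entry_name = ids.split("|")
    # The protein name is everything before the first two-letter KEY= field.
    protein_name = re.split(r"\s[A-Z]{2}=", rest, maxsplit=1)[0]
    # Collect the KEY=value fields (OS, OX, GN, PE, SV).
    fields = dict(re.findall(r"\b([A-Z]{2})=(.*?)(?=\s[A-Z]{2}=|$)", rest))
    return {
        "db": db,
        "accession": accession,
        "entry_name": entry_name,
        "protein_name": protein_name,
        **fields,
    }


# Made-up header for illustration only.
toy_header = ">sp|P00000|TOY_EXAMPLE Toy protein OS=Example organism OX=0 GN=toy PE=1 SV=1"
print(parse_uniprot_header(toy_header))
```

Applied to the first line of `sopD_protein.fasta`, the `GN` field should hold the gene name and `OS` the organism, which gives a quick check that the right entry was downloaded.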
KEGG
- Read the information on the KEGG page.
- Click the link after `Pathway`. What is the downstream protein of `sopD` in the “Bacterial invasion of epithelial cells” pathway?
Download a reference genome
- Go to http://www.ncbi.nlm.nih.gov/, enter `salmonella typhimurium` and click `Search`. How many results are returned from the `Assembly` database?
- Click on the `Assembly` database and choose the entry `ASM694v2`.
  - Read the page, then click the link after `Organism name`. In which journal and when was the sequence first published?
  - Back on the `ASM694v2` page, click the `Download Assembly` button at the top right. Download the `genomic FASTA`, `genomic GFF` and `genomic GenBank` files from the `RefSeq` database. Save them as `salmonella_typhimurium_lt2.fasta`, `salmonella_typhimurium_lt2.gff` and `salmonella_typhimurium_lt2.gb`.
- Back on the NCBI search result page, this time go to the `SRA` database.
  - Choose the first entry. You now see the record of a sequencing dataset for a Salmonella Typhimurium strain. Which sequencing machine was used? What is the size of the raw data?
  - Click the link after `Sample`. This is the metadata of the Salmonella Typhimurium strain. Where, when and from which source was this bacterium sampled?
  - Back on the `SRA` page, click the link under `Run`. You will now see the metadata of the sequence.
  - Open a terminal, log in to the server, and download the sequence using `sra-tools` with the `SRR` run ID:
    source activate bioinfo
    fastq-dump --split-3 --gzip SRR10561173
It may take a long time. Be patient, we don’t need this file today.
More about `sra-tools`: sra-tools documentation
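When the download eventually finishes, a quick sanity check is to count the reads in the compressed output. The sketch below is an illustration rather than part of the practical. It assumes the usual four-lines-per-read FASTQ layout, and the function name `count_fastq_reads` is made up. With `--split-3`, paired-end runs are normally written as `SRR10561173_1.fastq.gz` and `SRR10561173_2.fastq.gz`, while single-end runs give `SRR10561173.fastq.gz`; which of these appears depends on the run, so check the actual file names first.

```python
import gzip


def count_fastq_reads(path):
    """Count reads in a gzipped FASTQ file, assuming 4 lines per read."""
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        # A partial record usually means a truncated or interrupted download.
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4
```

For a paired-end run, calling the function on both files should return the same number, because every read in the first file has a mate in the second.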
Visualise the reference genome
- Download the `Artemis` installation package and install it.
- Start `Artemis`, click `File` - `Open`, and open `salmonella_typhimurium_lt2.fasta` to load the genome sequence.
- Click `File` - `Read an entry` and open `salmonella_typhimurium_lt2.gff` to load the annotation file.
- Getting around in `Artemis`
  - Try the `Goto` menu. Click `Goto` - `Navigator`.
    - Enter a number in the `goto base` field, then click `Goto`.
    - Enter `sopD` in `Goto Feature with gene name`, then click `Goto`.
  - Try the `Select` menu. Click `Select` - `Feature selector`, then select and view all `tRNAs`.
- Take your time to play with `Artemis`!
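The two lookups done in Artemis, jumping to a gene by name and selecting all tRNAs, can also be reproduced directly on the annotation file. This is a minimal sketch, assuming a standard nine-column, tab-separated GFF3 file in which genes carry a `Name` or `gene` attribute. The function names and the toy lines, including their coordinates, are invented for illustration and are not taken from the LT2 annotation.

```python
def parse_gff3(lines):
    """Yield one dict per feature line of a GFF3 file."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # optional embedded sequence section; no features follow
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        attributes = dict(
            item.split("=", 1) for item in cols[8].split(";") if "=" in item
        )
        yield {
            "seqid": cols[0],
            "type": cols[2],
            "start": int(cols[3]),
            "end": int(cols[4]),
            "strand": cols[6],
            "attributes": attributes,
        }


def find_gene(features, name):
    """Return gene features whose Name or gene attribute equals `name`."""
    return [
        f for f in features
        if f["type"] == "gene"
        and name in (f["attributes"].get("Name"), f["attributes"].get("gene"))
    ]


def features_of_type(features, feature_type):
    """Return all features of one type, for example 'tRNA'."""
    return [f for f in features if f["type"] == feature_type]


# Toy annotation with made-up coordinates, for illustration only.
toy_gff = [
    "##gff-version 3",
    "chrToy\texample\tgene\t100\t400\t.\t+\t.\tID=gene-toy1;Name=sopD;gene=sopD",
    "chrToy\texample\ttRNA\t500\t575\t.\t-\t.\tID=rna-toy2;product=tRNA-Ala",
]
features = list(parse_gff3(toy_gff))
for gene in find_gene(features, "sopD"):
    print(gene["seqid"], gene["start"], gene["end"], gene["strand"])
print(len(features_of_type(features, "tRNA")), "tRNA feature(s)")
```

On the real data, replace `toy_gff` with an open handle on `salmonella_typhimurium_lt2.gff`. The coordinates reported for `sopD` can then be compared with the position Artemis jumps to.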