In this tutorial, we will analyse a small dataset of oceanic microbial metagenomes.
Note
This tutorial uses the full Ocean Microbial Reference Gene Catalog presented in "Structure and function of the global ocean microbiome" (Sunagawa, Coelho, Chaffron, et al., Science, 2015).
First download all the tutorial data:
ngless --download-demo ocean-short
We are reusing the same dataset as in the Ocean profiling tutorial. It may be a good idea to read steps 1-4 of that tutorial before starting this one.
To run NGLess, we need to write a script. We start by declaring the NGLess version:
ngless "1.4"
First, we want to trim the reads based on quality:
sample = 'SAMEA2621155.sampled'
input = load_mocat_sample(sample)
input = preprocess(input, keep_singles=False) using |read|:
    read = substrim(read, min_quality=25)
    if len(read) < 45:
        discard
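To make the trimming step concrete, here is a minimal Python sketch of what the block above does to each read, assuming that substrim keeps the longest stretch of bases whose qualities are all at least min_quality. The function names and the tie-breaking rule (the first longest stretch wins) are choices made for this illustration only; this is not NGLess's actual implementation.

```python
def substrim(qualities, min_quality):
    """Return (start, end) of the longest run in which every quality
    is >= min_quality. On ties, the first such run is returned
    (an assumption of this sketch)."""
    best_start, best_len = 0, 0
    run_start = 0
    for i, q in enumerate(qualities):
        if q < min_quality:
            # A low-quality base ends the current run.
            run_start = i + 1
        else:
            run_len = i - run_start + 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
    return best_start, best_start + best_len


def preprocess_read(sequence, qualities, min_quality=25, min_length=45):
    """Trim a read, then discard it (return None) if it is too short."""
    start, end = substrim(qualities, min_quality)
    if end - start < min_length:
        return None
    return sequence[start:end], qualities[start:end]
```

With the thresholds used in the script (min_quality=25 and a minimum length of 45), a read whose best high-quality stretch is shorter than 45 bases is dropped entirely.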
Assembling the reads and predicting open reading frames (ORFs) now takes just two function calls, assemble and orf_find:
contigs = assemble(input)
write(contigs, ofile='contigs.fna')
orfs = orf_find(contigs)
write(orfs, ofile='orfs.fna')
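Once the script has run, it can be useful to check that the two output files contain what you expect. The following is a small Python sketch, not part of NGLess, that counts the sequences and total bases in a FASTA file; the function name fasta_summary is hypothetical and the code assumes plain, uncompressed FASTA.

```python
def fasta_summary(path):
    """Return (number of sequences, total number of bases) for a FASTA file."""
    n_seqs = 0
    total_bases = 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith('>'):
                n_seqs += 1  # header line starts a new sequence
            else:
                total_bases += len(line)
    return n_seqs, total_bases

# Example usage, assuming the script above has been run in this directory:
#   fasta_summary('contigs.fna')
#   fasta_summary('orfs.fna')
```

Comparing the two summaries gives a quick sanity check that assembly produced contigs and that ORFs were predicted on them.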
The complete script is:
ngless "1.4"
sample = 'SAMEA2621155.sampled'
input = load_mocat_sample(sample)
input = preprocess(input, keep_singles=False) using |read|:
    read = substrim(read, min_quality=25)
    if len(read) < 45:
        discard
contigs = assemble(input)
write(contigs, ofile='contigs.fna')
orfs = orf_find(contigs)
write(orfs, ofile='orfs.fna')