Ocean Metagenomics Assembly and Gene Prediction

In this tutorial, we will analyse a small dataset of oceanic microbial metagenomes.

Note

This tutorial uses the full Ocean Microbial Reference Gene Catalog presented in "Structure and function of the global ocean microbiome", Sunagawa, Coelho, Chaffron, et al., Science, 2015.

  1. Download the toy dataset

First download all the tutorial data:

ngless --download-demo ocean-short

We are reusing the same dataset as in the Ocean profiling tutorial. It may be a good idea to read steps 1-4 of that tutorial before starting this one.

  2. Preliminary imports

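To run ngless, we need to write a script. We start with the version declaration and the imports:

ngless "0.6"
# load_mocat_sample (used below) is provided by the mocat module
import "mocat" version "0.0"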

  3. Preprocessing

This step is the same as in the profiling tutorial, except that we work with a single sample (you could also use the parallel module to make it easier to work on all samples):

sample = 'SAMEA2621155.sampled'
input = load_mocat_sample(sample)

preprocess(input, keep_singles=False) using |read|:
    read = substrim(read, min_quality=25)
    if len(read) < 45:
        discard
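
To make the filtering rule concrete, here is a minimal Python sketch of what the block above does to a single read. It is an illustration, not NGLess's implementation: it assumes that substrim keeps the longest contiguous stretch of bases whose qualities are all at least min_quality, and that ties between equally long stretches go to the first one (the tie-breaking rule is an assumption). The helper name preprocess_read is hypothetical, and the sketch ignores read pairing, so it does not model keep_singles.

```python
from typing import List, Optional, Tuple


def substrim(seq: str, quals: List[int], min_quality: int) -> Tuple[str, List[int]]:
    """Keep the longest contiguous run of bases with quality >= min_quality.

    Assumption: if two runs have the same length, the first one is kept.
    """
    best_start, best_len = 0, 0
    start = 0  # start of the current run of acceptable bases
    for i, q in enumerate(quals):
        if q < min_quality:
            start = i + 1  # a low-quality base ends the current run
        elif i - start + 1 > best_len:
            best_start, best_len = start, i - start + 1
    end = best_start + best_len
    return seq[best_start:end], quals[best_start:end]


def preprocess_read(
    seq: str,
    quals: List[int],
    min_quality: int = 25,
    min_length: int = 45,
) -> Optional[Tuple[str, List[int]]]:
    """Hypothetical helper mirroring the block: trim, then discard short reads."""
    trimmed_seq, trimmed_quals = substrim(seq, quals, min_quality)
    if len(trimmed_seq) < min_length:
        return None  # corresponds to `discard`
    return trimmed_seq, trimmed_quals
```

For example, a read whose only high-quality stretch is 30 bases long would come back as None here, just as it would be discarded by the NGLess block.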

  4. Assembly and gene prediction

This now takes just two function calls, assemble and orf_find:

contigs = assemble(input)
write(contigs, ofile='contigs.fna')

orfs = orf_find(contigs)
write(orfs, ofile='orfs.fna')
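
Once the script has run, a quick sanity check of the two output files can be useful. The following Python sketch is not part of NGLess; it assumes that contigs.fna and orfs.fna are uncompressed FASTA files, and the function names (read_fasta, summarise, summarise_fasta) are hypothetical helpers introduced for illustration. It reports the number of sequences, the total length, the longest sequence and the N50.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def summarise(lengths: List[int]) -> Dict[str, int]:
    """Basic statistics for a list of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running, n50 = 0, 0
    for length in ordered:
        running += length
        if running * 2 >= total:
            n50 = length  # shortest length among those covering half the total
            break
    return {
        "sequences": len(ordered),
        "total_bases": total,
        "longest": ordered[0] if ordered else 0,
        "n50": n50,
    }


def summarise_fasta(path: str) -> Dict[str, int]:
    """Summarise a FASTA file on disk (assumed to be uncompressed)."""
    with open(path) as handle:
        return summarise([len(seq) for _, seq in read_fasta(handle)])
```

Calling summarise_fasta('contigs.fna') and summarise_fasta('orfs.fna') after the run gives a rough view of how fragmented the assembly is and how many ORFs were predicted.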

Full script

ngless "0.6"
# load_mocat_sample is provided by the mocat module
import "mocat" version "0.0"

sample = 'SAMEA2621155.sampled'
input = load_mocat_sample(sample)

preprocess(input, keep_singles=False) using |read|:
    read = substrim(read, min_quality=25)
    if len(read) < 45:
        discard

contigs = assemble(input)
write(contigs, ofile='contigs.fna')

orfs = orf_find(contigs)
write(orfs, ofile='orfs.fna')
