NGLess is a domain-specific language for NGS (next-generation sequencing) data processing.
For questions, you can also use the ngless mailing list.
Note
If you are using NGLess for generating results in a scientific publication, please cite
NG-meta-profiler: fast processing of metagenomes using NGLess, a domain-specific language by Luis Pedro Coelho, Renato Alves, Paulo Monteiro, Jaime Huerta-Cepas, Ana Teresa Freitas, Peer Bork - Microbiome 2019 7:84; https://doi.org/10.1186/s40168-019-0684-8
For metagenomics profiling, consider using ng-meta-profiler, which is a collection of predefined pipelines developed using NGLess.
NGLess is best illustrated by an example:
ngless "1.4"
input = paired('ctrl1.fq', 'ctrl2.fq', singles='ctrl-singles.fq')
input = preprocess(input) using |read|:
    read = read[5:]
    read = substrim(read, min_quality=26)
    if len(read) < 31:
        discard

mapped = map(input, reference='hg19')
write(count(mapped, features=['gene']),
        ofile='gene_counts.csv',
        format={csv})
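To make the per-read logic of the preprocess block concrete, here is a minimal Python sketch of the same three steps: drop the first 5 bases, keep the longest stretch in which every base has quality of at least 26, and discard the read if fewer than 31 bases remain. This is an illustration, not NGLess' implementation. The function names, the representation of a read as two strings, and the Phred+33 quality encoding are all assumptions made for the example.

```python
from typing import Optional, Sequence, Tuple

PHRED_OFFSET = 33  # assumption: qualities are encoded as Phred+33 ASCII


def longest_good_run(quals: Sequence[int], min_quality: int) -> Tuple[int, int]:
    """Return (start, end) of the longest run where every quality >= min_quality."""
    best_start, best_end = 0, 0
    run_start = 0
    for i, q in enumerate(quals):
        if q < min_quality:
            # A low-quality base ends the current run; the next run starts after it.
            run_start = i + 1
        elif i + 1 - run_start > best_end - best_start:
            best_start, best_end = run_start, i + 1
    return best_start, best_end


def preprocess_read(
    seq: str,
    qual: str,
    skip: int = 5,
    min_quality: int = 26,
    min_len: int = 31,
) -> Optional[Tuple[str, str]]:
    """Hypothetical helper mirroring the preprocess block; None means 'discard'."""
    # read = read[5:]
    seq, qual = seq[skip:], qual[skip:]
    # read = substrim(read, min_quality=26)
    scores = [ord(c) - PHRED_OFFSET for c in qual]
    start, end = longest_good_run(scores, min_quality)
    seq, qual = seq[start:end], qual[start:end]
    # if len(read) < 31: discard
    if len(seq) < min_len:
        return None
    return seq, qual
```

For instance, a 50-base read of uniformly high quality comes back as 45 bases, while a read whose longest high-quality stretch after trimming is 29 bases is discarded.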
NGLess has built-in support for model organisms, and the standard library includes support for mOTUs, metagenomics profiling of marine samples and human gut microbiome samples. We also have standard library modules to help users upgrade from MOCAT or run many samples (we have used NGLess on projects with >10,000 samples).
NGLess puts a strong emphasis on reproducibility.
ngless can be used as a traditional command line transformer utility, using the -e argument to pass an inline script on the command line.
The -p (or --print-last) argument tells ngless to output the value of the last expression to stdout.
Extract reads from a SAM (or BAM) file:
$ ngless -pe 'as_reads(samfile("file.sam"))' > file.fq
This is equivalent to the full script:
ngless "1.4" # <- version declaration, optional on the command line
samcontents = samfile("file.sam") # <- load a SAM/BAM file
reads = as_reads(samcontents) # <- just get the reads (w quality scores)
write(reads, ofile=STDOUT) # <- write them to STDOUT (default format: FASTQ)
This only works if the data in the SAM file is single-ended, because a single FASTQ file is piped out. Otherwise, you can always do:
ngless "1.4"
write(as_reads(samfile("file.sam")),
        ofile="output.fq")
which will write 3 files: output.1.fq, output.2.fq, and output.singles.fq (the first two for the paired-end reads and the last one for reads without a mate).
Building on the previous example, we can add a select() call to output only mapped reads:
$ ngless -pe 'as_reads(select(samfile("file.sam"), keep_if=[{mapped}]))' > file.fq
This is equivalent to the full script:
ngless "1.4" # <- version declaration, optional on the command line
samcontents = samfile("file.sam") # <- load a SAM/BAM file
samcontents = select(samcontents, keep_if=[{mapped}]) # <- select only *mapped* reads
reads = as_reads(samcontents) # <- just get the reads (w quality scores)
write(reads, ofile=STDOUT) # <- write them to STDOUT (default format: FASTQ)
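For readers less familiar with the SAM format, the following Python sketch shows what "mapped" means at the record level. In SAM, bit 0x4 of the FLAG field (the second tab-separated column) marks a segment as unmapped. The sketch filters record by record and is only an illustration: the function name is hypothetical, and it does not claim to reproduce how select() treats the mates of a paired-end read.

```python
from typing import Iterable, Iterator

UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def mapped_records(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines unchanged, plus alignment records that are mapped.

    Hypothetical helper for illustration; works on SAM text, not on BAM.
    """
    for line in sam_lines:
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("@"):
            yield line  # header lines carry no FLAG field
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # FLAG is the second column
        if not flag & UNMAPPED_FLAG:
            yield line
```

Because the function accepts any iterable of lines, it can be applied to an open file or to sys.stdin alike.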
For a true Unix-like utility, the input should be read from standard input. This can be achieved with the special file STDIN. So the previous example now reads:
$ cat file.sam | ngless -pe 'as_reads(select(samfile(STDIN), keep_if=[{mapped}]))' > file.fq
Obviously, this example would be more interesting if the input were to come from another programme (not just cat).