To add a module to ngless there are two options: external or internal modules. External modules are the simplest option.
External modules can perform two tasks:
Adding references makes them available to the map()
call using the
reference
argument and (optionally) allows for calls to count()
without
specifying any annotation file.
Like everything else in ngless, these are versioned for reproducibility so that the resulting script implicitly encodes the exact version of the databases used.
Functions in external modules map to command line calls to a script you provide.
You can use the example module in the ngless source for inspiration. That is a complete, functional module.
A module is defined by an YaML
file.
Every module has a name and a version:
name: 'module'
version: '0.0.0'
Everything else is optional.
References are added with a references section, which is a list of
references. A reference contains a fasta-file
and (optionally) a
gtf-file
. For example:
references:
-
name: 'ref'
fasta-file: 'data/reference.fna'
gtf-file: 'data/reference.gtf.gz'
Note that the paths are relative to the module directory. The GTF file may be gzipped.
An init
section defines an initialization command. This will be run
before anything else in any script which imports this module. The intention
is that the module can check for any dependencies and provide the user with an
early error message instead of failing later after. For example:
init:
init_cmd: './init.sh'
init_args:
- "Hello"
- "World"
will cause ngless to run the command ./init.sh Hello World
whenever a user
imports the module.
A note about paths: paths you define in the module.yaml file are relative
to the Yaml file itself. Thus you put all the necessary scripts and data in
the module directory. However, the scripts are run with the current working
directory of wherever the user is running the ngless protocol (so that any
relative paths that the user specifies work as expected). To find your data
files inside your module, ngless sets the environmental variable
NGLESS_MODULE_DIR
as the path to the module directory.
To add new functions, use a functions
section, which should contain a list of
functions encoded in YaML format. Each function has a few required arguments:
nglName
the name by which the function will be called inside of an ngless
script.
arg0
the script to call for this function. Note that the user will never see
this.
For example:
functions:
-
nglName: "test"
arg0: "./run-test.sh"
will enable the user to call a function test()
which will translate into a
call to the run-test.sh
script (see the note above about paths).
You can also add arguments to your function, naturally. Remember that ngless
functions can have only one unnamed argument and any number of named arguments.
To specify the unnamed argument add a arg1
section, with the key atype
(argument type):
arg1:
atype: <one of 'readset'/'mappedreadset'/'counts'/'str'/'flag'/'int'/'option'>
The arguments of type readset, mappedreadset, and counts are passed as paths to a file on disk. Your command is assumed to not change these, but make a copy if necessary. Bad things will happen if you change the files. You can specify more details on which kind of file you expect with the following optional arguments:
filetype: <one of "tsv"/"fq1"/"fq2"/"fq3"/"sam"/"bam"/"sam_or_bam"/"tsv">
can_gzip: true/false
can_bzip2: true/false
can_stream: true/false
The flags can_gzip
/can_bzip2
indicate whether your script can accept
compressed files (default: False). can_stream
indicates whether the input
can be a pipe (default: False, which means that an intermediate file will
always be used).
For example, if your tool wants a SAM file (and never a BAM file), you can write:
arg1:
atype: mappedreadset
filetype: sam
ngless
will ensure that your tool does receive a SAM file (including
converting BAM to SAM if necessary).
Finally, additional argument are specified by a list called additional
.
Entries in this list have exactly the same format as the arg1
entry, except
that they have a few extra fields. The extra field name
is mandatory, while
everything else is optional:
additional:
-
name: <name>
atype: <as for arg1: 'readset'/'mappedreadset'/...>
def: <default value>
required: true/false
Arguments of type flag
have an optional extra argument, when-true
which
is a list of strings which will be passed as extra arguments when the flag is
true. You can also just specify a single string. If when-true
is missing,
ngless will pass an option of the form --name
(i.e., a double-dash then the
name used). For example:
additional:
-
name: verbose
atype: flag
def: false
when-true: "-v"
-
name: complete
atype: flag
def: false
when-true:
- "--output=complete"
- "--no-filter"
All other argument types are passed to your script using the syntax
--name=value
if they are present or if a default has been provided.
Arguments of type option
map to symbols in ngless and require you to add an
additional field allowed
specifying the universe of allowed symbols. Ngless
will check that the user specifies arguments from the allowable universe. For
example:
additional:
-
atype: 'option'
name: 'verbosity'
def: 'quiet'
allowed:
- 'quiet'
- 'normal'
- 'loud'
If you do not have a fixed universe for your argument, then it should be a
str
argument.
The required
flag determines whether the argument is required. Note that
arguments with a default argument are automatically optional (ngless may
trigger a warning if you mark an argument with a default as required).
To return a value, you must request that ngless generate a new temporary file
for the script to generate output to. Therefore, you need to specify a
return
section, with three parameters: rtype
(return type, see below),
name
the name of the argument to use, and extension
the file extension
of the output type.
return:
rtype: "counts"
name: "ofile"
extension: "sam"
rtype
must be one of "void"
, "counts"
or "mappedreadset"
.
Returning readset
isn’t currently supported.
If you plan to make use of search path expansion, in order
for NGLess to expand the argument prior to passing it to the external module
you need to set atype: "str"
and expand_searchpath: true
.
additional:
-
atype: 'str'
name: 'reference'
expand_searchpath: true
Finally, if you wish to, you can add one or more citations:
citation: "A paper which you want to be listed when users import your module"
This will be printed out whenever users use your module and thus will help you get exposure.
If you have more than one citation, you can use the citations
key and
provide a list:
citations:
- "Paper 1"
- "Paper 2"
External modules can specify a minimal NGLess version that they need to run.
This is optional, but if it is used, you need to additionally supply a reason
for the requirement (using the aptly-named reason
field):
min-ngless-version:
min-version: "1.3"
reason: "The min-ngless-version field is only supported since NGLess 1.3"
This is very advanced as it requires writing Haskell code which can then interact very deeply with the rest of ngless.
For an example, you can look at the example internal module. If you want to get started, you can ask about details on the ngless user mailing list.
Privacy: Usage of this site follows EMBL’s Privacy Policy. In accordance with that policy, we use Matomo to collect anonymised data on visits to, downloads from, and searches of this site. Contact: bork@embl.de.