What’s New (History)
Version 1.5.0
Released on September 14 2022
The two big changes are:
- the ability to use Yaml files to specify samples,
- the introduction of
run_for_all
(and run_for_all_samples
) functions to simplify the usage of the parallel
module (see standard library docs).
Several of the other changes were then to support these two features.
Additionally, some minor fixes and improvements were made.
User-visible Improvements
Add load_sample_list
function to load samples in YAML format (see YAML Samples).
Add compress_level
argument to write
function to specify the compression level.
Added name()
method to ReadSet
objects, so you can do:
input = load_fastq_directory("my-sample")
print(input.name())
which will print my-sample
.
- Added println
function which works like print
but prints a newline after the output.
- Make print()
accept ints and doubles as well as strings.
- Added run_for_all
function to parallel
module, simplifying its API.
- When using the parallel
module and a job fails, writes the log to the corresponding .failed
file.
- External modules can now use the sequenceset
type to represent a FASTA file.
- The load_fastq_directory
function now supports .xz
compressed files.
- The parallel
module now checks for stale locks before re-trying failed tasks. The former model could lead to a situation where a particular sample failed deterministically and then blocked progress even when some locks were stale.
Bugfixes
- The
parallel
module should generate a .failed
file for each failed job, but this was not happening in every case.
- Fixed parsing of GFF files to support negative values (reported by Josh Sekela on the mailing-list).
Version 1.4.2
Released 21 July 2022
Bugfixes
- Fix bug with parsing GFF files (it was assumed that _scores_ were always positive)
Version 1.4.1
Released 3 June 2022
Bugfixes
- Fix bug with low memory mode
Version 1.4.0
Released 30 May 2022
User-visible Improvements
write()
now returns the filename used
write()
can use multiple threads
- Better error messages in multiple situations
- Add a module for GMGC — Global Microbial Gene Catalogue
- Old
motus
(version 1) module deprecated
Bugfixes
- Update –install-reference-data mode to newer URLs, see #107
- Update –create-reference-pack mode to newer format (where indices are
versioned), see #108
- Do not fail when merging empty files (#113)
Internal improvements
- Better building infrastructure
- Switched to the tasty testing framework
assemble()
is now using a more up to date version of megahit, which means
that the older versions cannot be run.
Version 1.3.0
Released 28 January 2021
User-visible improvements
- Adds conversion from string to numbers (int or double) and back
- Better error message if the user attempts to use the non-existent
<\>
operator (suggest </>
)
- Validate
count()
headers on --validate-only
Internal improvements
- Switched internal interval structure to interval-int. For users using
GFF-style annotation in
count()
, this should result in a significant
improvement (less memory, faster performance)
- Use zstd compression for more temporary files
Bugfixes
- Fix cases where sample names contain
/
and collect()
(issue 141)
Version 1.2.0
Released 12 July 2020.
User-visible improvements
- Added function load_fastq_directory
to the builtin namespace. This was previously available under the
mocat
module, but it had become much more flexible than the original MOCAT version,
so it was no longer a descriptive name.
- Better messages in parallel
module when there are no free locks.
Internal improvements
- Modules can now specify their annotation as a URL that NGLess downloads on a
“as needed” basis: in version 1.1, only FASTA files were supported.
- Memory consumption of count() function has been
improved when using GFF files (ca. ⅓ less memory used).
- This one is hopefully **not* user-visible*: Previously, NGLess would ship
the Javascript libraries it uses for the HTML viewer and copy them into all
its outputs. Starting in v1.2.0, the HTML viewer links to the live versions
online.
Version 1.1.1
This is a bugfix release and results should not change. In particular, a
sequence reinjection bug was fixed.
Version 1.1.0
User-visible improvements
- Added discard_singles() function.
- Added
include_fragments
option to orf_find().
- The countfile now
reorders its input if it is not ordered. This is necessary for correct usage.
- More flexible loading of
functional_map
arguments in count to accept multiple comment
lines at the top of the file as produced by eggnog-mapper.
- Added
sense
argument to the count function, generalizing the
previous strand
argument (which is deprecated). Whereas before it was
only possible to consider features either to be present on both strands or
only on the strand to which they are annotated, now it is also possible to
consider them present only on the opposite strand (which is necessary for
some strand-specific protocols as they produce the opposite strand).
- Added
interleaved
argument to fastq
load_mocat_sample
now checks for mismatched paired samples (#120) - Better messages
when collect call could not finish (following discussion on the mailing list)
- Modules can now specify their resources as a URL that NGLess downloads on a
“as needed” basis.
- len now works on lists
Internal improvements
- ZSTD compression is available for output and intermediate files use it for
reduced temporary space usage (and possibly faster processing).
- Faster check for column headers in
functional_map
argument to count() function: now it is
performed as soon as possible (including at the top of the script if the
arguments are literal strings), thus NGLess can fail faster.
- ZSTD compression is available for output and intermediate files use it for
reduced temporary space usage (and possibly faster processing).
- Faster check for column headers in
functional_map
argument to count() function: now it is
performed as soon as possible (including at the top of the script if the
arguments are literal strings), thus NGLess can fail faster.
Version 1.0.1
This is a bugfix release and results should not change.
Bugfixes
- Fix bug with external modules and multiple fastQ inputs.
- Fix bug with resaving input files where the original file was sometimes
moved (thus removing it).
- When
bwa
or samtools
calls fail, show the user the stdout/stderr from
these processes (see #121).
Version 1.0
User-visible improvements
- The handling of multiple annotations in count (i.e., when the user
requests multiple
features
and/or subfeatures
) has changed. The
previous model caused a few issues (#63, but also mixing with
collect(). Unfortunately,
this means that scripts asking for the old behaviour in their version
declaration are no longer supported if they use multiple features.
Version 0.11
Released March 15 2019 (0.11.0) and March 21 2019 (0.11.1).
Version 0.11.0 used ZStdandard compression, which was not reliable (the
official haskell zstd wrapper has issues). Thus, it was removed in v0.11.1.
Using v0.11.0 is not recommended.
User-visible improvements
- Module samtools (version 0.1) now includes samtools_view
- Add –verbose flag to check-install mode (ngless –check-install –verbose)
- Add early checks for input files in more situations (#33)
- Support compression in collect() output (#42)
- Add smoothtrim() function
Bugfixes
- Fix bug with orf_find & prots_out argument
- Fix bug in garbage collection where intermediate files were often left on disk for far longer than necessary.
- Fix CIGAR (#92) for select() blocks
Internal improvements
- Switched to diagrams package for plotting. This should make building easier as cairo was often a complicated dependency.
- Update to LTS-13 (GHC 8.6)
- Update minimap2 version to 2.14
- Call bwa/minimap2 with interleaved fastq files. This avoids calling it twice (which would mean that the indices were read twice).
- Avoid leaving open file descriptors after FastQ encoding detection
- Tar extraction uses much less memory now (#77)
Version 0.10.0
Released Nov 12 2018
Bugfixes
- Fixed bug where header was printed even when STDOUT was used
- Fix to lock1’s return value when used with paths (#68 - reopen)
- Fixed bug where writing interleaved FastQ to STDOUT did not work as expected
- Fix saving fastq sets with –subsample (issue #85)
- Fix (hypothetical) case where the two mate files have different FastQ encodings
User-visible improvements
- samtools_sort() now accepts by={name} to sort by read name
- Add __extra_megahit_args to assemble() (issue #86)
- arg1 in external modules is no longer always treated as a path
- Added expand_searchdir to external modules API (issue #56)
- Support _F/_R suffixes for forward/reverse in load_mocat_sample
- Better error messages when version is mis-specified
- Support NO_COLOR standard: when
NO_COLOR
is
present in the environment, print no colours.
- Always check output file writability (issue #91)
paired()
now accepts encoding
argument (it was documented to, but mis-implemented)
Internal improvements
- NGLess now pre-emptively garbage collects files when they are no longer
needed (issue #79)
Version 0.9.1
Released July 17th 2018
Version 0.9
Released July 12th 2018
User-visible improvements
- Added
allbest()
method to MappedRead.
- NGLess will issue a warning before overwriting an existing file.
- Output directory contains PNG files with basic QC stats
- Added modules for gut gene catalogs of mouse, pig, and dog
- Updated the integrated gene catalog
Internal improvements
- All lock files now are continuously “touched” (i.e., their modification time
is updated every 10 minutes). This makes it easier to discover stale lock
files.
- The automated downloading of builtin references now uses versioned URLs, so
that, in the future, we can change them without breaking backwards
compatibility.
Version 0.8.1
Released June 5th 2018
This is a minor release and upgrading is recommended.
Bugfixes
- Fix for systems with non-working locale installations
- Much faster collect calls
- Fixed lock1 when
used with full paths (see issue #68)
- Fix expansion of searchpath with external modules (see issue #56)
Version 0.8
Released May 6th 2018
Incompatible changes
- Added an extra field to the FastQ statistics, with the fraction of basepairs
that are not ATCG. This means that uses of qcstats must use an up-to-date version declaration.
- In certain cases (see below), the output of count when using a GFF will change.
User-visible improvements
Better handling of multiple features in a GFF. For example, using a GFF
containing “gene_name=nameA,nameB” would result in:
nameA,nameB 1
Now the same results in::
nameA 1
nameB 1
This follows after https://git.io/vpagq and the
case of Parent=AF2312,AB2812,abc-3
Support for minimap2 as alternative
mapper. Import the minimap2
module and specify the mapper
when
calling map. For example:
ngless '0.8'
import "minimap2" version "1.0"
input = paired('sample.1.fq', 'sample.2.fq', singles='sample.singles.fq')
mapped = map(input, fafile='ref.fna', mapper='minimap2')
write(mapped, ofile='output.sam')
Added the </>
operator. This can be used to concatenate filepaths. p0
</> p1
is short for p0 + "/" + p1
(except that it avoids double forward
slashes).
Fixed a bug in select where in some edge cases,
the sequence would be incorrectly omitted from the result. Given that this is
a rare case, if a version prior to 0.8 is specified in the version header,
the old behaviour is emulated.
Added bzip2 support to write.
Added reference argument to count.
Bug fixes
- Fix writing multiple compressed Fastq outputs.
- Fix corner case in select. Previously, it was
possible that some sequences were wrongly removed from the output.
Internal improvements
- Faster collect()
- Faster FastQ processing
- Updated to bwa 0.7.17
- External modules now call their init functions with a lock
- Updated library collection to LTS-11.7
Version 0.7.1
Released Mar 17 2018
Improves memory usage in count()
and the use the when-true
flag in
external modules.
Version 0.7
Released Mar 7 2018
New functionality in NGLess language
- Added max_trim argument to
filter
method of
MappedReadSet
.
- Support saving compressed SAM files
- Support for saving interleaved FastQ files
- Compute number Basepairs in FastQ stats
- Add
headers
argument to samfile function
Bug fixes
- Fix
count
’s mode {intersection_strict}
to no longer behave as {union}
- Fix
as_reads()
for single-end reads
- Fix
select()
corner case
In addition, this release also improves both speed and memory usage.
Version 0.6
Released Nov 29 2017
Behavioural changes
- Changed
include_m1
default in count() function
to True
New functionality in NGLess language
- Added orf_find function (implemented through
Prodigal) for open reading frame (ORF) predition
- Add qcstats() function to retrieve the computed
QC stats.
- Added reference alias for a more human readable name
- Updated builtin referenced to include latest releases of assemblies
Bug fixes
- Output preprocessed FQ statistics (had been erroneously removed)
- Fix –strict-threads command-line option spelling
- Version embedded megahit binary
- Fixed inconsistency between reference identifiers and underlying files
Version 0.5.1
Released Nov 2 2017
Fixed some build issues
Version 0.5
Released Nov 1 2017
First release supporting all basic functionality.