Modules

The key words “MUST”, “MUST NOT”, “SHOULD”, etc. are to be interpreted as described in RFC 2119.

Reuse of existing test data

Pre-existing files from nf-core/test-datasets MUST be reused if at all possible to keep the size of the test data repository minimal.

If appropriate test data does not exist in the modules branch of nf-core/test-datasets, ask in the #modules channel of the nf-core Slack to discuss possible options.

Test data alternatives for large datasets

Adding test data may not be possible for some modules, for example when the input data is too large or the module requires a local database.

In these scenarios, use the Nextflow stub feature to test the module.

Refer to the gtdbtk/classify module and its corresponding test script for an example of how to use this feature for module development.
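
As a rough illustration of the stub feature, the sketch below shows a process with a stub block next to its script block. It is not the gtdbtk/classify module: the process name EXAMPLE_CLASSIFY, the example_tool command, and the input and output names are hypothetical and chosen only for illustration. It is a process definition only and would need to be included in a workflow or test to run.

```nextflow
// Hypothetical module: the process name, inputs, outputs, and tool command
// are illustrative assumptions, not taken from any real nf-core module.
process EXAMPLE_CLASSIFY {
    input:
    tuple val(meta), path(bins)
    path db   // large reference database that cannot be shipped as test data

    output:
    tuple val(meta), path("${meta.id}.summary.tsv"), emit: summary

    // Real command: needs the full database, so it is not run in stub mode.
    script:
    """
    example_tool classify --db ${db} --input ${bins} > ${meta.id}.summary.tsv
    """

    // Stub: only creates an empty file with the expected output name.
    stub:
    """
    touch ${meta.id}.summary.tsv
    """
}
```

When the pipeline or test is launched with the Nextflow -stub-run option (short form -stub), the stub block runs in place of the script block, so the declared outputs and channel structure can be checked without the large inputs.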

Module test data organisation

Files SHOULD be organised according to the existing structure of the test data repository.

For bioinformatics pipelines, files are typically organised by discipline, organism, platform, or format.

Relatedness of module test data

Downstream or related test data files SHOULD be named based on the upstream file name.

For example, if genome.fasta is the upstream file, a file derived from it should be named genome.<new_extension>.
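
The naming rule above can be sketched as a small helper function. The function downstreamName and the dict extension are hypothetical and used only to illustrate the convention; the sketch replaces only the last extension of the upstream name.

```nextflow
// Hypothetical helper (not part of nf-core tooling): derive a downstream
// test data file name by swapping the last extension of the upstream name.
def downstreamName(String upstream, String newExtension) {
    def idx = upstream.lastIndexOf('.')
    // Keep the whole name as the stem when there is no extension to strip.
    def stem = idx > 0 ? upstream.substring(0, idx) : upstream
    return "${stem}.${newExtension}".toString()
}

workflow {
    // 'dict' is an example extension chosen for illustration.
    println(downstreamName('genome.fasta', 'dict'))   // prints genome.dict
}
```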

Module test data documentation

Test data files MUST have an entry in the nf-core/test-datasets repository README.