Inputs are declared with a qualifier and a name. The qualifier can be thought of as the variable's type.
- val: Access the input value by name in the process script.
- path: Handle the input value as a path, staging the file properly in the execution context.
- env: Use the input value to set an environment variable in the process script.
- stdin: Forward the input value to the process stdin special file.
- tuple: Handle a group of input values having any of the above qualifiers.
- each: Execute the process for each element in the input collection.
val
process basicExample {
    input:
    val x

    "echo process job $x"
}

workflow {
    def num = Channel.of(1,2,3)
    basicExample(num)
}
or
process basicExample {
    input:
    val x

    "echo process job $x"
}

workflow {
    Channel.of(1,2,3) | basicExample
}
file
It is similar to path; see path instead (path is the preferred qualifier in DSL2).
path
A path input can receive a (Groovy) list of files or directories, for example all the fasta files (*.fa) in a directory.
process blastThemAll {
    input:
    path query_file // the value passed to the process is staged under this name, similar to a function parameter

    "blastp -query ${query_file} -db nr"
}
workflow {
    def proteins = Channel.fromPath( '/some/path/*.fa' )
    blastThemAll(proteins)
}
You may also stage the input under a specific file name. Here, every fasta file in the directory is presented to blastp as my_query.fa.
process blastThemAll {
    input:
    path query_file, stageAs: 'my_query.fa' // the input file is staged as my_query.fa
    // or
    // path 'my_query.fa'

    "blastp -query my_query.fa -db nr"
}
workflow {
    def proteins = Channel.fromPath( '/some/path/*.fa' )
    blastThemAll(proteins)
}
Sometimes a process expects a specific number of files, and the task should fail if the number of staged files does not match. To enforce this you can use the arity option.
input:
path('one.txt', arity: '1') // exactly one file is expected
path('two_*.txt', arity: '2') // exactly two files are expected
path('several_*.txt', arity: '1..*') // one or more files are expected
You can also control the name of the file in the task directory with the stageAs option.
input:
path('one.txt', stageAs: 'my_file.txt')
// or
path x, stageAs: 'my_file.txt'

"my_command --in my_file.txt"
You can also stage multiple files into a subdirectory and use a variable ($x in this case) to refer to the staged file names.
input:
path x, stageAs: 'my_dir/*.txt'
"my_command --in $x"
env
env is used to set an environment variable in the process script, whereas val passes a value into the script as a Groovy variable.
process envExample {
    input:
    env my_var

    // single quotes: $my_var is expanded by the shell, not by Groovy
    'echo process job $my_var'
}
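A minimal usage sketch, assuming the process above: each value emitted by the channel is exported to the task as the environment variable my_var.
workflow {
    Channel.of('hello', 'world') | envExample
}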
stdin
process printAll {
    input:
    stdin

    """
    cat -
    """
}

workflow {
    Channel.of('hello', 'hola', 'bonjour', 'ciao')
        | map { it + '\n' } // it refers to each element emitted by the channel
        | printAll
}
tuple
tuple is used to pass multiple values together to a process as a single input.
process tupleExample {
    input:
    tuple val(x), path('input.txt')

    """
    echo "Processing $x"
    cat input.txt > copy
    """
}

workflow {
    Channel.of( [1, 'alpha.txt'], [2, 'beta.txt'], [3, 'delta.txt'] ) | tupleExample
}
For each task the executed script is: echo "Processing 1"; cat input.txt > copy (and likewise for 2 and 3), where input.txt is the staged copy of alpha.txt, beta.txt, and delta.txt respectively.
each
Repeating a process for each element in a collection.
process alignSequences {
    input:
    path in1        // sequences
    each param1     // methods
    each path(dirs) // libraries

    """
    tool -in $in1 -mode $param1 -lib $dirs > result
    """
}

workflow {
    in1 = Channel.fromPath('*.txt')
    param1 = ['R', 'S']
    dirs = [ file('dir1'), file('dir2'), file('dir3') ]
    alignSequences(in1, param1, dirs)
}
The output block defines the outputs of a process. Like inputs, outputs have qualifiers.
val
process myProcess {
    input:
    path input_file

    output:
    val x
    val 'log'
    val "${input_file.baseName}.out"

    script:
    x = input_file.name
    """
    echo $x > log
    cat $input_file > ${input_file.baseName}.out
    """
}

workflow {
    ch_dummy = Channel.fromPath('*.csv').first()
    (var, log, basename) = myProcess(ch_dummy)
    var.view { "ch_var: $it" }
    log.view { "ch_str: $it" }
    basename.view { "ch_exp: $it" }
}
// or
workflow {
    ch_dummy = Channel.fromPath('*.csv').first()
    ch_dummy | myProcess
    // unpack the three output channels
    def (var, log, basename) = myProcess.out
    var.view { "ch_var: $it" }
    log.view { "ch_str: $it" }
    basename.view { "ch_exp: $it" }
}
path
process myProcess {
    input:
    path input_file

    output:
    path "${input_file.baseName}.out"

    script:
    """
    echo "Processing ${input_file}" > ${input_file.baseName}.out
    """
}
workflow {
    input_files = channel.fromPath('*.txt')
    results = input_files | myProcess
    // or
    // results = myProcess(input_files)
    results.view { result ->
        println "output file: $result"
    }
    // or
    // results.view { "output file: $it" }
}
Note that a process cannot be invoked inside an operator closure such as map; use the pipe or a direct call as shown above.
arity
Similar to inputs, you can use arity to control the number of files a process is expected to produce.
output:
path('one.txt', arity: '1') // exactly one file is expected
path('two_*.txt', arity: '2') // exactly two files are expected
path('several_*.txt', arity: '1..*') // one or more files are expected
process myProcess {
    input:
    path input_file

    output:
    path "${input_file.baseName}.out"
    path "${input_file.baseName}.log"

    script:
    """
    echo "Processing ${input_file}" > ${input_file.baseName}.out
    echo "Log file for ${input_file}" > ${input_file.baseName}.log
    """
}
workflow {
    input_files = channel.fromPath('*.txt')
    def (out_files, log_files) = myProcess(input_files)
    out_files.view { "output file: $it" }
    log_files.view { "log file: $it" }
}
To name output files based on an input value:
process mapping {
    input:
    val sample
    path seq

    output:
    path "${sample}.sam"

    """
    tool -in $seq > ${sample}.sam
    """
}
stdout
Emits the standard output of the task as an output channel.
process myProcess {
    input:
    path input_file

    output:
    stdout

    script:
    """
    echo "Processing ${input_file}"
    """
}
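A usage sketch, assuming some *.txt files exist in the launch directory; the captured stdout of each task is printed by view:
workflow {
    Channel.fromPath('*.txt') | myProcess | view
}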
eval
eval is used to capture the standard output of a given command, which is executed in the task environment after the main script completes.
process myProcess {
    input:
    path input_file

    output:
    eval('date')

    script:
    """
    echo "Processing ${input_file}" > result.txt
    """
}
tuple
By using tuple in the output block you can emit multiple values together on a single channel. For example, you can link the output file to metadata such as the sample name.
process myProcess {
    input:
    path input_file

    output:
    tuple val(input_file), path("${input_file.baseName}.out")

    script:
    """
    echo "Processing ${input_file}" > ${input_file.baseName}.out
    """
}
workflow {
    input_files = channel.fromPath('*.txt')
    results = input_files | myProcess
    results.view { result ->
        def (input_file, output_file) = result
        println "input file: $input_file"
        println "output file: $output_file"
    }
}
emit
By using emit you can name a process output and access it by that name in the workflow.
process myProcess {
    output:
    path "test.out", emit: output_file

    script:
    """
    echo "this is a test" > test.out
    """
}
workflow {
    myProcess()
    myProcess.out.output_file.view()
}
optional: true | false
Marks an output as optional: the process will not fail if the output file is missing.
process myProcess {
    output:
    path "test.out", optional: true

    script:
    """
    echo "this is a test" > test.out
    """
}
when
The when: block controls whether a process is executed, depending on the given condition.
process myProcess {
    input:
    path input_file

    output:
    path "${input_file.baseName}.out"

    when:
    input_file.name =~ /^Sample_.*\.txt$/ // only run on files named like Sample_*.txt

    script:
    """
    echo "Processing ${input_file}" > ${input_file.baseName}.out
    """
}
Directives are optional settings for a process.
accelerator
To request hardware accelerators (e.g. GPUs) for the task.
process myProcess {
    accelerator 4, type: 'nvidia-h100-mega-80gb'

    input:
    path input_file

    output:
    path "${input_file.baseName}.out", emit: output_file

    script:
    """
    tool --in $input_file --out ${input_file.baseName}.out
    """
}
beforeScript or afterScript
These directives run a custom snippet before or after the main process script.
process myProcess {
    beforeScript 'source /path/to/my/script.sh'
    afterScript 'echo "Process finished"'

    input:
    path input_file

    output:
    path "${input_file.baseName}.out", emit: output_file

    script:
    """
    echo "Processing ${input_file}" > ${input_file.baseName}.out
    """
}
cache
To cache the output of a process.
process myProcess {
    cache true // or false
    // cache 'lenient' // index input files by name and size only
    // cache 'deep'    // index input files by their content

    script:
    """
    command ...
    """
}
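The cache only takes effect when the run is relaunched with the -resume option:
nextflow run main.nf -resume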
executor
To specify the executor.
- slurm
- local
- k8s
- awsbatch
- sge
- pbspro
- lsf
- …
process myProcess {
    executor 'slurm'
    //...
}
You can also create a profile where you specify the cluster and project settings.
profiles {
    uppmax {
        executor {
            name = 'slurm'
        }
        process {
            scratch = '$SNIC_TMP'
            clusterOptions = ' -A snic2022-22-606 --mail-type=all --mail-user=nimarafati@gmail.com '
        }
    }
}
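The profile is then selected on the command line when launching the pipeline:
nextflow run main.nf -profile uppmax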
clusterOptions
To pass custom (native) options to the cluster submit command.
process myProcess {
    executor 'slurm'
    clusterOptions '-A snic2022-22-606 --mail-type=all'
    //...
}
queue
It allows you to set the queue when submitting jobs with a grid-based executor, e.g.:
- short for short jobs.
- long for long jobs.
process myProcess {
    queue 'short'
    executor 'sge'
    //...
}
resourceLabels
Attaches custom name/value labels to the computing resources used by the task (AWS Batch, Google Cloud, Kubernetes, …). How the labels are applied depends on the cloud system/executor.
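A minimal sketch (the label keys and values here are arbitrary examples):
process myProcess {
    resourceLabels project: 'my_project', user: 'my_username'
    //...
}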
scratch
To specify the scratch directory, where task outputs are temporarily stored.
process myProcess {
    scratch '/path/to/scratch'
    //...
}
shell
To specify the shell used to run the process script. It can also be set in the configuration file.
process myProcess {
    shell '/bin/bash', '-euo', 'pipefail'
    //...
}
// or in the configuration file
process.shell = ['/bin/bash', '-euo', 'pipefail']
conda
To specify a conda environment.
process myProcess {
    conda 'tool=0.1.1'
    //...
}
container
To specify a container.
process myProcess {
    container 'ubuntu:20.04'
    //...
}
cpus & memory
To specify the number of CPUs and the amount of memory.
process myProcess {
    cpus 4
    memory '8 GB'
    executor 'slurm'

    script:
    """
    command ... --cpu ${task.cpus}
    """
}
disk
To specify the disk space.
process myProcess {
    disk '100 GB'
    executor 'slurm'
    //...
}
time
To specify the time limit for a process.
process myProcess {
    time '8 h'
    //...
}
resourceLimits
To set upper limits for the resources requested by a process.
process myProcess {
    resourceLimits cpus: 16, memory: 8.GB, time: 8.h, disk: 2.GB
    //...
}
You can also specify the resource limits in the configuration file.
process {
    resourceLimits = [ cpus: 16, memory: 8.GB, time: 8.h ]
}
debug
Self-explanatory: it prints the process stdout to the terminal for debugging.
process myProcess {
    debug true
    //...
}
errorStrategy
Defines how the workflow reacts to an error in a process.
Name | Description
---|---
terminate | Terminates the execution as soon as an error condition is reported. Pending jobs are killed (default).
finish | Initiates an orderly pipeline shutdown when an error condition is raised, waiting for the completion of any submitted jobs.
ignore | Ignores process execution errors.
retry | Re-submits for execution a process returning an error condition.
You can use maxErrors (e.g. maxErrors 5) and maxRetries (e.g. maxRetries 3) together with retry.
process retryIfFail {
    errorStrategy 'retry'
    maxErrors 5

    script:
    <your command string here>
}
ext
To specify custom user settings, available in the script as task.ext.
process myProcess {
    ext version: '1.0', arg: '--param1 value1 --param2 value2'
    //...
}
Or expose similar settings through params in the configuration:
params.star_version = '2.7.9.a'
params.star_args = '--runThreadN 8 '
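In practice ext is often populated from the configuration file; a common pattern in the style used by nf-core (the process name STAR_ALIGN is a hypothetical example):
process {
    withName: 'STAR_ALIGN' {
        ext.args = '--runThreadN 8'
    }
}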
label
To specify a label for a process.
process myProcess {
    label 'my_label'
    //...
}
You can use the same label for multiple processes. Also, you can use multiple labels for a process.
maxForks
Defines how many instances of the process can run in parallel.
process doNotParallelizeIt {
    maxForks 1

    '''
    <your script here>
    '''
}
module
To load environment modules (e.g. on an HPC cluster).
process myProcess {
    module 'BioPerl:blast'
    //...
}
penv
To specify the parallel environment used when submitting parallel tasks to the SGE resource manager.
process myProcess {
    cpus 4
    penv 'openmpi'
    executor 'sge'
    //...
}
publishDir
To specify the directory where output files are published. Other options to set:
- enabled to enable or disable the publishing.
- failOnError to stop the workflow if the publishing fails.
- mode to specify the mode of publishing:
  - copy to copy the files to the publish directory.
  - link to create a hard link to the files in the publish directory.
  - move to move the files to the publish directory (only safe for outputs not consumed by downstream processes).
  - rellink to create a relative symbolic link to the files in the publish directory.
  - symlink to create an absolute symbolic link to the files in the publish directory (default).
- path to specify the path to the publish directory. publishDir path: '/path/results/'
- pattern to specify the pattern of the files to publish. publishDir pattern: '*.txt'
- saveAs to rename the published file; it takes a closure, e.g. publishDir saveAs: { filename -> "my_$filename" }
process myProcess {
    publishDir 'results', mode: 'copy', overwrite: true
    //...
}
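A sketch combining several of the options above (directory and pattern are hypothetical):
process myProcess {
    publishDir path: 'results/logs', mode: 'copy', pattern: '*.log', saveAs: { filename -> "myProcess_$filename" }
    //...
}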
tag
To specify a tag for every task execution of a process, which is shown in the log. The difference with label is that label is used to group processes, whereas tag distinguishes the individual tasks of a process.
process myProcess {
    tag "$sample"

    input:
    val sample

    script:
    """
    echo "Processing $sample"
    """
}

workflow {
    Channel.of('sampleA', 'sampleB', 'sampleC') | myProcess
}
This will show up in the execution log along the lines of (task hashes are illustrative):
[a1/b2c3d4] process > myProcess (sampleA)
[e5/f6a7b8] process > myProcess (sampleB)
[c9/d0e1f2] process > myProcess (sampleC)
You can instruct the process to change its settings dynamically depending on the input. In the following example the queue is set to short if the input file is smaller than 1 MB.
process myProcess {
    queue { input_file.size() < 1.MB ? 'short' : 'long' }

    input:
    path input_file

    script:
    """
    echo "Processing ${input_file} on queue ${task.queue}"
    """
}
Or allocate computing resources depending on the process labels:
process FASTQC {
    tag "$meta.id"
    label 'process_medium'
    //...
}

process TRIMMOMATIC {
    tag "$meta.id"
    label 'process_low'
    //...
}
process {
    withLabel: 'process_medium' {
        cpus = { check_max( 2 * task.attempt, 'cpus' ) }
        memory = { check_max( 12.GB * task.attempt, 'memory' ) }
        time = { check_max( 4.h * task.attempt, 'time' ) }
    }
    withLabel: 'process_low' {
        cpus = { check_max( 1 * task.attempt, 'cpus' ) }
        memory = { check_max( 6.GB * task.attempt, 'memory' ) }
        time = { check_max( 2.h * task.attempt, 'time' ) }
    }
}
You can also set the error strategy dynamically, e.g. retrying when the exit status suggests a transient resource problem (network congestion, out-of-memory kills, …).
process {
    errorStrategy = { task.exitStatus in [143,137,104,134,139] ? 'retry' : 'finish' }
    maxRetries = 1
    maxErrors = '-1'
}
There are two channel types:
- Queue channel: a channel that emits a sequence of values. It can be created by:
  - factory methods: of, fromPath
  - operators: map, flatMap
  - processes: see above
- Value channel: a channel that emits a single value. It can be created by:
  - factory methods: value
  - operators: first, collect, reduce
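A minimal sketch contrasting the two types:
ch_queue = Channel.of(1, 2, 3)     // queue channel: emits 1, 2, 3, then closes
ch_value = Channel.value('GRCh38') // value channel: its single value can be read any number of times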
channel.empty
Creating an empty channel without emitting any value.
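A sketch of a common pattern: start with an empty channel and fill it later with mix (SOME_PROCESS is a hypothetical process name):
ch_versions = Channel.empty()
// later: ch_versions = ch_versions.mix(SOME_PROCESS.out.versions)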
channel.from
Creating a channel from a list of values. Note that from is deprecated; use of or fromList instead.
ch = Channel.from( 'alpha', 'beta', 'delta' )
ch.subscribe { println "Got: $it" }
subscribe is an operator that invokes the given closure for each value emitted by the channel (here printing them).
channel.fromList
Creating a channel from a list of values.
ch = Channel.fromList( ['alpha', 'beta', 'delta'] )
ch.view()
// or
Channel
    .fromList( ['alpha', 'beta', 'delta'] )
    .view { "Got: $it" }
Output:
Got: alpha
Got: beta
Got: delta
channel.fromPath
Emitting file(s) from a path. NOTE: does not check whether the file exists.
myFileChannel = Channel.fromPath( '/data/file.txt' )
pairFiles = Channel.fromPath( 'data/file_{1,2}.fq' )
expl3 = Channel.fromPath( '/path/*', hidden: true )         // also match hidden files
myFileChannel = Channel.fromPath( '/path/*b', type: 'dir' ) // list DIRECTORIES ending with b
myFileChannel = Channel.fromPath( '/path/a*', type: 'any' ) // list files and directories starting with a
Channel.fromPath( ['/some/path/*.txt', '/other/path/*.txt'] ) // you can pass multiple glob patterns
The ** wildcard also descends into subdirectories when matching files.
Parameters:
- maxDepth: maximum number of directory levels to visit (default: no limit)
- followLinks: when true, symbolic links are followed during directory tree traversal, otherwise they are managed as files (default: true)
- relative: when true, returned paths are relative to the top-most common directory (default: false)
- checkIfExists: when true, throws an exception if the specified path does not exist in the file system (default: false)
channel.fromFilePairs
Emitting pair files (read1/read2).
Channel
    .fromFilePairs('/domus/h1/nimar/nextflow_tutorial/fastq/*{1,2}.fq')
    .view()
// [HG00100, [/domus/h1/nimar/nextflow_tutorial/fastq/HG00100_1.fq, /domus/h1/nimar/nextflow_tutorial/fastq/HG00100_2.fq]]
// [HG00097, [/domus/h1/nimar/nextflow_tutorial/fastq/HG00097_1.fq, /domus/h1/nimar/nextflow_tutorial/fastq/HG00097_2.fq]]
// [HG00101, [/domus/h1/nimar/nextflow_tutorial/fastq/HG00101_1.fq, /domus/h1/nimar/nextflow_tutorial/fastq/HG00101_2.fq]]

Channel
    .fromFilePairs('/some/data/*', size: -1) { file -> file.extension }
    .view { ext, files -> "Files with the extension $ext are $files" }
// Files with the extension fq are [/domus/h1/nimar/nextflow_tutorial/fastq/HG00097_1.fq, /domus/h1/nimar/nextflow_tutorial/fastq/HG00097_2.fq, /domus/h1/nimar/nextflow_tutorial/fastq/HG00100_1.fq, /domus/h1/nimar/nextflow_tutorial/fastq/HG00100_2.fq, /domus/h1/nimar/nextflow_tutorial/fastq/HG00101_1.fq, /domus/h1/nimar/nextflow_tutorial/fastq/HG00101_2.fq]
size: -1 means any number of files. The -> notation in Groovy defines a closure ({ }), which is essentially an anonymous function (see below for further information). { file -> file.extension } is a closure that takes each file matched by the pattern and returns its extension; the file parameter represents each file, and file.extension gets its extension. The files are grouped by this value.
channel.fromSRA
Emitting files from SRA.
ids = ['ERR908507', 'ERR908506', 'ERR908505']
Channel
    .fromSRA(ids)
    .view()
Output:
[SRR1448794, /vol1/fastq/SRR144/004/SRR1448794/SRR1448794.fastq.gz]
[SRR1448795, /vol1/fastq/SRR144/005/SRR1448795/SRR1448795.fastq.gz]
[SRR1448796, /vol1/fastq/SRR144/006/SRR1448796/SRR1448796.fastq.gz]
channel.of
Creating a channel from a list of values.
Channel
    .of( 'alpha', 'beta', 'delta' )
    .subscribe { println "Got: $it" }
Output:
Got: alpha
Got: beta
Got: delta
or ranges:
Channel
    .of( 1..22, 'X', 'Y' )
    .view()
Output:
1
2
:
22
X
Y
channel.topic
Topic channels are named channels through which multiple processes can send values to a single consumer.
process proc1 {
    input:
    val x

    output:
    val(x), topic: 'my_topic' // send this output to the topic

    script:
    """
    echo "Processing $x"
    """
}

process proc2 {
    input:
    val x

    output:
    val(x), topic: 'my_topic'

    script:
    """
    echo "Processing $x"
    """
}

workflow {
    proc1(Channel.of(1))
    proc2(Channel.of(2))
    channel.topic('my_topic').view()
}
channel.value
Creating a value channel, i.e. a channel bound to a single value that can be read repeatedly.
workflow_summary = WorkflowSpatialxe.paramsSummaryMultiqc(workflow, summary_params)
ch_workflow_summary = Channel.value(workflow_summary)
channel.watchPath
Watches for files matching the given pattern and emits them when the specified event(s) occur (create, modify, delete).
Channel
    .watchPath( '/path/*.fa', 'create,modify' )
    .subscribe { println "Fasta file: $it" }
The watchPath factory waits endlessly for files that match the specified pattern and event(s), which means that it will cause your pipeline to run forever. Consider using the until operator to close the channel when a certain condition is met (e.g. receiving a file named DONE).
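A sketch of that suggestion, closing the channel once a sentinel file named DONE appears:
Channel
    .watchPath( '/path/*' )
    .until { it.name == 'DONE' }
    .view()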
mix
Emits the values produced by multiple source channels as a single channel.
c1 = Channel.from( 'alpha', 'beta', 'delta' )
c2 = Channel.from( 'one', 'two', 'three' )
c1.mix( c2 ).view()
take
take: declares the inputs of a named (sub)workflow; when take: is used, the workflow body must be placed under main:.
workflow myFlow {
    take:
    myChannel

    main:
    myProcess(myChannel)
}
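The named subworkflow is then invoked with its inputs, for example:
workflow {
    data = Channel.of(1, 2, 3)
    myFlow(data)
}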
->
The notation -> in Groovy is used to define a closure ({ }), which is essentially an anonymous function (see below for further information).
Basic usage:
{param1, param2 ->
    // code here
}

- { } is the body of the closure
- param1, param2 are the parameters of the closure
- -> separates the parameters from the body of the closure.

Example:
def myClosure = { param1, param2 ->
    println "param1: $param1"
    println "param2: $param2"
}
myClosure('hello', 'world')
Output:
param1: hello
param2: world
You can also declare the closure parameter with an explicit type, e.g. { String str -> println "${str}; ${str.size()}" }:
Channel
    .from( 'alpha', 'beta', 'lambda' )
    .subscribe { String str ->
        println "Got: ${str}; len: ${str.size()}"
    }
LinkedHashMap
A LinkedHashMap is a collection that maintains the order of its elements. When you iterate over a LinkedHashMap, the elements are returned in the order they were inserted.
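A minimal sketch: Groovy map literals are LinkedHashMaps by default, so iteration follows insertion order (the keys here are arbitrary):
def meta = [id: 'sample1', single_end: false]
meta.each { k, v -> println "$k = $v" } // prints id first, then single_end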
def create_fastq_channel(LinkedHashMap row) {
    // create meta map
    def meta = [:]
    meta.id = row.sample
    meta.single_end = row.single_end.toBoolean()

    // add path(s) of the fastq file(s) to the meta map
    def fastq_meta = []
    if (!file(row.fastq_1).exists()) {
        exit 1, "ERROR: Please check input samplesheet -> Read 1 FastQ file does not exist!\n${row.fastq_1}"
    }
    if (meta.single_end) {
        fastq_meta = [ meta, [ file(row.fastq_1) ] ]
    } else {
        if (!file(row.fastq_2).exists()) {
            exit 1, "ERROR: Please check input samplesheet -> Read 2 FastQ file does not exist!\n${row.fastq_2}"
        }
        fastq_meta = [ meta, [ file(row.fastq_1), file(row.fastq_2) ] ]
    }
    return fastq_meta
}
In general a Nextflow pipeline is orchestrated by the main.nf file, which is the entry point of the pipeline, together with the nextflow.config file, which contains the configuration of the pipeline.
Suggested order in which to read the files:
1. main.nf: Understand the primary workflow structure.
2. nextflow.config: Learn about configurations and profiles.
3. modules: Explore the individual modules for different tasks.
4. subworkflows: Check for any reusable sub-workflows used in the main workflow.
params.input = 'data/*'
params.outdir = 'results'

workflow.onComplete {
    println "Workflow completed"
}
- params.input: specifies the input data files.
- params.outdir: defines the output directory.
- workflow.onComplete: a block executed when the workflow finishes.

process process_data {
    input:
    path input_file

    output:
    path "${input_file.baseName}.processed"

    script:
    """
    process_tool -i $input_file -o ${input_file.baseName}.processed
    """
}
- process_data: a process definition that takes an input file, processes it, and outputs the result.
- input_file: the input declaration.
- output: defines the output path.
- script: the shell command or script to run.

workflow {
    files = Channel.fromPath(params.input)
    processed_files = files | process_data
    processed_files.view { "Processed file: $it" }
}
- workflow: defines the workflow structure.
- files: a channel created from the input files.
- processed_files: the result of processing the files through process_data.
- processed_files.view: outputs the processed files for logging.

main.nf may refer to functions that are defined in a Groovy class.
In Groovy, a class is a blueprint for creating objects, encapsulating data for the object, and methods to manipulate that data. Groovy classes are similar to Java classes but with some syntactic sugar that makes them easier to work with.
class Person {
    String name
    int age

    void sayHello() {
        println "Hello, my name is $name and I am $age years old."
    }
}

// Creating an instance of the class
def person = new Person(name: 'John', age: 30)
person.sayHello()
Key points:
- Properties: String name and int age are properties of the Person class.
- Methods: sayHello is a method that prints a greeting message.
- Instance creation: new Person(name: 'John', age: 30) creates an instance of the Person class with the specified properties.
- Method call: person.sayHello() calls the sayHello method on the person object.
public
Access Modifier: Specifies that the member (field, method, or class) is accessible from any other class.
public class Person {
    public String name
    public int age

    public void sayHello() {
        println "Hello, my name is $name and I am $age years old."
    }
}
void
Return Type: Indicates that the method does not return any value.
void sayHello() {
    println "Hello"
}
static
Modifier: Means that the member belongs to the class, rather than instances of the class. Static members can be accessed without creating an instance of the class.
static String greet() {
    return "Hello"
}