Demultiplexing your NGS data is far easier than it sounds. Do not be afraid: it takes a single command. All you need is:
- Your original raw data obtained from the sequencer.
- The bcl2fastq software from Illumina
- A correct sample sheet (more about that soon)
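To give an idea of what the sample sheet looks like, here is a minimal sketch that writes one from the shell. The sample names and index sequences are placeholders, not values from a real run, and sheets exported from Illumina tools usually carry more sections (such as [Header] and [Settings]) than the bare [Data] block shown here.

```shell
# Write a minimal, illustrative sample sheet to SampleSheet.csv.
# Sample names and index sequences are hypothetical placeholders:
# replace them with the values of your own experiment.
cat > SampleSheet.csv <<'EOF'
[Data]
Sample_ID,Sample_Name,index,index2
sampleA,sampleA,ATTACTCG,TATAGCCT
sampleB,sampleB,TCCGGAGA,ATAGAGGC
EOF
```

For a single-index run, the index2 column would simply be left out.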
The command
Assuming that you have (1) your data in a directory called bcl, (2) the bcl2fastq standalone software and (3) the sample sheet that describes your experiment (yes, yours!), just type the following command:
bcl2fastq \
-p 4 \
--barcode-mismatches 0 \
--input-dir ./bcl/Data/Intensities/BaseCalls \
--runfolder-dir ./bcl \
--intensities-dir ./bcl/Data/Intensities \
--sample-sheet SampleSheet.csv \
--output-dir results \
--ignore-missing-controls \
--ignore-missing-bcls \
--no-bgzf-compression
NextSeq users should add this option, since you want to merge the 4 lanes into one set of FASTQ files per sample:
--no-lane-splitting
If you want the demultiplexed reads written as their reverse complement as well, you may want to add this option:
--write-fastq-reverse-complement
The explanation
- The -p 4 option indicates the number of processing threads to use.
- --barcode-mismatches is the number of mismatches tolerated in each index. It is set to 0 in general (exact match only), but you may want to set it to 1.
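To make the notion of a barcode mismatch concrete, here is a small sketch that counts the positions at which two index sequences differ. The function name index_mismatches is hypothetical and is not part of bcl2fastq; it only illustrates what the option counts.

```shell
# Count the positions at which two index sequences of equal length differ.
# index_mismatches is a hypothetical helper, written in plain POSIX sh.
index_mismatches() {
    a=$1
    b=$2
    n=0
    if [ "${#a}" -ne "${#b}" ]; then
        echo "indexes differ in length" >&2
        return 1
    fi
    while [ -n "$a" ]; do
        ca=${a%"${a#?}"}    # first character of a
        cb=${b%"${b#?}"}    # first character of b
        [ "$ca" = "$cb" ] || n=$((n + 1))
        a=${a#?}            # drop the first character
        b=${b#?}
    done
    echo "$n"
}
```

For example, index_mismatches ATTACTCG ATTACTCC prints 1: a read carrying the second sequence would be rejected with --barcode-mismatches 0 and assigned to the sample with --barcode-mismatches 1.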
The interpretation
Once the demultiplexing is over, check the size of the FASTQ files in the results directory. There you should find files called Undetermined_S0_L001_R1_001.fastq.gz alongside your sample files. Make sure that the Undetermined files are reasonably small compared with the sample files. If they are large, there is most probably an error in your sample sheet, which needs to be corrected. The most common errors are:
- If one sample is empty, there is most probably a typo in its index.
- If all samples are empty, most probably the index columns are not populated, or index1 and index2 are swapped.
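One way to put a number on this size check is sketched below. It assumes that the FASTQ files sit directly in the output directory (no project subfolders) and uses compressed file size as a rough stand-in for read count. The function name undetermined_percent is hypothetical.

```shell
# Print the percentage of FASTQ bytes that ended up in Undetermined files.
# undetermined_percent is a hypothetical helper; compressed size is only
# a rough proxy for the number of reads.
undetermined_percent() {
    dir=$1
    total=0
    undet=0
    for f in "$dir"/*.fastq.gz; do
        [ -e "$f" ] || continue    # nothing matched the pattern
        size=$(wc -c < "$f")
        total=$((total + size))
        case "$(basename "$f")" in
            Undetermined_*) undet=$((undet + size)) ;;
        esac
    done
    if [ "$total" -eq 0 ]; then
        echo "no FASTQ files found in $dir" >&2
        return 1
    fi
    echo $((100 * undet / total))
}
```

Calling undetermined_percent results prints an integer percentage. What counts as too high depends on the run, so treat the value as a hint to go back to the sample sheet rather than as a hard threshold.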
Institut Pasteur Only
To use the command above on the cluster, just type:
module load bcl2fastq/2.20.0
You will need to run the demultiplexing with several threads, so do not forget to launch bcl2fastq through the srun or sbatch command. Moreover, some demultiplexing runs require a lot of memory. In general 16 GB is enough, but we've seen runs requiring 32 or 64 GB:
sbatch -c 8 --mem 16000 --wrap "sh runme.sh"
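The content of runme.sh is not shown here. A minimal sketch, assuming it simply wraps the bcl2fastq command from "The command" section, could be created as follows. It also assumes that the bcl2fastq module was loaded in the shell that submits the job and that Slurm passes this environment on to the job. The thread count is taken from SLURM_CPUS_PER_TASK, which Slurm sets when -c is given, with 4 as a fallback.

```shell
# Create a hypothetical runme.sh that wraps the bcl2fastq call shown earlier.
# The quoted 'EOF' keeps the variable unexpanded until the script itself runs.
cat > runme.sh <<'EOF'
#!/bin/sh
# Stop at the first error so that a failed run is not mistaken for a finished one.
set -e

# Use as many processing threads as Slurm allocated (-c), or 4 outside Slurm.
bcl2fastq \
    -p "${SLURM_CPUS_PER_TASK:-4}" \
    --barcode-mismatches 0 \
    --input-dir ./bcl/Data/Intensities/BaseCalls \
    --runfolder-dir ./bcl \
    --intensities-dir ./bcl/Data/Intensities \
    --sample-sheet SampleSheet.csv \
    --output-dir results \
    --ignore-missing-controls \
    --ignore-missing-bcls \
    --no-bgzf-compression
EOF
```

Tying -p to the allocation avoids asking bcl2fastq for more threads than the job was granted.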