Demultiplexing your data: The SampleSheet

Context

A sample sheet file describes your sequencing experimental design. Illumina sequencers require it, so you should already have one and know what it contains.

In particular, you should use the IEM (Illumina Experiment Manager) tool to edit your SampleSheet file on the sequencing machine.

If you have not done so, or if you made a mistake, you will most probably need this post! You will then have to perform the demultiplexing yourself.

Demultiplexing your NGS data is easy as long as your sample sheet is correct. As explained in the demultiplexing page, you need the location of your raw data and your sample sheet; the rest is a one-line command.

The demultiplexing may fail for various reasons, and the final error message may be useless or difficult to understand. In our experience, 99% of the errors come from an erroneous sample sheet.

So, how should it be written?

The Syntax

In brief, and this is the most important information for the demultiplexing stage, the sample sheet contains the correspondence between your samples and the indexes that were used. Here is an example:

[Header]
IEMFileVersion,4
Investigator Name,Azim
Experiment Name,090819-PE-MP-B1569
Date,8/9/2019
Workflow,GenerateFASTQ
Application,FASTQ Only
Assay,Nextera XT
Description,
Chemistry,Amplicon

[Reads]
300
300

[Settings]
ReverseComplement,0
Adapter,CTGTCTCTTATACACATCT

[Data]
Sample_ID,Sample_Name,Sample_Plate,Sample_Well,I7_Index_ID,index,Sample_Project,Description
A,,,,N701,TAAGG,B1569,
B,,,,N701,TTTGG,B1569,
C,,,,N701,TGCGA,B1569,
D,,,,N701,GGGGA,B1569,

The first thing to notice is that the file as a whole is not a CSV file, even though it looks like one! So do not edit it with Excel, LibreOffice or other spreadsheet tools, because extra commas may be added and this will lead to errors.

There are several sections. The most important one is [Data]. The others are mostly informative and are ignored in this post.
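To look at one section at a time, the file can be split on its [Section] lines. The following is only a minimal sketch: the function name split_sections is ours, the embedded sheet is a shortened, made-up example, and we assume that a section header may carry trailing commas if a spreadsheet tool touched the file.

```python
def split_sections(text):
    """Group the lines of a sample sheet under their [Section] headers."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines only separate sections
        # Assumption: spreadsheet tools may turn "[Data]" into "[Data],,,,"
        candidate = line.rstrip(",")
        if candidate.startswith("[") and candidate.endswith("]"):
            current = candidate[1:-1]
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


# Shortened, hypothetical sample sheet used only for illustration
example = """[Header]
IEMFileVersion,4

[Reads]
300
300

[Data]
Sample_ID,index
A,TAAGG
"""

sections = split_sections(example)
print(list(sections))    # section names, in file order
print(sections["Data"])  # the lines that matter for demultiplexing
```

The lines returned for the Data section can then be checked on their own, without being distracted by the informative sections.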

So, if you have an error, it is most probably in the [Data] section, which is CSV formatted. There, the columns should be consistent, meaning that every row should have the same number of columns as the header.

This is wrong:

[Data]
Sample_ID,Sample_Name,Sample_Plate,Sample_Well,I7_Index_ID,index,Sample_Project,Description
A,,,N701,TAAGG,B1569,
B,,,,N701,TTTGG,B1569,

Indeed, there is a missing comma in the first sample row (A): it has 7 fields whereas the header has 8.
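This kind of mistake is easy to catch automatically by counting the fields in each row. Here is a small sketch using Python's standard csv module; the function name find_inconsistent_rows is ours, and it expects only the lines of the [Data] section (header first).

```python
import csv


def find_inconsistent_rows(data_lines):
    """Return (row number, fields found, fields expected) for each bad row.

    Row numbers count sample rows only, starting at 1 after the header.
    """
    rows = list(csv.reader(data_lines))
    expected = len(rows[0])
    problems = []
    for number, row in enumerate(rows[1:], start=1):
        if len(row) != expected:
            problems.append((number, len(row), expected))
    return problems


# The wrong [Data] section shown above
data_lines = [
    "Sample_ID,Sample_Name,Sample_Plate,Sample_Well,I7_Index_ID,index,Sample_Project,Description",
    "A,,,N701,TAAGG,B1569,",
    "B,,,,N701,TTTGG,B1569,",
]

for number, found, expected in find_inconsistent_rows(data_lines):
    print(f"sample row {number}: {found} fields, expected {expected}")
# prints: sample row 1: 7 fields, expected 8
```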

If you have double-indexing, just make sure that you have the two extra columns named I5_Index_ID and index2:

[Data]
Sample_ID,Sample_Name,Sample_Plate,Sample_Well,Index_Plate_Well,I7_Index_ID,index,I5_Index_ID,index2,Sample_Project,Description
1341,PDGFRaPos1,,,A01,D701,ATTACTCG,D501,AGGCTATA,B2648,
1342,PDGFRaNeg1,,,D02,D702,TCCGGAGA,D504,TCAGAGCC,B2648,
1546,PDGFRaPos2,,,A04,D704,GAGATTCC,D501,AGGCTATA,B2648,
1547,PDGFRaNeg2,,,B05,D705,ATTCAGAA,D502,GCCTCTAT,B2648,

Some notes:

  • The order of the columns is irrelevant.
  • Informative columns, which can be left blank: Description, Sample_Well, Index_Plate_Well, Sample_Plate, I7_Index_ID, I5_Index_ID
  • These columns are used to name your output files: Sample_ID and Sample_Name. Sample_Name is optional.

Most common errors

Easy to find:

  • Missing comma in a given row
  • Incorrect or missing header
  • Missing Sample_ID
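The header and Sample_ID problems can be spotted with a few lines of Python. This is a sketch under assumptions: the function name check_header_and_ids is ours, and the REQUIRED tuple reflects what this post relies on (Sample_ID to name the output files, index to assign the reads), not an official list of mandatory columns.

```python
import csv

# Assumption: the columns this post treats as essential, not an official list
REQUIRED = ("Sample_ID", "index")


def check_header_and_ids(data_lines):
    """Report missing required columns and rows with an empty Sample_ID."""
    rows = list(csv.reader(data_lines))
    header = [name.strip() for name in rows[0]]
    errors = [f"missing column: {name}" for name in REQUIRED if name not in header]
    if "Sample_ID" in header:
        position = header.index("Sample_ID")
        for number, row in enumerate(rows[1:], start=1):
            # A short row or a blank cell both count as a missing Sample_ID
            if position >= len(row) or not row[position].strip():
                errors.append(f"row {number}: empty Sample_ID")
    return errors


# Hypothetical, shortened [Data] section: the second sample has no Sample_ID
data_lines = [
    "Sample_ID,Sample_Name,index",
    "A,,TAAGG",
    ",,TTTGG",
]
print(check_header_and_ids(data_lines))
# prints: ['row 2: empty Sample_ID']
```

Column names are compared exactly, so a header written Sample_Id instead of Sample_ID is reported as a missing column.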

Difficult to find:

  • Typo in an index. The demultiplexing would end correctly, but one FastQ file will have no reads, or very few reads compared to the others.
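Some index typos can be caught before running the demultiplexing. The sketch below flags indexes with characters other than A, C, G and T, indexes whose length differs from the most common one, and indexes used twice. The function name check_indexes and the choice of checks are ours. A typo that replaces one valid base with another passes all of these checks, so comparing the number of reads per FastQ file afterwards remains necessary.

```python
from collections import Counter


def check_indexes(indexes):
    """Return warnings for suspicious indexes.

    `indexes` maps a Sample_ID to its index sequence.
    """
    warnings = []
    # 1. Characters that are not nucleotides
    for sample, index in indexes.items():
        bad = set(index.upper()) - set("ACGT")
        if bad:
            warnings.append(f"{sample}: unexpected characters {sorted(bad)}")
    # 2. Length different from the most common length
    lengths = Counter(len(index) for index in indexes.values())
    if len(lengths) > 1:
        usual = lengths.most_common(1)[0][0]
        for sample, index in indexes.items():
            if len(index) != usual:
                warnings.append(f"{sample}: length {len(index)}, others use {usual}")
    # 3. Same index given to two samples
    seen = {}
    for sample, index in indexes.items():
        if index in seen:
            warnings.append(f"{sample}: same index as {seen[index]}")
        else:
            seen[index] = sample
    return warnings


# Made-up indexes with three deliberate mistakes
suspicious = {"A": "TAAGG", "B": "TTTG", "C": "TGXGA", "D": "TAAGG"}
for warning in check_indexes(suspicious):
    print(warning)
```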
