| | summary |
|---|---|
| samples | 32 |
| paired | False |
| sequana_fastqc_version | 1.0.0 |
sample | %GC | Filename | Sequence length | Total Sequences | avg_sequence_length | mean_quality | duplicated (%) | link |
---|---|---|---|---|---|---|---|---|
10_S1 | 47.0 | 10_S1_R1_001.fastq.gz | 35-76 | 16139966.0 | 75.50167497254951 | 33.8 | 54.13 | samples/10_S1/10_S1_R1_001_fastqc.html |
11_S2 | 46.0 | 11_S2_R1_001.fastq.gz | 35-76 | 16027320.0 | 75.49514528941832 | 33.87 | 51.6 | samples/11_S2/11_S2_R1_001_fastqc.html |
129_S19 | 48.0 | 129_S19_R1_001.fastq.gz | 35-76 | 16728100.0 | 75.50423724152773 | 33.87 | 52.66 | samples/129_S19/129_S19_R1_001_fastqc.html |
130_S20 | 50.0 | 130_S20_R1_001.fastq.gz | 35-76 | 13608798.0 | 75.51457241117107 | 33.75 | 50.31 | samples/130_S20/130_S20_R1_001_fastqc.html |
131_S22 | 48.0 | 131_S22_R1_001.fastq.gz | 35-76 | 14903425.0 | 75.51144626151371 | 33.8 | 54.02 | samples/131_S22/131_S22_R1_001_fastqc.html |
132_S21 | 49.0 | 132_S21_R1_001.fastq.gz | 35-76 | 14582337.0 | 75.50636341760584 | 33.76 | 50.95 | samples/132_S21/132_S21_R1_001_fastqc.html |
133_S23 | 51.0 | 133_S23_R1_001.fastq.gz | 35-76 | 20414190.0 | 75.48683121887275 | 33.72 | 43.0 | samples/133_S23/133_S23_R1_001_fastqc.html |
135_S24 | 48.0 | 135_S24_R1_001.fastq.gz | 35-76 | 23510641.0 | 75.50153154054796 | 33.76 | 43.13 | samples/135_S24/135_S24_R1_001_fastqc.html |
136_S25 | 47.0 | 136_S25_R1_001.fastq.gz | 35-76 | 23526065.0 | 75.50081643487765 | 33.82 | 47.23 | samples/136_S25/136_S25_R1_001_fastqc.html |
137_S26 | 46.0 | 137_S26_R1_001.fastq.gz | 35-76 | 21659095.0 | 75.49695192712346 | 33.86 | 49.6 | samples/137_S26/137_S26_R1_001_fastqc.html |
138_S27 | 47.0 | 138_S27_R1_001.fastq.gz | 35-76 | 14876880.0 | 75.47652626088266 | 33.82 | 48.87 | samples/138_S27/138_S27_R1_001_fastqc.html |
139_S28 | 46.0 | 139_S28_R1_001.fastq.gz | 35-76 | 19472596.0 | 75.48535315989712 | 33.79 | 51.53 | samples/139_S28/139_S28_R1_001_fastqc.html |
13_S3 | 47.0 | 13_S3_R1_001.fastq.gz | 35-76 | 11359780.0 | 75.5081550875105 | 33.85 | 55.39 | samples/13_S3/13_S3_R1_001_fastqc.html |
140_S29 | 46.0 | 140_S29_R1_001.fastq.gz | 35-76 | 23019581.0 | 75.49418119295916 | 33.81 | 49.06 | samples/140_S29/140_S29_R1_001_fastqc.html |
141_S30 | 46.0 | 141_S30_R1_001.fastq.gz | 35-76 | 19014537.0 | 75.4932312577477 | 33.87 | 51.21 | samples/141_S30/141_S30_R1_001_fastqc.html |
143_S31 | 47.0 | 143_S31_R1_001.fastq.gz | 35-76 | 21995753.0 | 75.49352449993415 | 33.78 | 47.98 | samples/143_S31/143_S31_R1_001_fastqc.html |
144_S32 | 45.0 | 144_S32_R1_001.fastq.gz | 35-76 | 18611965.0 | 75.49569430202561 | 33.88 | 49.45 | samples/144_S32/144_S32_R1_001_fastqc.html |
14_S4 | 46.0 | 14_S4_R1_001.fastq.gz | 35-76 | 15649769.0 | 75.49645742374855 | 33.85 | 48.11 | samples/14_S4/14_S4_R1_001_fastqc.html |
16_S5 | 49.0 | 16_S5_R1_001.fastq.gz | 35-76 | 13370573.0 | 75.51756278508034 | 33.76 | 54.2 | samples/16_S5/16_S5_R1_001_fastqc.html |
17_S6 | 49.0 | 17_S6_R1_001.fastq.gz | 35-76 | 10971135.0 | 75.51418417511042 | 33.83 | 48.26 | samples/17_S6/17_S6_R1_001_fastqc.html |
19_S7 | 51.0 | 19_S7_R1_001.fastq.gz | 35-76 | 12837842.0 | 75.50764131541735 | 33.76 | 49.98 | samples/19_S7/19_S7_R1_001_fastqc.html |
20_S8 | 46.0 | 20_S8_R1_001.fastq.gz | 35-76 | 13118404.0 | 75.50079262690797 | 33.84 | 49.31 | samples/20_S8/20_S8_R1_001_fastqc.html |
22_S9 | 47.0 | 22_S9_R1_001.fastq.gz | 35-76 | 14694501.0 | 75.50362901060744 | 33.87 | 36.8 | samples/22_S9/22_S9_R1_001_fastqc.html |
23_S10 | 47.0 | 23_S10_R1_001.fastq.gz | 35-76 | 13827484.0 | 75.49900683305799 | 33.81 | 48.81 | samples/23_S10/23_S10_R1_001_fastqc.html |
61_S11 | 47.0 | 61_S11_R1_001.fastq.gz | 35-76 | 11102568.0 | 75.5015572973748 | 33.81 | 55.66 | samples/61_S11/61_S11_R1_001_fastqc.html |
62_S12 | 46.0 | 62_S12_R1_001.fastq.gz | 35-76 | 19638776.0 | 75.50136571647846 | 33.89 | 49.76 | samples/62_S12/62_S12_R1_001_fastqc.html |
66_S13 | 49.0 | 66_S13_R1_001.fastq.gz | 35-76 | 14071138.0 | 75.51186378813142 | 33.8 | 47.4 | samples/66_S13/66_S13_R1_001_fastqc.html |
67_S14 | 46.0 | 67_S14_R1_001.fastq.gz | 35-76 | 14325339.0 | 75.50342543377158 | 33.86 | 52.16 | samples/67_S14/67_S14_R1_001_fastqc.html |
71_S15 | 49.0 | 71_S15_R1_001.fastq.gz | 35-76 | 13045234.0 | 75.50218317279706 | 33.75 | 51.63 | samples/71_S15/71_S15_R1_001_fastqc.html |
72_S16 | 46.0 | 72_S16_R1_001.fastq.gz | 35-76 | 19732680.0 | 75.49841790370087 | 33.89 | 45.11 | samples/72_S16/72_S16_R1_001_fastqc.html |
76_S17 | 47.0 | 76_S17_R1_001.fastq.gz | 35-76 | 19023847.0 | 75.49178801742886 | 33.87 | 46.48 | samples/76_S17/76_S17_R1_001_fastqc.html |
77_S18 | 46.0 | 77_S18_R1_001.fastq.gz | 35-76 | 15869731.0 | 75.48976457130874 | 33.77 | 52.0 | samples/77_S18/77_S18_R1_001_fastqc.html |
│ │ ├─ 10_S1_R1_001_fastqc.html
│ │ ├─ 11_S2_R1_001_fastqc.html
│ │ ├─ 129_S19_R1_001_fastqc.html
│ │ ├─ 130_S20_R1_001_fastqc.html
│ │ ├─ 131_S22_R1_001_fastqc.html
│ │ ├─ 132_S21_R1_001_fastqc.html
│ │ ├─ 133_S23_R1_001_fastqc.html
│ │ ├─ 135_S24_R1_001_fastqc.html
│ │ ├─ 136_S25_R1_001_fastqc.html
│ │ ├─ 137_S26_R1_001_fastqc.html
│ │ ├─ 138_S27_R1_001_fastqc.html
│ │ ├─ 139_S28_R1_001_fastqc.html
│ │ ├─ 13_S3_R1_001_fastqc.html
│ │ ├─ 140_S29_R1_001_fastqc.html
│ │ ├─ 141_S30_R1_001_fastqc.html
│ │ ├─ 143_S31_R1_001_fastqc.html
│ │ ├─ 144_S32_R1_001_fastqc.html
│ │ ├─ 14_S4_R1_001_fastqc.html
│ │ ├─ 16_S5_R1_001_fastqc.html
│ │ ├─ 17_S6_R1_001_fastqc.html
│ │ ├─ 19_S7_R1_001_fastqc.html
│ │ ├─ 20_S8_R1_001_fastqc.html
│ │ ├─ 22_S9_R1_001_fastqc.html
│ │ ├─ 23_S10_R1_001_fastqc.html
│ │ ├─ 61_S11_R1_001_fastqc.html
│ │ ├─ 62_S12_R1_001_fastqc.html
│ │ ├─ 66_S13_R1_001_fastqc.html
│ │ ├─ 67_S14_R1_001_fastqc.html
│ │ ├─ 71_S15_R1_001_fastqc.html
│ │ ├─ 72_S16_R1_001_fastqc.html
│ │ ├─ 76_S17_R1_001_fastqc.html
│ ├─ 77_S18_R1_001_fastqc.html
The following network shows the workflow of the pipeline. Blue boxes are clickable and redirect to dedicated reports.
The analysis was performed with the following Snakefile and configuration file:
"""Multi fastqc pipeline
Author: Thomas Cokelaer
Affiliation: Institut Pasteur @ 2019
This pipeline is part of Sequana software (sequana.readthedocs.io)
"""
import sequana
from sequana import snaketools as sm
# This must be defined before the include
configfile: "config.yaml"
# Generic include of some dynamic modules
exec(open(sequana.modules["fastqc_dynamic"], "r").read())
# A convenient manager
manager = sm.PipelineManager("fastqc", config)
manager.setup(globals(), mode="error")
rule pipeline:
    input: "multiqc/multiqc_report.html", ".sequana/rulegraph.svg", "summary.png"
# FASTQC on input data set
__fastqc_samples__input_fastq = manager.getrawdata()
__fastqc_samples__output_done = "samples/{sample}/{sample}.done"
__fastqc_samples__wkdir = "samples/{sample}" # manager.getwkdir("fastqc_samples")
__fastqc_samples__log = "samples/%s/fastqc.log" % manager.sample
include: fastqc_dynamic("samples", manager)
comments = """Number of samples: {}
Paired data: {}
Browse files here:
tree """.format(
len(manager.samples.keys()) , manager.paired)
from sequana_pipelines.fastqc import version as v2
from sequana import version as v1
comments += """
Sequana version: {}""".format(v1)
comments += """
Sequana_fastqc version: {}
""".format(v2)
# Multiqc rule
__multiqc2__input = expand(__fastqc_samples__output_done, sample=manager.samples)
__multiqc2__logs = "multiqc/multiqc.log"
__multiqc2__output = "multiqc/multiqc_report.html"
__multiqc2__indir = config['multiqc']['indir']
__multiqc2__outdir = "multiqc"
__multiqc2__config = "multiqc_config.yaml"
# do not specify fastqc itself alone, otherwise it fails (feb 2020)
__multiqc2__modules = ""
config['multiqc']['options'] = "-m fastqc " + config["multiqc"]["options"].replace("-f", " ") + \
" --comment \"{}\" ".format(comments)
include: sm.modules["multiqc2"]
__rulegraph__input = manager.snakefile
__rulegraph__output = ".sequana/rulegraph.svg"
__rulegraph__mapper = {"multiqc2":"multiqc/multiqc_report.html"}
include: sm.modules['rulegraph']
localrules: rulegraph
rule plotting_and_stats:
    input: expand(__fastqc_samples__output_done, sample=manager.samples)
    output: "summary.png", "summary.json"
    run:
        import glob
        from sequana.fastqc import FastQC
        from sequana.summary import Summary
        from sequana_pipelines.fastqc import version
        summary = Summary("fastqc", caller="sequana_fastqc", sample_name="multi samples")
        summary.description = "summary sequana_fastqc pipeline"
        summary.pipeline_version = version
        filenames = glob.glob("samples/*/*.zip")
        f = FastQC()
        for sample in manager.samples:
            # One zip (single-end) or two zips (paired-end) per sample; only the first is read
            filenames = glob.glob("samples/{}/*zip".format(sample))
            filenames = sorted(filenames)
            assert len(filenames) in [0, 1,2]
            if len(filenames) != 0:
                f.read_sample(filenames[0], sample)
                summary.data[sample] = f.fastqc_data[sample]['basic_statistics']
            else:
                # Placeholder record for a sample without any FastQC output
                summary.data[sample] = {
                    'Filename': 'No fastqc found',
                    'File type': 'Conventional base calls',
                    'Encoding': 'Sanger / Illumina 1.9',
                    'Total Sequences': 0,
                    'Sequences flagged as poor quality': 0.0,
                    'Sequence length': '0', '%GC': 0, 'total_deduplicated_percentage': 0,
                    'mean_quality': 0, 'avg_sequence_length': 0}
        summary.to_json("summary.json")
        f.plot_sequence_quality()
        from pylab import savefig, gcf
        f = gcf()
        f.set_size_inches(10,6)
        savefig(output[0], dpi=200)
# These rules take a couple of seconds, so there is no need for a cluster
localrules: multiqc2, rulegraph
onsuccess:
    #shell("ln -f -s {} index.html".format(__multiqc2__output))
    shell("rm -f ./samples/*/*.done")
    shell("rm -f ./samples/*/*.log")
    shell("chmod -R g+w .")
    # Create the tree.html file with all fastqc reports
    from sequana.utils.tree import HTMLDirectory
    hh = HTMLDirectory(".", pattern="fastqc.html")
    with open("tree.html", "w") as fout:
        fout.write(hh.get_html())
    from sequana import logger
    logger.level = "INFO"
    # This should create the stats plot and the Makefile
    manager.teardown()
    manager.clean_multiqc(__multiqc2__output)
    # Now, the main HTML report
    import pandas as pd
    from sequana.utils.datatables_js import DataTable
    import json
    # Summary table with links towards fastqc
    data = json.load(open("summary.json", "r"))
    df = pd.DataFrame(data['data'])
    df = df.T
    df.drop(['File type', "Encoding", "Sequences flagged as poor quality"],
        axis=1, inplace=True)
    # Truncate (not round) both columns to two decimals
    df['mean_quality'] = [int(float(x)*100)/100 for x in df['mean_quality']]
    df['total_deduplicated_percentage'] = [int(float(x)*100)/100 for x in df['total_deduplicated_percentage']]
    df = df.reset_index()
    df = df.rename({
        "index": "sample",
        "total_deduplicated_percentage": "duplicated (%)"}, axis=1)
    df['link'] = ["samples/{}/{}_R1_001_fastqc.html".format(sample, sample) for sample in df['sample']]
    datatable = DataTable(df, 'fastqc', index=False)
    datatable.datatable.datatable_options = {'paging': 'false',
        'buttons': ['copy', 'csv'],
        'bSort': 'true',
        'dom':"BRSPfrti"
        }
    datatable.datatable.set_links_to_column('link', 'sample')
    js = datatable.create_javascript_function()
    htmltable = datatable.create_datatable()
    # The summary table at the top
    from sequana_pipelines.fastqc import version as vv
    df_general = pd.DataFrame({
        "samples": len(manager.samples),
        "paired": manager.paired,
        "sequana_fastqc_version": vv}, index=["summary"])
    datatable = DataTable(df_general.T, 'general', index=True)
    datatable.datatable.datatable_options = {'paging': 'false',
        'bFilter': 'false',
        'bInfo': 'false',
        'header': 'false',
        'bSort': 'true'}
    js2 = datatable.create_javascript_function()
    htmltable2 = datatable.create_datatable(style="width: 20%; float:left" )
    from sequana.modules_report.summary import SummaryModule2
    data = {
        "name": manager.name,
        "rulegraph": __rulegraph__output,
        "stats": "stats.txt"
        }
    # Here is the main HTML page report
    contents = " General information\n"
    contents += """{}""".format(js2 + htmltable2)
    image = SummaryModule2.png_to_embedded_png("dummy", "summary.png",
        style="width:80%; height:40%")
    contents += 'The following image shows the overall quality of your samples (R1 only).\n{}'.format(image)
    # the main table
    contents += """"""
    contents += "\nHere is a summary for all the samples. The CSV button allows you to export the basic statistics. {}".format(js + htmltable)
    contents += """
Please look at the multiqc report for more details about your run."""
    contents += """ Individual fastqc HTML reports for each sample
"""
    contents += hh.get_html()
    s = SummaryModule2(data, intro=contents)
    shell("rm -rf rulegraph") # embedded in report
    shell("rm -rf summary.png") # embedded in report
onerror:
    print("An error occurred. See message above.")
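The table-building step in the `onsuccess` handler truncates `mean_quality` and `total_deduplicated_percentage` to two decimals and derives each report link from the sample name. The following is a minimal standard-library sketch of those two operations only. The helper names `truncate2` and `report_link` are hypothetical and not part of Sequana; the `_R1_001_fastqc.html` suffix is taken from the link template in the Snakefile above.

```python
def truncate2(value):
    """Truncate (rather than round) a number to two decimals.

    Mirrors the expression int(float(x) * 100) / 100 used in the Snakefile.
    """
    return int(float(value) * 100) / 100


def report_link(sample, suffix="_R1_001_fastqc.html"):
    """Build the relative path to the R1 FastQC report of one sample."""
    return "samples/{0}/{0}{1}".format(sample, suffix)


if __name__ == "__main__":
    print(truncate2("54.138"))   # accepts strings as well as numbers
    print(truncate2(47.999))     # truncation keeps 47.99 instead of rounding up
    print(report_link("10_S1"))
```

Because the link is built from a fixed suffix, it only resolves for input files that follow the `<sample>_R1_001` naming scheme, which is the case for every sample listed in this report.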
# ============================================================================
# Config file for Quality Control
# ==========================================[ Sections for the users ]========
#
# One of input_directory, input_pattern and input_samples must be provided.
# If input_directory is provided, it is used; otherwise input_pattern is used
# if provided; otherwise input_samples is used.
# ============================================================================
input_directory: /pasteur/projets/specific/Biomics/Data/current/NextSeq/200804_NB501291_0260_AHTT55BGXF/fastq/B4122
input_readtag: _R[12]_
input_pattern: '*fastq.gz'
##############################################################################
# FastQC section
#
# :Parameters:
#
# - options: string with any valid FastQC options
fastqc:
    do_group: true
    options: ''
    threads: 4
##############################################################################
#
#
# - options: any multiqc options accepted. Note that if you use --comment,
# it will be appended to the existing --comment added inside sequana.
# By default, -p (create pictures) and -f (for overwriting) are used.
# - indir: The input multiqc (default is local).
multiqc:
    options: -p -f
    indir: .
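The Snakefile rewrites the `multiqc.options` value from this config before calling MultiQC: it prepends `-m fastqc`, blanks out `-f`, and appends the run comment through `--comment`. The sketch below reproduces that string assembly in isolation, assuming nothing beyond the expression shown in the Snakefile. The function name `build_multiqc_options` is hypothetical, and the comment text is an illustrative value.

```python
import shlex


def build_multiqc_options(user_options, comment):
    """Assemble the MultiQC option string the way the Snakefile does.

    Every "-f" substring is replaced by a space, "-m fastqc" is prepended
    and the report comment is appended through --comment.
    """
    cleaned = user_options.replace("-f", " ")
    return "-m fastqc " + cleaned + ' --comment "{}" '.format(comment)


if __name__ == "__main__":
    options = build_multiqc_options("-p -f", "Number of samples: 32")
    # shlex.split shows the tokens a POSIX shell would pass to multiqc
    print(shlex.split(options))
```

One consequence of using a plain substring replacement is that it also alters longer options containing `-f`: `--force`, for example, would become `- orce`. With the default `-p -f` value this does not matter, but it is worth keeping in mind when adding custom options.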
Dependencies downloaded from bioconda requirements
Python dependencies (PyPI)
package | version | link |
---|---|---|
appdirs | 1.4.3 | https://pypi.python.org/pypi/appdirs |
atropos | 1.1.24 | https://pypi.python.org/pypi/atropos |
attrs | 19.3.0 | https://pypi.python.org/pypi/attrs |
backcall | 0.1.0 | https://pypi.python.org/pypi/backcall |
beautifulsoup4 | 4.8.1 | https://pypi.python.org/pypi/beautifulsoup4 |
bioservices | 1.7.7 | https://pypi.python.org/pypi/bioservices |
bx-python | 0.8.8 | https://pypi.python.org/pypi/bx-python |
certifi | 2019.11.28 | https://pypi.python.org/pypi/certifi |
chardet | 3.0.4 | https://pypi.python.org/pypi/chardet |
Click | 7.0 | https://pypi.python.org/pypi/Click |
colorama | 0.4.1 | https://pypi.python.org/pypi/colorama |
coloredlogs | 10.0 | https://pypi.python.org/pypi/coloredlogs |
colorlog | 4.0.2 | https://pypi.python.org/pypi/colorlog |
colormap | 1.0.3 | https://pypi.python.org/pypi/colormap |
colormath | 3.0.0 | https://pypi.python.org/pypi/colormath |
ConfigArgParse | 0.15.1 | https://pypi.python.org/pypi/ConfigArgParse |
cycler | 0.10.0 | https://pypi.python.org/pypi/cycler |
Cython | 0.29.14 | https://pypi.python.org/pypi/Cython |
datrie | 0.8 | https://pypi.python.org/pypi/datrie |
decorator | 4.4.1 | https://pypi.python.org/pypi/decorator |
docopt | 0.6.2 | https://pypi.python.org/pypi/docopt |
docutils | 0.15.2 | https://pypi.python.org/pypi/docutils |
easydev | 0.9.38 | https://pypi.python.org/pypi/easydev |
future | 0.18.2 | https://pypi.python.org/pypi/future |
gevent | 1.4.0 | https://pypi.python.org/pypi/gevent |
gitdb2 | 2.0.6 | https://pypi.python.org/pypi/gitdb2 |
GitPython | 3.0.5 | https://pypi.python.org/pypi/GitPython |
greenlet | 0.4.15 | https://pypi.python.org/pypi/greenlet |
grequests | 0.4.0 | https://pypi.python.org/pypi/grequests |
gseapy | 0.9.18 | https://pypi.python.org/pypi/gseapy |
humanfriendly | 4.18 | https://pypi.python.org/pypi/humanfriendly |
idna | 2.8 | https://pypi.python.org/pypi/idna |
importlib-metadata | 0.23 | https://pypi.python.org/pypi/importlib-metadata |
ipykernel | 5.1.3 | https://pypi.python.org/pypi/ipykernel |
ipython | 7.10.0 | https://pypi.python.org/pypi/ipython |
ipython-genutils | 0.2.0 | https://pypi.python.org/pypi/ipython-genutils |
itolapi | 3.0.3 | https://pypi.python.org/pypi/itolapi |
jedi | 0.15.1 | https://pypi.python.org/pypi/jedi |
Jinja2 | 2.10.3 | https://pypi.python.org/pypi/Jinja2 |
joblib | 0.14.1 | https://pypi.python.org/pypi/joblib |
jsonschema | 3.2.0 | https://pypi.python.org/pypi/jsonschema |
jupyter-client | 5.3.3 | https://pypi.python.org/pypi/jupyter-client |
jupyter-core | 4.6.1 | https://pypi.python.org/pypi/jupyter-core |
kiwisolver | 1.1.0 | https://pypi.python.org/pypi/kiwisolver |
lxml | 4.4.2 | https://pypi.python.org/pypi/lxml |
lzstring | 1.0.4 | https://pypi.python.org/pypi/lzstring |
Markdown | 3.1.1 | https://pypi.python.org/pypi/Markdown |
MarkupSafe | 1.1.1 | https://pypi.python.org/pypi/MarkupSafe |
matplotlib | 2.2.2 | https://pypi.python.org/pypi/matplotlib |
matplotlib-venn | 0.11.5 | https://pypi.python.org/pypi/matplotlib-venn |
mock | 3.0.5 | https://pypi.python.org/pypi/mock |
more-itertools | 7.2.0 | https://pypi.python.org/pypi/more-itertools |
multiqc | 1.8.dev0 | https://pypi.python.org/pypi/multiqc |
networkx | 2.4 | https://pypi.python.org/pypi/networkx |
numpy | 1.17.3 | https://pypi.python.org/pypi/numpy |
packaging | 19.2 | https://pypi.python.org/pypi/packaging |
pandas | 0.25.3 | https://pypi.python.org/pypi/pandas |
parso | 0.5.1 | https://pypi.python.org/pypi/parso |
patsy | 0.5.1 | https://pypi.python.org/pypi/patsy |
pexpect | 4.7.0 | https://pypi.python.org/pypi/pexpect |
pickleshare | 0.7.5 | https://pypi.python.org/pypi/pickleshare |
prompt-toolkit | 3.0.0 | https://pypi.python.org/pypi/prompt-toolkit |
psutil | 5.6.7 | https://pypi.python.org/pypi/psutil |
ptyprocess | 0.6.0 | https://pypi.python.org/pypi/ptyprocess |
Pygments | 2.5.1 | https://pypi.python.org/pypi/Pygments |
pykwalify | 1.6.0 | https://pypi.python.org/pypi/pykwalify |
PyOpenGL | 3.1.5 | https://pypi.python.org/pypi/PyOpenGL |
pyparsing | 2.4.5 | https://pypi.python.org/pypi/pyparsing |
pyrsistent | 0.15.6 | https://pypi.python.org/pypi/pyrsistent |
pysam | 0.15.3 | https://pypi.python.org/pypi/pysam |
python-dateutil | 2.8.1 | https://pypi.python.org/pypi/python-dateutil |
pytz | 2019.3 | https://pypi.python.org/pypi/pytz |
PyVCF | 0.6.8 | https://pypi.python.org/pypi/PyVCF |
PyYAML | 5.1.2 | https://pypi.python.org/pypi/PyYAML |
pyzmq | 18.1.1 | https://pypi.python.org/pypi/pyzmq |
qtconsole | 4.6.0 | https://pypi.python.org/pypi/qtconsole |
ratelimiter | 1.2.0.post0 | https://pypi.python.org/pypi/ratelimiter |
requests | 2.22.0 | https://pypi.python.org/pypi/requests |
requests-cache | 0.5.0 | https://pypi.python.org/pypi/requests-cache |
ruamel.yaml | 0.16.5 | https://pypi.python.org/pypi/ruamel.yaml |
ruamel.yaml.clib | 0.2.0 | https://pypi.python.org/pypi/ruamel.yaml.clib |
scikit-learn | 0.23.1 | https://pypi.python.org/pypi/scikit-learn |
scipy | 1.3.2 | https://pypi.python.org/pypi/scipy |
sequana | 0.9.0 | https://pypi.python.org/pypi/sequana |
setuptools | 42.0.1.post20191125 | https://pypi.python.org/pypi/setuptools |
simplejson | 3.17.0 | https://pypi.python.org/pypi/simplejson |
six | 1.13.0 | https://pypi.python.org/pypi/six |
smmap2 | 2.0.5 | https://pypi.python.org/pypi/smmap2 |
snakemake | 5.8.1 | https://pypi.python.org/pypi/snakemake |
soupsieve | 1.9.4 | https://pypi.python.org/pypi/soupsieve |
spectra | 0.0.11 | https://pypi.python.org/pypi/spectra |
statsmodels | 0.11.1 | https://pypi.python.org/pypi/statsmodels |
suds-jurko | 0.6 | https://pypi.python.org/pypi/suds-jurko |
threadpoolctl | 2.1.0 | https://pypi.python.org/pypi/threadpoolctl |
tornado | 6.0.3 | https://pypi.python.org/pypi/tornado |
traitlets | 4.3.3 | https://pypi.python.org/pypi/traitlets |
urllib3 | 1.25.7 | https://pypi.python.org/pypi/urllib3 |
wcwidth | 0.1.7 | https://pypi.python.org/pypi/wcwidth |
wrapt | 1.11.2 | https://pypi.python.org/pypi/wrapt |
xlrd | 1.2.0 | https://pypi.python.org/pypi/xlrd |
XML2Dict | 0.2.2 | https://pypi.python.org/pypi/XML2Dict |
xmltodict | 0.12.0 | https://pypi.python.org/pypi/xmltodict |
zipp | 0.6.0 | https://pypi.python.org/pypi/zipp |