The GPRO Suite is a Client Side & Server Side solution in which applications such as RNASeq, VariantSeq, DeNovoSeq and STATools (the Client Side apps) provide tailor-made graphical user interfaces (GUIs) to manage a series of workflows and pipelines installed on a bioinformatics server infrastructure that we call the Server Side.
The GPRO Server Side has the following requirements and dependencies (Table 1):
| Pipeline Step | Tools | RNASeq | DeNovoSeq | VariantSeq | License |
|---|---|---|---|---|---|
| Quality analysis and Preprocessing | FastQC [v0.11.5] (Andrews 2016) | ✓ | ✓ | ✓ | GPLv3 |
| | FastqMidCleaner [1.0.0] | ✓ | ✓ | ✓ | GPLv3 |
| | Cutadapt [1.18] (Martin 2011) | ✓ | ✓ | ✓ | MIT |
| | Prinseq [PRINSEQ-lite 0.20.4] (Schmieder and Edwards 2011) | ✓ | ✓ | ✓ | GPLv3 |
| | Trimmomatic [0.36] (Bolger, et al. 2014) | ✓ | ✓ | ✓ | GPLv3 |
| | FastxToolkit [0.0.13] (Hannon Lab 2016) | ✓ | ✓ | ✓ | AGPLv3 |
| | CANU (Koren, et al. 2017) | No | ✓ | No | GPLv3 |
| | FastqCollapser [1.0.0] | ✓ | No | ✓ | GPLv3 |
| | FastqIntersect [1.0.0] | ✓ | No | ✓ | GPLv3 |
| Mapping on reference genome or transcriptome | TopHat [v2.1.1] (Kim et al. 2013) | ✓ | No | ✓ | Boost Software 1.0 |
| | Hisat2 [2.2.1] (Kim et al. 2015) | ✓ | No | ✓ | GPLv3 |
| | Bowtie2 [2.2.9] (Langmead and Salzberg 2012) | ✓ | No | ✓ | GPLv3 |
| | BWA [0.7.15-r1140] (Li and Durbin 2009) | ✓ | No | ✓ | GPLv3 |
| | STAR [2.7.0f] (Dobin et al. 2013) | No | No | ✓ | MIT |
| Quantification | Corset [1.06] (Davidson and Oshlack 2017) | ✓ | No | No | BSD 2-Clause |
| | Htseq [0.12.4] (Anders 2015) | ✓ | No | No | GPLv3 |
| Post Processing | Bed Tools [v2.29.2] (Quinlan and Hall 2010) | No | No | ✓ | GPLv2 |
| | GATK [v4.1.2.0] (MacKena et al. 2010; Cibulskis et al. 2013; DePristo et al. 2011) | No | No | ✓ | Apache-2.0 |
| | Picard [2.19.0] (Wysoker et al. 2011) | No | No | ✓ | MIT |
| | SAMtools [1.8] (Li et al. 2009) | No | No | ✓ | MIT |
| Transcriptome assembly | Cufflinks [v2.2.1] (Trapnell et al. 2012) | ✓ | No | No | MIT |
| | Oases (Schulz et al. 2012) | No | ✓ | No | GPLv3 |
| | SOAPdenovo-trans (Xie et al. 2014) | No | ✓ | No | GPLv3 |
| Genome assembly | Velvet (Zerbino and Birney 2008) | No | ✓ | No | GPLv3 |
| | SOAPdenovo2 (Luo et al. 2012) | No | ✓ | No | GPLv3 |
| | CANU (Koren et al. 2017) | No | ✓ | No | GPLv2 |
| | SPAdes (Bankevich et al. 2012) | No | ✓ | No | GPLv3 |
| Gap filling and scaffolding | Gap closer (Luo et al. 2012) | No | ✓ | No | GPLv3 |
| | BESST (Sahlin et al. 2014) | No | ✓ | No | GPLv3 |
| | OPERA (Gao et al. 2011) | No | ✓ | No | MIT |
| Differential expression | DESeq [2.1.28] (Love et al. 2014) | ✓ | No | No | GPLv3 |
| | EdgeR [3.30.3] (Robinson et al. 2010) | ✓ | No | No | GPLv2 |
| | Cuffdiff [v2.2.1] (Trapnell et al. 2012) | ✓ | No | No | Boost Software 1.0 |
| | CummeRbund [2.30.0] (Goff et al. 2013) | ✓ | No | No | Artistic-2.0 |
| Enrichment Analysis | GOseq [1.40.0] (Young et al. 2010) | ✓ | No | No | LGPLv3 |
| Gene Prediction | Augustus (Stanke et al. 2008) | No | ✓ | No | Artistic-1.0 |
| Gene Annotation | BLAST (Altschul et al. 1990) | No | ✓ | No | Public Domain |
| | HMMER (Mistry et al. 2013) | No | ✓ | No | EASEL |
| | Variant Effect Predictor [105.0] (McLaren et al. 2016) | No | No | ✓ | Apache-2.0 |
| Training Sets | GATK [v4.1.2.0] (MacKena et al. 2010; Cibulskis et al. 2013; DePristo et al. 2011) | No | No | ✓ | Apache-2.0 |
| Variant Calling | GATK [v4.1.2.0] (MacKena et al. 2010; Cibulskis et al. 2013; DePristo et al. 2011) | No | No | ✓ | Apache-2.0 |
| | VarScan2 [v2.4.3] (Koboldt et al. 2012) | No | No | ✓ | VarScan |
| Variant filtering | GATK [v4.1.2.0] (MacKena et al. 2010; Cibulskis et al. 2013; DePristo et al. 2011) | No | No | ✓ | Apache-2.0 |
The GPRO Server Side can be installed on remote servers or on the user's PC, provided that it has enough disk space and RAM (at least 500 GB of hard disk and 16 GB of RAM). Installing the GPRO Server Side involves complex steps to set up Linux, Apache, MySQL, and PHP (the LAMP stack), as well as manual installation of the third-party CLI software shown in Table 1, the GPRO databases, and the multiple scripts needed to handle requests from client applications to the Server Side. To facilitate installation, we have deployed it in a Docker container (Merkel 2014). An image of this container can be downloaded from: https://hub.docker.com/r/biotechvana/gpro. The current version of the image is limited to one or two users, but we are committed to releasing a forthcoming version for multiple users.
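Before installing, it may help to check the host against the minimums quoted above. The following is a minimal sketch, assuming a Linux host with `df`, `awk` and `/proc/meminfo`; the function names and the `GPRO_HOME_DIR` variable are hypothetical and not part of GPRO.

```shell
#!/bin/sh
# Minimums quoted in the text: 500 GB of disk and 16 GB of RAM.
MIN_DISK_GB=500
MIN_RAM_GB=16

# meets_minimum HAVE NEED: succeeds when HAVE >= NEED (whole numbers).
meets_minimum() {
    [ "$1" -ge "$2" ]
}

# Free space, in whole GB, on the filesystem holding directory $1.
free_disk_gb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1048576) }'
}

# Total RAM in GB, rounded to the nearest integer because the kernel
# reports slightly less than the installed amount. Prints 0 when
# /proc/meminfo is not available (for example, outside Linux).
total_ram_gb() {
    if [ -r /proc/meminfo ]; then
        awk '/^MemTotal:/ { print int($2 / 1048576 + 0.5) }' /proc/meminfo
    else
        echo 0
    fi
}

# Hypothetical variable: directory that will hold the GPRO home.
target_dir="${GPRO_HOME_DIR:-.}"

disk_gb=$(free_disk_gb "$target_dir")
ram_gb=$(total_ram_gb)

if meets_minimum "$disk_gb" "$MIN_DISK_GB"; then
    echo "disk: ok (${disk_gb} GB free)"
else
    echo "disk: only ${disk_gb} GB free, ${MIN_DISK_GB} GB recommended"
fi

if meets_minimum "$ram_gb" "$MIN_RAM_GB"; then
    echo "ram: ok (${ram_gb} GB)"
else
    echo "ram: ${ram_gb} GB detected, ${MIN_RAM_GB} GB recommended"
fi
```

The script only reports; it never aborts, so it can be run safely on any candidate machine before Docker is installed.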
To install the GPRO Server Side docker, please proceed as follows:
First, download the Docker Desktop software from https://www.docker.com/products/docker-desktop and install it.
If you want to install the Server Side docker with default user name and password, please run the following command on the terminal of the Docker Desktop:
$ local_path="/path/to/local_home"
$ docker run -d -p 80:80 -p 20-22:20-22 -p 65500-65515:65500-65515 -v /path/to/local_home:/home/gpro_user biotechvana/gpro
With this command, the Server Side uses g_user as both your default user name and your default password.
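The default command above repeats the host directory inside the `-v` option. As a small sketch, the command line can be assembled from a single variable so the path is written only once. The function name `gpro_run_cmd` is hypothetical, and the sketch assumes the path contains no spaces. It prints the command rather than running it, so it can be inspected first.

```shell
#!/bin/sh
# Host directory to mount as the GPRO home (placeholder path from the text).
local_path="/path/to/local_home"

# gpro_run_cmd HOST_DIR: print the default docker run command with
# HOST_DIR mounted at /home/gpro_user inside the container.
gpro_run_cmd() {
    printf 'docker run -d -p 80:80 -p 20-22:20-22 -p 65500-65515:65500-65515 -v %s:/home/gpro_user biotechvana/gpro\n' "$1"
}

# Dry run: show the command that would be executed.
gpro_run_cmd "$local_path"
```

Once the printed line looks right, it can be pasted into the Docker Desktop terminal.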
Otherwise, if you are interested in running the docker with your own user name and password, please run the following command on the Docker Desktop terminal:
$ local_path="/path/to/local_home"
$ GPRO_USER="myUserName"
$ GPRO_USER_PASS="myUserNamePass"
$ docker run -d -p 80:80 -p 20-22:20-22 -p 65500-65515:65500-65515 -v /path/to/local_home:/home/gpro_user biotechvana/gpro
For example, if you chose DirkGently as user name and HolisticDetective as password, you would need to run the command as follows:
$ local_path="/path/to/local_home"
$ GPRO_USER="DirkGently"
$ GPRO_USER_PASS="HolisticDetective"
$ docker run -d -p 80:80 -p 20-22:20-22 -p 65500-65515:65500-65515 -v /path/to/local_home:/home/gpro_user biotechvana/gpro
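The commands above set GPRO_USER and GPRO_USER_PASS as shell variables but do not show how those values reach the container. One common way to hand values to a container is Docker's `-e` option. The sketch below builds such a command under the assumption, not confirmed by this manual, that the image reads both variables from the container environment. The function name is hypothetical, and the command is printed rather than executed.

```shell
#!/bin/sh
local_path="/path/to/local_home"
GPRO_USER="DirkGently"
GPRO_USER_PASS="HolisticDetective"

# gpro_run_cmd_with_user HOST_DIR USER PASS: print a docker run command
# that forwards the credentials with -e.
# ASSUMPTION: the biotechvana/gpro image reads GPRO_USER and
# GPRO_USER_PASS from the container environment; check the image
# documentation before relying on this.
gpro_run_cmd_with_user() {
    if [ -z "$2" ] || [ -z "$3" ]; then
        echo "user name and password must both be set" >&2
        return 1
    fi
    printf 'docker run -d -p 80:80 -p 20-22:20-22 -p 65500-65515:65500-65515 -e GPRO_USER=%s -e GPRO_USER_PASS=%s -v %s:/home/gpro_user biotechvana/gpro\n' "$2" "$3" "$1"
}

# Dry run: show the command that would be executed.
gpro_run_cmd_with_user "$local_path" "$GPRO_USER" "$GPRO_USER_PASS"
```

The empty-value check is there so that a missing user name or password fails early instead of silently starting the container with unexpected credentials.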
All third-party command-line interface (CLI) software is integrated in the server-side Docker image except VarScan2. This is because the VarScan2 license distinguishes between academic and commercial users. Academic users can use VarScan2 without restrictions for academic purposes, while commercial users need to contact the VarScan2 authors to obtain the corresponding commercial license (for more details see https://github.com/dkoboldt/varscan/releases ). For this reason, the server-side image does not include VarScan2, and users must integrate this tool into the running container themselves. The rest of the CLI software dependencies used by “RNAseq” and “VariantSeq” are already installed in the Docker image and have licenses allowing unrestricted use by every kind of user (provided they are appropriately cited and credited). To integrate VarScan2 in the container, please proceed as follows.
To manage the running container you first need its name. If you did not set a name in the run command, you can get it by running docker ps:
$ docker ps
################## Sample output
# CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
# 5121b762b39f d82c97fcc2c7 "/gpro_init" 2 hours ago Up 2 hours 0.0.0.0:20-22->20-22/tcp, :::20-22->20-22/tcp, 0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:65500-65515->65500-65515/tcp, :::65500-65515->65500-65515/tcp hungry_boyd
Here the container name is hungry_boyd.
To add VarScan2 to the GPRO server, run the following command:
$ docker exec hungry_boyd gpro install varscan
Commercial users should proceed in the same way, but must first contact the VarScan2 authors to obtain the license in order to comply with the VarScan2 terms of use.
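The two steps above (reading the container name from `docker ps`, then running the install command) can be chained. The sketch below is one way to do it; the function names are hypothetical, and the sample row is modelled on the output shown earlier so that the snippet runs without Docker.

```shell
#!/bin/sh
# container_names: read `docker ps` output on stdin and print the NAMES
# column, which is the last field of every row after the header.
container_names() {
    awk 'NR > 1 && NF > 0 { print $NF }'
}

# varscan_install_cmd NAME: print the install command from the text
# for the container called NAME.
varscan_install_cmd() {
    printf 'docker exec %s gpro install varscan\n' "$1"
}

# Sample input modelled on the output shown above (ports shortened).
sample_ps='CONTAINER ID   IMAGE          COMMAND        CREATED       STATUS       PORTS                NAMES
5121b762b39f   d82c97fcc2c7   "/gpro_init"   2 hours ago   Up 2 hours   0.0.0.0:80->80/tcp   hungry_boyd'

name=$(printf '%s\n' "$sample_ps" | container_names)

# Dry run: show the command for the detected container.
varscan_install_cmd "$name"

# With Docker running, use live output instead of the sample:
#   name=$(docker ps | container_names)
```

If several containers are running, `container_names` prints one name per line, so pick the GPRO one by hand. `docker ps --format '{{.Names}}'` is an alternative that prints only the names and avoids parsing columns.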
Once the Server Side image is installed and running in Docker Desktop, you must link your GPRO application of interest to the Server Side in order to run the server-side analyses, pipelines or workflows. As previously said, the Client Side applications of the Suite that depend on the Server Side are RNAseq, VariantSeq, DeNovoSeq (manual in preparation) and STATools (manual in preparation). Please visit their respective manuals for detailed instructions on how to link each application with the Server Side and automatically run Docker Desktop each time you open the application linked to the Server Side.
This work was supported by the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 642095 for the OPATHY consortium; by the pre-doctoral research fellowship from Industrial Doctorates of MINECO (Grant 659 DI-17-09134); by the State Plan for Scientific and Technical Research and Innovation 2017-2020 under the Grant TSI-100903-2019-11 from the Secretary of State for Digital Advancement from Ministry of Economic Affairs and Digital Transformation, Spain; and by Expedient IDI-2021-158274-a from Ministry of Science and Innovation, Spain.