Blast ToolBox

Description

The NCBI TOOLBOX is used extensively within NCBI for internal pipelines and tools such as GenBank, Entrez, BLAST, Sequin, OMIM and RefSeq. These tools are also designed to work in many environments outside NCBI. The BLAST programs are part of the NCBI TOOLBOX; a complete list of the executables included in the TOOLBOX is given at the end of this page.

Site            Version (Latest: Aug 2009)    Notes
ngs.rl.ac.uk    March 2007

License

Non-commercial software, freely available to the public.

When publishing work that uses NGS resources, users of this application should credit the code authors, the NGS and, where applicable, the specific site where the code is installed.

Running BLAST tools from the NCBI TOOLBOX on the NGS

The BLAST toolbox consists of many executables, all of which are run on the NGS in the same way:

/usr/ngs/BLAST-NCBI-TOOLBOX <program> <arguments>

The examples below use the DNA input file aadnafra.fsa, which can be found at:

gsiftp://ngs.rl.ac.uk:2811/apps/blast/examples/aadnafra.fsa

 

  • Submission Using the UI/WMS

    To submit a BLAST job via the WMS, log in to the UI and create a JDL file like the one below (abc.jdl in the submission command), replacing ngsxxxx with your NGS ID on the UI. Then submit it using:

    glite-wms-job-submit -o jobIDs -a abc.jdl

    Type = "Job";
    JobType = "normal";
    Executable = "/usr/ngs/BLAST-TOOLBOX-NCBI";
    Arguments = "blastall -p blastn -d est_hum -i aadnafra.fsa -a 1";
    StdOutput = "blast_toolbox_ncbi.out";
    StdError = "blast_toolbox_ncbi.err";
    Myproxyserver= "myproxy.ngs.ac.uk";
    InputSandbox = {"aadnafra.fsa"};
    InputSandboxBaseURI = "gsiftp://ngsui03.ngs.ac.uk:2811/home/ngsxxxx/blasttoolbox";
    OutputSandbox = {"blast_toolbox_ncbi.out","blast_toolbox_ncbi.err"};
    OutputSandboxBaseDestURI = "gsiftp://ngsui03.ngs.ac.uk:2811/home/ngsxxxx/blasttoolbox";
    Requirements = (
         member("NGS-UEE-BLAST-TOOLBOX-NCBI", other.GlueHostApplicationSoftwareRunTimeEnvironment)
    );
    ShallowRetryCount = -1;
    

    The library searched in this example is 'est_hum', the human EST division of the EBI_NUCLEOTIDE database in BLAST format, located in the '${DB_PATH}/EBI_NUCLEOTIDE_DB/blast_DB' directory.

  • NGS Web Portal Submission

    After logging in to the NGS portal, select the "Blast toolbox" template under the "Bioinformatics" category and amend it as needed for your job. The template's description page gives further details.

  • Submission Using Globus

    The string 'xxx' in the subsequent commands is to be replaced with your id on the NGS site where you want to run the program.

    You can also create a directory within your home directory (named 'blast' in the example that follows), which will be the working directory.

    To submit a BLAST job on ngs.rl.ac.uk, for example, type the following command on a single line:

    globusrun -b -r ngs.rl.ac.uk/jobmanager-lsf '&(executable=/usr/ngs/BLAST_TOOLBOX_NCBI)(arguments= blastall -p blastn -d est_hum -i aadnafra.fsa -a 1)(jobType=single)(stdout=blast_toolbox_ncbi.out)(stderr=blast_toolbox_ncbi.err)'

    where in this example,

    • The working directory ('blast' in the example above) must be created beforehand.
    • Stdout and stderr are named 'blast_toolbox_ncbi.out' and 'blast_toolbox_ncbi.err' respectively.
    • The program to run must be given as the FIRST argument. The input file must be given as an argument, not on stdin.
    • This runs the default version. To run the 25/03/2007 version specifically, use BLAST-NCBI-TOOLBOX_25032007 instead of BLAST-NCBI-TOOLBOX.

     

    The library searched in this example is 'est_hum', the human EST division of the EBI_NUCLEOTIDE database in BLAST format, located in the '${DB_PATH}/EBI_NUCLEOTIDE_DB/blast_DB' directory.
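The WMS and Globus routes above describe the same BLAST run in two different job languages. The Python sketch below shows one way to generate both from a single argument string: it fills the WMS JDL with your NGS ID (escaping embedded double quotes, as JDL strings require) and builds the globusrun command with the RSL kept on one line and without backslashes. The helper names build_jdl and build_globusrun are hypothetical. The executable paths are copied verbatim from the two examples on this page, which spell the wrapper name differently, so check which one exists on your target site. The optional directory argument assumes the standard GRAM RSL directory attribute.

```python
import shlex
from string import Template

# JDL copied from the WMS example; ${ngs_id}, ${arguments} and ${input_file}
# are placeholders added for this sketch.
JDL_TEMPLATE = Template("""Type = "Job";
JobType = "normal";
Executable = "/usr/ngs/BLAST-TOOLBOX-NCBI";
Arguments = "${arguments}";
StdOutput = "blast_toolbox_ncbi.out";
StdError = "blast_toolbox_ncbi.err";
Myproxyserver= "myproxy.ngs.ac.uk";
InputSandbox = {"${input_file}"};
InputSandboxBaseURI = "gsiftp://ngsui03.ngs.ac.uk:2811/home/${ngs_id}/blasttoolbox";
OutputSandbox = {"blast_toolbox_ncbi.out","blast_toolbox_ncbi.err"};
OutputSandboxBaseDestURI = "gsiftp://ngsui03.ngs.ac.uk:2811/home/${ngs_id}/blasttoolbox";
Requirements = (
     member("NGS-UEE-BLAST-TOOLBOX-NCBI", other.GlueHostApplicationSoftwareRunTimeEnvironment)
);
ShallowRetryCount = -1;
""")


def jdl_escape(text):
    # Inside a JDL string, backslashes and double quotes need a backslash.
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_jdl(ngs_id, arguments, input_file="aadnafra.fsa"):
    # Substituted values are inserted literally, so "$est" is left alone.
    return JDL_TEMPLATE.substitute(
        ngs_id=ngs_id, arguments=jdl_escape(arguments), input_file=input_file)


def build_globusrun(arguments, directory=None,
                    contact="ngs.rl.ac.uk/jobmanager-lsf"):
    # RSL from the Globus example, joined onto one line.
    rsl = "&(executable=/usr/ngs/BLAST_TOOLBOX_NCBI)"
    rsl += f"(arguments= {arguments})(jobType=single)"
    if directory is not None:
        # Assumption: GRAM's directory attribute sets the job's working directory.
        rsl += f"(directory={directory})"
    rsl += "(stdout=blast_toolbox_ncbi.out)(stderr=blast_toolbox_ncbi.err)"
    # An argument list keeps the whole RSL as one argument to globusrun.
    return ["globusrun", "-b", "-r", contact, rsl]


if __name__ == "__main__":
    args = "blastall -p blastn -d est_hum -i aadnafra.fsa -a 1"
    print(build_jdl("ngsxxxx", args))
    print(shlex.join(build_globusrun(args)))
```

Returning the globusrun command as a list means it could be handed to subprocess.run without any shell quoting; shlex.join is used here only to print it in the same single-quoted form shown in the Globus example.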

     

Searching Other Databases

Both the EBI Nucleotide and EBI Protein databases are installed, and updated regularly, on all NGS sites that support the NGS mirrored bioinformatics databases, as documented here. Instructions for searching different parts of these databases with standard BLAST commands are given below; on ngs.rl.ac.uk, shortcuts are also available.

EBI NUCLEOTIDE DATABASE

The BLAST environment variable $BLASTDB should already be set to ${DB_PATH}/EBI_NUCLEOTIDE_DB/blast_DB.

The BLAST-format EBI Nucleotide database files are named by dataclass and taxonomic division, e.g. 'est_hum'. Daily updates are also available, e.g. 'est_hum_upd' (see here for the other '_upd' update files).

On ngs.rl.ac.uk a search can be made across all taxonomic divisions of a dataclass by giving the database name as "$est". This shortcut searches all the 'est' databases, provided the double quotes and the $ symbol are included.

For example, to search the whole of est, the arguments part of the job submission commands above can be written as:

WMS: Arguments = "blastall -p blastn -d \"$est\" -i aadnafra.fsa -a 1";
Globus: (arguments= blastall -p blastn -d "$est" -i aadnafra.fsa -a 1)

To search a subset of est (est_hum and est_mus) and the est_hum updates:

WMS: Arguments = "blastall -p blastn -d \"est_hum est_hum_upd est_mus\" -i aadnafra.fsa -a 1";
Globus: (arguments= blastall -p blastn -d "est_hum est_hum_upd est_mus" -i aadnafra.fsa -a 1) 

To search across all the quarterly release est files and all the updates:

WMS: Arguments = "blastall -p blastn -d \"$est $est_upd\" -i aadnafra.fsa -a 1";
Globus: (arguments= blastall -p blastn -d "$est $est_upd" -i aadnafra.fsa -a 1) 
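The naming convention and quoting rules above can be captured in a few lines. This Python sketch (all function names are hypothetical) builds database names from a dataclass and a taxonomic division, joins several names into the single double-quoted -d value, and renders it either as a JDL Arguments line, where each double quote is escaped with a backslash, or as a Globus (arguments=...) clause, where the quotes are kept as they are. It only builds strings; the "$est"-style shortcuts still work only on ngs.rl.ac.uk.

```python
def db_name(dataclass, division, update=False):
    # Convention from this page: <dataclass>_<division>, e.g. est_hum,
    # with "_upd" appended for the daily update files, e.g. est_hum_upd.
    name = f"{dataclass}_{division}"
    return name + "_upd" if update else name


def blastall_args(databases, query="aadnafra.fsa", program="blastn", processors=1):
    # Several databases go to -d as one space-separated, double-quoted value.
    db_list = " ".join(databases)
    return f'blastall -p {program} -d "{db_list}" -i {query} -a {processors}'


def wms_arguments(args):
    # In a JDL string each embedded double quote is written as \".
    escaped = args.replace('"', '\\"')
    return f'Arguments = "{escaped}";'


def globus_arguments(args):
    # In the RSL clause the double quotes are kept unescaped.
    return f"(arguments= {args})"


if __name__ == "__main__":
    dbs = [db_name("est", "hum"), db_name("est", "hum", update=True),
           db_name("est", "mus")]
    args = blastall_args(dbs)
    print(wms_arguments(args))
    print(globus_arguments(args))
```

Run as a script, it prints the WMS and Globus forms of the est_hum / est_hum_upd / est_mus example given above. Passing ["$est", "$est_upd"] to blastall_args reproduces the quarterly-plus-updates example instead.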

For information on the available EMBL Nucleotide Sequence dataclass files in BLAST format see here.

EBI PROTEIN DATABASE

The variable $BLASTDB should be set to ${DB_PATH}/EBI_PROTEIN_DB/blast_DB (on ngs.rl.ac.uk, the alias 'EBI_P' can be used instead).

More information about this database can be found here.

YOUR OWN DATABASE
The variable BLASTDB should be set to the full path of the directory containing your BLAST-format database files.
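When you search your own database, BLASTDB has to be set in the job's environment. A minimal Python sketch, assuming your site honours the gLite JDL Environment attribute and the GRAM RSL environment attribute (neither appears in the examples on this page), produces the line to add to the JDL file and the clause to add inside the RSL string. The function name blastdb_settings and the path /home/ngsxxxx/mydb are hypothetical.

```python
import os


def blastdb_settings(db_dir):
    """Return a (JDL line, RSL clause) pair that sets BLASTDB for a job.

    Assumes the gLite JDL Environment attribute and the GRAM RSL
    environment attribute; db_dir is the directory holding your
    BLAST-format database files.
    """
    if not os.path.isabs(db_dir):
        # BLASTDB should be a full path, so reject relative paths.
        raise ValueError(f"BLASTDB must be a full path, got {db_dir!r}")
    jdl_line = f'Environment = {{"BLASTDB={db_dir}"}};'
    rsl_clause = f"(environment=(BLASTDB {db_dir}))"
    return jdl_line, rsl_clause


if __name__ == "__main__":
    # Hypothetical location of your own database on the execution site.
    for line in blastdb_settings("/home/ngsxxxx/mydb"):
        print(line)
```

The JDL line would sit alongside the other attributes of the WMS example, and the RSL clause would be appended inside the single-quoted RSL of the globusrun command.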

List of the available NCBI TOOLBOX programs:

 
asn2ff        blastcl3          errhdr       gil2bin          seqtest
asn2gb        blastclust        fa2htgs      idfetch          tbl2asn
asn2idx       blastpgp          fastacmd     impala           testcore
asn2xml       cdscan            findspl      indexpub         testobj
asndhuff      checksub          formatdb     makemat          test_regexp
asntool       copymat           formatrpsdb  makeset          testval
bl2bag.cgi    debruijn          gene2xml     megablast        vecscreen
bl2seq        demo_regexp       getfeat      ncbisort         wblast2_cs.REAL
blast         demo_regexp_grep  getmesh      nph-viewgif.cgi  wblast2.REAL
blastall      dosimple          getpub       rpsblast
blastall_old  entrcmd           getseq       seedtop
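Because the wrapper expects the program name as the first argument, a quick check before submission can catch a misplaced or misspelled program name. The Python sketch below (check_arguments is a hypothetical helper) splits an unescaped argument string the way a POSIX shell would and checks its first word against the program list above.

```python
import shlex

# Program names copied from the NCBI TOOLBOX list above.
TOOLBOX_PROGRAMS = frozenset("""
asn2ff blastcl3 errhdr gil2bin seqtest
asn2gb blastclust fa2htgs idfetch tbl2asn
asn2idx blastpgp fastacmd impala testcore
asn2xml cdscan findspl indexpub testobj
asndhuff checksub formatdb makemat test_regexp
asntool copymat formatrpsdb makeset testval
bl2bag.cgi debruijn gene2xml megablast vecscreen
bl2seq demo_regexp getfeat ncbisort wblast2_cs.REAL
blast demo_regexp_grep getmesh nph-viewgif.cgi wblast2.REAL
blastall dosimple getpub rpsblast
blastall_old entrcmd getseq seedtop
""".split())


def check_arguments(arguments):
    """Split an argument string and check that it starts with a TOOLBOX program."""
    tokens = shlex.split(arguments)
    if not tokens or tokens[0] not in TOOLBOX_PROGRAMS:
        first = tokens[0] if tokens else "(nothing)"
        raise ValueError(f"first argument must be a TOOLBOX program, got {first!r}")
    # Return the program and the remaining arguments separately.
    return tokens[0], tokens[1:]


if __name__ == "__main__":
    program, rest = check_arguments("blastall -p blastn -d est_hum -i aadnafra.fsa -a 1")
    print(program, rest)
```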

Further Information

Blast Guide
Mailing List. NCBI mailing list for BLAST news.

 

Applications Support

The NGS cannot offer scientific support for applications. However if you require further information or believe there is something wrong with the installation, please contact the NGS support centre.

Acknowledgements

Please note: When publishing work based on use of the NGS, users should acknowledge both the authors of any programs used (see the individual program web sites, or contact the authors directly) and the NGS directly using the following line:
"The authors would like to acknowledge the use of the UK National Grid Service in carrying out this work"
This line must also accompany any use of the NGS logos.