   Biotool2Web:

   Creating Simple Web Interfaces for Bioinformatics Applications

Mohammad Shahid, Intikhab Alam and Georg Fuellen

Introduction

Research software is often placed on an FTP site for download and installation. Distributed this way, it may fail to reach a wide audience, because the user needs installation expertise, a specific operating system, and so on. An easily accessible web interface lets users run the software without installing it. For the authors of the software, a web interface allows for easy upgrades and bug fixes, as well as the opportunity to monitor usage of the system. On the other hand, web interfaces are often slow and cumbersome, and they may pose a security risk.

Developing a web interface for a software application typically involves HyperText Markup Language (HTML) pages as a front end and a Common Gateway Interface (CGI) script as a back end that communicates with the actual application. Setting up the HTML and the corresponding CGI script takes time and requires expertise. We developed a software package, Biotool2Web, that takes an eXtensible Markup Language (XML) document containing the parameters needed to specify the web interface and generates from it the front-end HTML file and the corresponding CGI script for a functional web interface. In comparison to PISE (1), which also generates web interfaces for molecular biology programs, Biotool2Web is lightweight and easy to use. Knowing the functionality provided by Biotool2Web, the programmer of a “home-made” application can adapt the command-line interface of his or her program to Biotool2Web at development time. In particular, the application should take files as input and produce output via STDOUT (standard output).

We chose XML, a standard for structuring documents (2), as the basis for Biotool2Web because of its various advantages. XML makes it possible to define the content of a document independently of its formatting, making it easy to use that content in applications or presentation environments (3). Most importantly, XML provides a basic syntax that can be used to share information between different kinds of computers, different applications, and different organizations without the need to pass it through many layers of conversion (4). XML's set of tools allows developers to create Web pages and much more (5). In the following sections we explain the components and use of Biotool2Web. We also explain the development of a web interface for Pattern Hit Initiated BLAST (PHI-BLAST) (6) as an example.

Biotool2Web is a Perl script that first extracts the HTML and CGI specific parameters from the XML document, using XML modules. It then creates the HTML and CGI files for the web server. Internally, Biotool2Web contains "templates" (more precisely, the script generates HTML through sequential print statements) for HTML and CGI that are modified according to the parameters present in the XML document.
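The generation step described above can be sketched as follows. This is a minimal illustration in Python, not the Perl code of Biotool2Web: only the tag names <parameter>, <name> and <arg_to_pass> come from the text, while <application>, <title>, <control>, the function names and the CGI path are assumptions made for the example. Like the script described above, it builds the HTML page line by line from the values found in the XML document.

```python
import xml.dom.minidom
from html import escape

# Hypothetical parameter document. <application>, <title> and <control> are
# invented for this sketch; only <parameter>, <name> and <arg_to_pass> are
# tag names mentioned in the text.
SPEC = """<application>
  <title>Demo tool</title>
  <parameter><name>sequence</name><control>textarea</control><arg_to_pass>-i</arg_to_pass></parameter>
  <parameter><name>pattern</name><control>text</control><arg_to_pass>-p</arg_to_pass></parameter>
</application>"""


def text_of(parent, tag):
    """Return the stripped text of the first <tag> element below parent, or ''."""
    nodes = parent.getElementsByTagName(tag)
    if not nodes or nodes[0].firstChild is None:
        return ""
    return nodes[0].firstChild.data.strip()


def build_form(spec_xml, action="/cgi-bin/demo.cgi"):
    """Emit an HTML form line by line, one input control per <parameter>."""
    doc = xml.dom.minidom.parseString(spec_xml)
    lines = [
        "<html><body>",
        "<h1>%s</h1>" % escape(text_of(doc, "title")),
        '<form method="post" action="%s">' % escape(action, quote=True),
    ]
    for param in doc.getElementsByTagName("parameter"):
        name = escape(text_of(param, "name"), quote=True)
        if text_of(param, "control") == "textarea":
            lines.append('<textarea name="%s"></textarea>' % name)
        else:
            lines.append('<input type="text" name="%s">' % name)
    lines += ['<input type="submit" value="Run">', "</form>", "</body></html>"]
    return "\n".join(lines)


html_page = build_form(SPEC)
print(html_page)
```

A matching CGI script would be generated from the same parameter list, so that the form field names and the names the back end expects cannot drift apart.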

An XML document that contains the developer-defined parameters

Setting up the HTML and CGI for a software application (e.g. PHI-BLAST, see below) requires some basic information. What is the file path to the software application? What type of input is required to run the tool on the web? For example, a protein/DNA sequence and a pattern are required to run PHI-BLAST. How will this input be taken: from a text box or from a file upload? Biotool2Web lets the web interface developer store this type of HTML- and CGI-specific information, each item under a relevant tag, in an XML document (as shown in myapp.xml). For example, the input for a parameter will be passed on to the application with the option “-p” if that option is specified in the XML tag “<arg_to_pass>”. A separate XML document stores regular expressions for validating user inputs. Validation is done by matching user input against the regular expressions provided by the developer, so security depends on the quality of these expressions. Before going to the next step, the developer should make sure that all files needed to execute the application, as described in the XML document, are installed on the web server.
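As a rough illustration of how validation and <arg_to_pass> fit together, the Python sketch below checks each submitted value against a regular expression and only then assembles the command line. The dictionaries, the two patterns and the application path are hypothetical stand-ins for what would be read from the XML documents; the generated CGI script may work differently.

```python
import re

# Hypothetical stand-ins for values that would come from the XML documents.
PATTERNS = {
    "protein": re.compile(r"[ACDEFGHIKLMNPQRSTVWY]+"),
    "prosite": re.compile(r"[A-Zx\[\]{}(),\-<>0-9]+"),
}
PARAMETERS = [
    # An empty arg_to_pass means the value is passed without an option flag.
    {"name": "sequence", "check": "protein", "arg_to_pass": ""},
    {"name": "pattern", "check": "prosite", "arg_to_pass": "-p"},
]


def build_command(app_path, form_values):
    """Validate every submitted value, then return the command as an argument list."""
    command = [app_path]
    for param in PARAMETERS:
        value = form_values.get(param["name"], "")
        regex = PATTERNS[param["check"]]
        # fullmatch: the whole value must match, so trailing text such as
        # "; rm -rf /" causes rejection instead of being silently ignored.
        if regex.fullmatch(value) is None:
            raise ValueError("invalid value for %s" % param["name"])
        if param["arg_to_pass"]:
            command.append(param["arg_to_pass"])
        command.append(value)
    return command


command = build_command("/opt/tools/myapp", {"sequence": "MKVLA", "pattern": "C-x(2)-H"})
print(command)
```

Matching the whole input rather than a substring, and passing the result as an argument list rather than through a shell, are the two choices that make the regular expressions effective as a security filter.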

Parsers: Almost all programming languages support parsing and transforming XML documents. We use the Perl XML parsers. The choice of parser mostly depends on how the XML is to be processed and on the purpose of the data. The Perl XML parser modules used in Biotool2Web are the following:

XML::Parser: This module provides ways to parse XML documents. It is built on top of XML::Parser::Expat, which is a lower-level interface to James Clark's expat library. Each call to one of the parsing methods creates a new instance of XML::Parser::Expat, which is then used to parse the document.

XML::DOM: This module extends the XML::Parser module by Clark Cooper. XML::DOM::Parser is derived from XML::Parser. It parses XML strings or files and builds a data structure that conforms to the API of the Document Object Model. Once an XML::DOM::Parser object has been created, its parse and parsefile methods create an XML::DOM::Document object from the specified input. This Document object can then be examined, modified, and written back out to a file or converted to a string.
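The parse, examine, modify and write-back cycle can be illustrated with any DOM implementation. The sketch below uses Python's xml.dom.minidom as a stand-in; it is not XML::DOM, whose method names differ, and the one-parameter document is a made-up example.

```python
import xml.dom.minidom

# Parse: build a DOM Document from an XML string.
doc = xml.dom.minidom.parseString(
    "<parameter><name>pattern</name><arg_to_pass>-p</arg_to_pass></parameter>"
)

# Examine: read the text inside <arg_to_pass>.
arg_node = doc.getElementsByTagName("arg_to_pass")[0]
original_flag = arg_node.firstChild.data

# Modify: change the text node in place.
arg_node.firstChild.data = "-k"

# Write back: serialize the root element to a string.
serialized = doc.documentElement.toxml()
print(original_flag, "->", serialized)
```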

Analysis results are viewed via an HTML file that the application writes and to which the user is directed once the application completes. Timeouts must be handled by the application, which should write a temporary HTML page and update it as soon as the job is complete.
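One way an application could implement such a temporary page is sketched below in Python. The use of an HTML meta refresh tag, the function names and the file layout are assumptions for illustration; the text does not prescribe a particular mechanism.

```python
import os
import tempfile
from html import escape


def write_page(path, body, refresh_seconds=None):
    """Write an HTML page atomically; optionally ask the browser to reload it."""
    refresh = ""
    if refresh_seconds is not None:
        refresh = '<meta http-equiv="refresh" content="%d">' % refresh_seconds
    html = "<html><head>%s</head><body>%s</body></html>" % (refresh, body)
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as handle:
        handle.write(html)
    os.replace(tmp_path, path)  # readers never see a half-written page


def run_job(result_path, job):
    """Publish a placeholder page, run the job, then overwrite with the results."""
    write_page(result_path, "<p>Job is running, please wait.</p>", refresh_seconds=10)
    output = job()
    write_page(result_path, "<pre>%s</pre>" % escape(output))


# Demonstration with a stand-in job that records what a visitor would see mid-run.
result_path = os.path.join(tempfile.mkdtemp(), "result.html")
seen_during_run = []


def demo_job():
    with open(result_path, encoding="utf-8") as handle:
        seen_during_run.append(handle.read())
    return "alignment finished"


run_job(result_path, demo_job)
with open(result_path, encoding="utf-8") as handle:
    final_page = handle.read()
print(final_page)
```

Because the user is sent to the result URL immediately, the web server connection does not have to stay open while a long analysis runs.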

Web interface for PHI-BLAST, An Example

PHI-BLAST (Pattern Hit Initiated BLAST) is part of the Basic Local Alignment Search Tool (BLAST) package (11). It performs homology searching in a sequence database such as SCOP (12), given a Prosite-like pattern (13) and a protein sequence that contains such a pattern. We designed a Perl wrapper for PHI-BLAST that takes a user-defined Prosite-like pattern and a set of sequences, from which it constructs a consensus sequence that may contain this pattern. If no pattern is provided, the Perl script automatically generates one from the given set of sequences using PRATT (14). It then executes PHI-BLAST and reports the results.

To develop the web interface for the above Perl script that runs PHI-BLAST, we set up the input-specific information in an XML document (as shown in myapp.xml). In this XML document we place all the required information, called parameters, in three sections (<parameter> tags). These sections contain the details of the input controls to be put on the web form. In the first section we specify that we want a text area and a browse button to get sequences from the user. In the next section we define a combo box offering various options. We also specify a text box that accepts a PROSITE-format pattern. The input taken from the user through the HTML web form can be passed to the application with or without an argument (defined in the tag <arg_to_pass>).

The regular expressions, stored in a library of regular expressions in a separate XML document (see regexps.xml), check the input data before they are passed to the tool, so that malicious data cannot be uploaded. More libraries of regular expressions can be added and then referenced in the inputs XML document, so users can extend the set of regular expressions available. Installation-related information such as directory paths is set in a separate configuration file (config.xml). After setting up the parameters in the XML document, we call the Biotool2Web script. Biotool2Web creates the integrated HTML and CGI files, ready to be published on a server.
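The idea of an extensible regular expression library can be sketched as follows in Python. The text only states that the library lives in a libregexp block; the <regexp> element, its name attribute and the example patterns are hypothetical, and the real regexps.xml may be organized differently.

```python
import re
import xml.dom.minidom

# Hypothetical library layout: only the enclosing "libregexp" block is
# mentioned in the text; <regexp> and its name attribute are assumptions.
BASE_LIBRARY = """<libregexp>
  <regexp name="protein">[ACDEFGHIKLMNPQRSTVWY]+</regexp>
  <regexp name="integer">[0-9]+</regexp>
</libregexp>"""

EXTRA_LIBRARY = """<libregexp>
  <regexp name="dna">[ACGTN]+</regexp>
</libregexp>"""


def load_library(xml_text):
    """Compile every named pattern in a library document into a dictionary."""
    doc = xml.dom.minidom.parseString(xml_text)
    library = {}
    for node in doc.getElementsByTagName("regexp"):
        library[node.getAttribute("name")] = re.compile(node.firstChild.data.strip())
    return library


# Start from one library and extend it with a second one.
library = load_library(BASE_LIBRARY)
library.update(load_library(EXTRA_LIBRARY))
print(sorted(library))
```

A parameter in the inputs document would then refer to a pattern by name, so a well-tested expression can be reused across many interfaces.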

Web interface for RunClustalW, Another Example

A second example is an interface to ClustalW. In this interface, the user input is taken through an edit box or a file upload and passed on to the runclustalw.pl tool through STDIN.
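Passing form input to a tool through STDIN can be sketched as below. This is a Python illustration under stated assumptions: a small child process that counts FASTA records stands in for runclustalw.pl, and the function name and timeout value are invented for the example.

```python
import subprocess
import sys

# Stand-in for runclustalw.pl: a child process that reads STDIN and prints
# how many FASTA records it received.
CHILD = [
    sys.executable,
    "-c",
    "import sys; print(sum(1 for line in sys.stdin if line.startswith('>')))",
]


def run_with_stdin(command, user_input, timeout=60):
    """Feed user_input to the command's STDIN and return what it writes to STDOUT."""
    completed = subprocess.run(
        command,
        input=user_input,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise if the tool exits with a non-zero status
    )
    return completed.stdout


fasta = ">seq1\nMKVLA\n>seq2\nMKILA\n"
result = run_with_stdin(CHILD, fasta)
print(result)
```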

Conclusion

We hope that our tool is useful for the developers of small, “home-made” bioinformatics applications, and that it will be developed further in the spirit of open source, without compromising its lightweight nature. Batching analysis runs (e.g., submitting multiple sequences to be searched in a database), integration with a compute cluster (job submission, status check, rendering of results) and support for pipelines (i.e., some way to specify pre- and/or post-processing of analysis results) would thus need to be added in such a way that they are completely optional and do not interfere with the straightforward standard usage. Moreover, we hope to establish, at the Biotool2Web website, an extensive library of regular expressions that can be used to validate many different kinds of input data relevant for bioinformatics, thereby improving web server security.
 

How to use this tool

This document describes how to use Biotool2Web to generate web interfaces and CGI scripts. Before you execute the tool, make sure the Perl XML processing modules are installed on your system. The modules needed are XML::Parser and XML::DOM.
The tool will not run if these modules are not installed.

SETTING UP THE INPUTS:

All the necessary inputs are contained in the XML documents that Biotool2Web reads. Before executing the tool, the user must provide the information required to create the web interface. The inputs are described below.

  • CONFIG.XML: This XML document stores the paths of the desired installation directories.
  • MYAPP.XML: This XML document stores the input specific information used by Biotool2Web script.
  • REGEXP.XML: This XML document has the libregexp block, which contains patterns of regular expressions.

    Biotool2Web lets the web interface developer store the HTML- and CGI-specific information, each item under a relevant tag, in these XML documents. A separate XML document stores regular expressions for validating user inputs. Before going to the next step, the developer should make sure that all files needed to execute the application, as described in the XML document, are installed on the web server.

    The user can add as many input sections (parameters) as needed, but the structure of the XML document must not be modified. A tag that is not needed should be left empty, except for the <name> tag, which is mandatory.
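A quick check of the rule above (optional tags may be empty, <name> may not) could look like the following Python sketch. The enclosing <inputs> element and the function name are hypothetical; Biotool2Web itself is not stated to perform this check.

```python
import xml.dom.minidom

# Hypothetical inputs document; <inputs> is an invented root element.
SAMPLE = """<inputs>
  <parameter><name>sequence</name><arg_to_pass></arg_to_pass></parameter>
  <parameter><name></name><arg_to_pass>-p</arg_to_pass></parameter>
</inputs>"""


def parameters_missing_name(xml_text):
    """Return 1-based positions of <parameter> sections whose <name> is absent or empty."""
    doc = xml.dom.minidom.parseString(xml_text)
    missing = []
    for position, param in enumerate(doc.getElementsByTagName("parameter"), start=1):
        names = param.getElementsByTagName("name")
        text = ""
        if names and names[0].firstChild is not None:
            text = names[0].firstChild.data.strip()
        if not text:
            missing.append(position)
    return missing


problems = parameters_missing_name(SAMPLE)
print(problems)
```

The first section passes even though its <arg_to_pass> is empty, while the second is reported because its <name> is empty.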

    USAGE:

    After setting up the inputs, run Biotool2Web from the command line:  perl bt2w.pl myapp.xml


    Example Web Interface for AUTO-PHI-BLAST Tool



    Download Biotool2Web


    Internship Student, International NRW Graduate School in Bioinformatics and Genome Research, Center of Biotechnology (CeBiTec), University of Bielefeld, 33615 Bielefeld, Germany.

    mshahid@cebitec.uni-bielefeld.de

    ** International NRW Graduate School in Bioinformatics and Genome Research,

    Center of Biotechnology (CeBiTec), University of Bielefeld, 33615 Bielefeld, Germany.

    intikhab@cebitec.uni-bielefeld.de

    *** Medizinische Fakultät, c/o Arbeitsgruppe Bioinformatik,
    Schlossplatz 4 (1st upper floor), D-48149 Muenster, Phone/Fax:+49-251-83-21637/21631

    fuellen@uni-muenster.de,  Corresponding Author.

    References

    1. Letondal C. A Web interface generator for molecular biology programs in Unix, Bioinformatics, 17(1), 2001, 73-82.
    2. Harold ER, Means WS. XML in a Nutshell, Sebastopol: O’Reilly, 2004.
    3. Schafer, S. A. Towards a Generalized XML-Based System for Flexible Formatting of Text and Graphics. XML 2002 Proceedings by deepX.
    4. Bray, T., Paoli J. and Sperberg-McQueen, C. M. Extensible Markup Language (XML) 1.0. W3C Recommendation 6 October 2000. http://www.w3.org/TR/REC-xml.
    5. Khare, R. and Rifkin, A. (1997). XML: A door to automated web applications. IEEE Internet Computing. http://www.ai.univie.ac.at/~paolo/lva/vu-htmm/pdf/w4078.pdf.
    6. Zhang, Z., Schaffer, A. A., Miller, W., Madden, T. L., Lipman, D. J., Koonin, E. V. and Altschul, S. F. (1998) Protein sequence similarity searches using patterns as seeds. Nucleic Acids Res. 26, 3986-90.
    7. Rodriguez, M. Processing XML with Perl. April 05, 2000. http://www.xml.com/pub/a/2000/04/05/feature/#xmlparser.
    8. Cooper, C. XML-Parser. A Perl module for parsing XML Documents. http://search.cpan.org/~tjmather/XML-DOM-1.43/lib/XML/DOM.pm  and http://cpan.uwinnipeg.ca/htdocs/XML-DOM/XML/DOM.html
    9. Mather, T.J. XML-DOM-1.43. A Perl module for building DOM Level 1 compliant document structures. http://search.cpan.org/~tjmather/XML-DOM-1.43/lib/XML/DOM.pm.  
    10. Apparao, V., Byrne, S., Champion, M., Isaacs, S., Jacobs, I., Hors, A.L., Nicol, G., Robie, J., Sutor, R., Wilson, C. and Wood, L. Document Object Model (DOM) Level 1 Specification. W3C Recommendation, October 1998. http://www.w3.org/TR/REC-DOM-Level-1.
    11. Altschul, S., Gish, W., Miller, W., Myers, E. W., and Lipman, D. (1990). "A Basic Local Alignment Search Tool". JMB, 215, 403-410.
    12. Murzin, A., Brenner, S.E., Hubbard, T., Chothia, C. (1995) J. Mol. Biol. 247, 536-540.
    13. Falquet, L., Pagni, M., Bucher, P., Hulo, N., Sigrist, C.J., Hofmann, K. and Bairoch, A. (2002) Nucleic Acids Res. 30, 235-238.
    14. Jonassen, I. (1997) CABIOS 13, 509-522.