
 
Structural and functional analysis of human zinc finger gene clusters

Human Genome Computing

Within the human genome sequencing effort it has recently been shown convincingly that analytical and design tasks in modern molecular biology cannot be accomplished without appropriate hardware and software resources and special expertise in biomedical computing (e.g. the generation of ESTs [Adams et al.]).

More generally, the development of expert systems for human genome computing requires a wide range of methods and intelligent tools, such as:


Access to public databases available via INTERNET

Work on the detection of structure-function relationships will be based on, and should exploit, the rapidly growing contents of the major protein and nucleotide databases (EMBL, SWISS-PROT, TRANSFAC, TIGR, PROSITE etc.) relevant to the research topics addressed in Projects I and II. Fortunately, the IGD project [Ritter et al.] provides a common view of the most prominent databases. It is therefore the appropriate tool for extracting selected facts on the topic at hand. Moreover, IGD is based on a common (meta-)database structure developed for ACEDB (A C. elegans Database). One advantage of ACEDB is that public data and a site's own experimental data can be combined in a common database. The database structure of ACEDB, containing classes of concepts and generic attributes, seems sufficiently general to be used by human genome projects.


Visual presentation

Visualisation of sequences, maps, binding sites and 3-D conformations is a powerful method for eliciting knowledge within the research process. A large amount of hardware and software supports this task (INSIGHT, RASMOL, MOLSCRIPT etc.). In particular, the visualisation of DNA binding could be very helpful in our project.


Basic methods of human genome computing

Basic methods of human genome computing support database searches, sequence comparisons, the analysis of physical mapping data, the assembly of DNA sequences, the detection of functional DNA target sites and the prediction of gene functions. Research on molecular biological topics needs software to search for similar protein or nucleotide sequences. A large amount of public domain and commercially available software exists that could be used in our project [Bishop].
Computer programs that detect regions of specific biological function, e.g. coding or non-coding regions, promoter or enhancer regions, usually rely heavily on statistical dependencies [Kondrakhin et al.], [Bucher]. The most advanced computer-based methods for modelling protein-DNA binding aim to determine a pattern that reliably predicts the binding activity of different binding sites. For instance, [Stormo] uses a matrix of patterns derived from statistics instead of a consensus sequence.
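
The difference between a consensus sequence and a statistically derived matrix can be illustrated with a small sketch. The code below is not taken from the cited work; it is a minimal log-odds position weight matrix, assuming aligned sites of equal length, a uniform background base composition and a simple pseudocount. The function names and the toy sites are hypothetical.

```python
import math

ALPHABET = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds weight matrix from aligned sites of equal length.

    Assumptions (not from the source): uniform background frequency and
    an additive pseudocount to avoid log(0) for unseen bases.
    """
    length = len(sites[0])
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        total = len(column) + pseudocount * len(ALPHABET)
        scores = {}
        for base in ALPHABET:
            freq = (column.count(base) + pseudocount) / total
            scores[base] = math.log2(freq / background)
        pwm.append(scores)
    return pwm


def score_site(pwm, candidate):
    """Sum the per-position log-odds scores of one candidate site."""
    return sum(column[base] for column, base in zip(pwm, candidate))


def best_window(pwm, sequence):
    """Slide the matrix along a sequence; return (best score, start index)."""
    width = len(pwm)
    windows = [
        (score_site(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(windows)


# Toy example with made-up sites (illustrative only).
toy_sites = ["TATAAT", "TATGAT", "TACAAT", "TATACT"]
toy_pwm = build_pwm(toy_sites)
score, start = best_window(toy_pwm, "GGGTATAATCCC")
```

Unlike a consensus string, the matrix gives a graded score, so sites that deviate at one or two positions can still be ranked rather than simply rejected.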


Intelligent systems and expert systems

Expert systems are based on a computer technology that is used in many domains of industry and services, where it plays a key role in enhancing production and service processes. Expert systems are characterized by the accumulation and codification of knowledge to provide high-level expertise for end-users [Waterman]. In biotechnology, expert systems are not yet well established, although some research systems have been designed and implemented. For instance, ARIADNE [Lathrop et al.] and the system in [Brugge, Buchanan] deduce protein conformations from primary sequences. Our group developed an expert system for the prediction of protein membrane binding [Müller et al.]. From the work conducted on expert systems in biotechnology so far, it seems likely that this technology could play a similar key role in human genome computing as in other domains.

Besides these symbolic methods, sub-symbolic intelligent systems have been developed. Recently, artificial neural networks have been applied in human genome computing. In GRAIL, several statistical methods ([Fickett] etc.) are combined by a neural network to detect coding regions. Despite the impressive success of this architecture, some severe shortcomings remain. A remarkable amount of adaptation is necessary before data can be used by a neural network. Moreover, neural networks in general provide no incremental learning: all examples have to be at hand from the start, otherwise performance is poor. Finally, neural networks provide methods for classification tasks; they do not perform well on design tasks, which are one of our aims.

A more promising approach seems to be inductive machine learning in molecular biology. These methods try to automate the building of biological knowledge (e.g. consensus sequences, coding regions) from positive and negative examples of a biological compound (e.g. transcription factors). They use biological background knowledge to guide the knowledge generation process. Most of the work using these methods over the last ten years has addressed protein folding and molecular design tasks [Schulze-Kremer, King], [Bolis et al.], [King], [Hayes-Roth et al.], [Friedland, Kedes]. All of them are research systems.

Most inductive machine learning methods, however, require a clear statement of whether an example, e.g. a zinc finger protein, interacts with specific nucleic acid sequences or, vice versa, does not specifically recognize particular sequences (see Project II). This example reflects the experimental observation that DNA binding proteins bind to nucleic acids in general: DNA binding sites that display high affinities for transcription factors qualify as potent target sites, whereas DNA binding sites with low affinities obviously do not. But how should DNA binding sites with medium affinities be classified? This is exactly the problem to be solved.

Moreover, most inductive learning methods are not incremental; rather, they need all training examples at the beginning of the training process that generates the knowledge base. The usual research process, however, is characterized by a cycle of detection, analysis and knowledge formation in which biological facts accumulate incrementally.


Application of case-based reasoning in human genome computing within the European Union CASTING effort

Case-Based Reasoning (CBR) is a problem solving technique that makes it possible to develop expert systems more cost-effectively and in much less development time than the methods currently used by European industry, services and research. The technology has proved itself in the United States and at many major universities in recent years, but is relatively new to European companies developing expert systems. Expert systems are one of the success stories of Artificial Intelligence research. CBR technology moves the frontiers of this research further forward by enabling developers to create accurate decision support systems and to automate problem solving processes based on the analysis of previous cases and examples. A system for a protein engineering problem showed the applicability of CBR methods in biocomputing [Napoli, Lieber]. Other examples of CBR methods in biomedical expert systems are our previous systems supporting immunological and genetic problems (e.g. [Gierl et al.], [Gierl], [Gierl, Stengel-Rutkowski], [Schmidt et al.], [Swoboda]).

Molecular biology is distinguished from other knowledge domains by the professional documentation of results (e.g. sequences) during research. Numerous data collections have accumulated, but the biological experience contained in these databases is rarely used in knowledge-based systems. A suitable technique, case-based reasoning, which is a methodology for both reasoning and learning, has now reached a state of maturity. The rapidly growing interest of the artificial intelligence community in case-based reasoning provides an increasing set of methods. Case-based reasoning means using old experiences to understand and solve new problems: a reasoner remembers a previous situation similar to the current one and uses it to solve the new problem [Kolodner].

A case in the context of the work proposed here is a set of essential features that characterizes one or several specific solutions for a transcription factor binding site and therefore forms the boundaries of this class of solutions. These features can be expressed as categorical, ordinal or numerical attributes. CBR, in turn, depends on comparing cases in terms of the similarity of their features. In this context the features connected with a case are only putative entities that can be used to determine the similarity between two or more cases. "This similarity can be derived from sharing of many different features or properties - not all of which need be necessary for category membership. We then have a picture of the (biological) world being divided conceptually into clusters of similar items, each cluster having a well-defined centre, while the border between one cluster and the next may be relatively poorly defined." [Sutcliffe, p. 68] The vague border between prototypes is formed by a set of single cases connected to a particular prototype. Prototypes and cases thus form a hierarchy of well-defined centres and vague borders. Prototypes are constructed when several similar cases reach a defined frequency. The most popular approach to the similarity problem, i.e. comparing the known cases with a new one, is to use a measure such as Tversky's contrast model of features [Tversky] or the Rosch model of category resemblance.
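
As a rough illustration of feature-based similarity, Tversky's contrast model scores two feature sets A and B as theta * f(A ∩ B) - alpha * f(A - B) - beta * f(B - A). The sketch below assumes that f is simply the number of features and that cases are plain sets of feature labels; the weights, the case names and the feature labels are hypothetical and not taken from the project.

```python
def contrast_similarity(a, b, theta=1.0, alpha=0.5, beta=0.5):
    """Tversky contrast model with set cardinality as the salience function.

    theta weights the common features, alpha and beta weight the
    features found only in a or only in b (all weights are assumptions).
    """
    a, b = set(a), set(b)
    return theta * len(a & b) - alpha * len(a - b) - beta * len(b - a)


def most_similar(query, cases, **weights):
    """Return the name of the stored case most similar to the query."""
    return max(
        cases,
        key=lambda name: contrast_similarity(query, cases[name], **weights),
    )


# Hypothetical binding-site cases described by categorical features.
known_cases = {
    "site_A": {"GC-rich", "high-affinity", "promoter"},
    "site_B": {"AT-rich", "low-affinity", "enhancer"},
}
new_case = {"GC-rich", "promoter", "medium-affinity"}
nearest = most_similar(new_case, known_cases)
```

Because the measure is graded rather than binary, a case with medium affinity can still be placed near the prototype it resembles most instead of being forced into a positive or negative class.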

Moreover, the knowledge acquisition can be simplified and improved, because CBR systems incrementally and automatically collect knowledge of a specific biological environment. Therefore, CBR systems use site-specific and time-dependent biological knowledge.

Most CBR expert systems use specific knowledge representations. It seems likely that the CASUEL syntax, a case-based description language defined in the ESPRIT project INRECA, will emerge as a CBR standard [CASUEL]. It provides for the modelling of taxonomies, inheritance and adaptation knowledge. It is general enough to design the knowledge representation for a wide variety of classes of molecular biology knowledge. Formalized knowledge on transcription factors (zinc fingers) [Suzuki et al.] or general knowledge on biological entities like DNA [Schroeder, Blattner] could be modelled using CASUEL.

Different questions about DNA-binding sites require different views on the case-based knowledge base. One possibility to cope with this problem is goal-based retrieval of prototypes and cases [Seifert]. The aim here is to explicitly formulate goals for special retrieval contexts.

The importance of CBR is underlined by the CASTING program recently launched by the CEC in the framework of ESPRIT III. Its aim is to support the transfer of CBR technology into European industry and to raise European awareness of CBR in general. The CEC will encourage European industry to use CBR technology in producing CBR tools and to apply these tools in as many domains as possible.


Cognitively appropriate researcher/expert system interface

In an early study [Teach, Shortliffe], the requirements for an expert system in biomedicine were investigated empirically. The results show that one of the most important requirements is that such systems explain what they are doing, which here includes the process of automatic learning (automatic generation of the knowledge base). Providing such explanations is an advantage of symbolic learning systems, and especially of CBR.

Up to now, however, there is no definition of a standard set of functions required for cognitively appropriate support of biomedical research through rapid, concise and guided interaction. We have suggested the notion of cognitive open expert systems, which provide cognitively appropriate functions to support the interaction between the researcher and the expert system [Gierl].


Communication methods

Two prerequisites must be met to integrate the expert system into the INTERNET. The first is access to public databases, which is usually accomplished via the World Wide Web, FTP servers and e-mail servers. The second follows from the fact that public databases are extended literally every hour: a transaction-oriented communication system [Gierl et al.] has to be implemented that automatically updates local databases by searching public databases for interesting new facts.


Service for a public knowledge base

The aim of providing a knowledge base of transcription factors, as a contribution to the worldwide communication among human genome researchers, is to establish a service that integrates this knowledge base into the INTERNET and maintains it. Since the technical resources are available and access to the INTERNET (for instance via the World Wide Web) is ubiquitous, this is primarily an organisational problem.


References

  • Adams M.D. et al.: Initial assessment of human gene diversity and expression patterns based upon 83 million nucleotides of cDNA sequence, The Genome Directory, Supplement to Nature 28 September 1995, Vol. 377, 3-174
  • Bishop M.J. (Ed.): Guide to Human Genome Computing, London, 1994
  • BMBF: Humangenomforschung - Forschungs- und Förderkonzept, Bonn, 1995
  • Bolis G., Di Pace L., Fabrocini F.: A machine learning approach to computer-aided molecular design, Journal of Computer-Aided Molecular Design, Vol. 5, 1991, 617-628
  • Brugge J.A., Buchanan B.G: Evolution of a knowledge-based system for determining structural components of proteins, Expert Systems, Vol. 6, 1989, 144-155
  • Bucher P.: Weight matrix descriptions of four eukaryotic RNA polymerase II promoter elements derived from 502 unrelated promoter sequences, Journal Mol. Biol., 212, 563-578
  • Casting: http://www.mari.co.uk/~castings/
  • Durbin R., Thierry-Mieg J.: A C. elegans Database. Documentation, code and data, anonymous FTP server at limm.lirmm.fr
  • Fickett J.W.: Recognition of protein coding regions in DNA sequences, Nucleic Acids Research, Vol. 10, 1982, 5303-5318
  • Friedland P., Kedes L.H.: Discovering the Secrets of DNA, Communications of the ACM, Vol. 28, 1985, 1164-1186
  • Gierl L., Arias-Lewing G., Stengel-Rutkowski S., Jakobeit M., Lohse K.: Knowledge Acquisition for Scheme-Based Medical Expert Systems: The Dysmorphic Syndrome Example. In: Rienhoff O., Piccolo U., Schneider B. (Eds.): Expert Systems and Decision Support in Medicine. Springer-Verlag, Berlin, 1988, 347-350
  • Gierl L., Greiller R., Landersdorfer T., Müller H., Überla K.: A User-Oriented Protocol for Integrating Heterogeneous Communication Systems of Medical Facilities Using Ports. Methods of Information in Medicine, 28, 1989, 97-103
  • Gierl L.: Klassifikation mit prototypischen Merkmalsmustern zur Entscheidungsfindung in Expertensystemen für die Medizin, Habilitationsschrift, Universität München, 1992
  • Gierl L.: An architecture for cognitive open medical expert systems using elemental cognitive functions, in: Reichert A. et al. (Eds.): Proceedings of MIE 93, London, 1993, 142-146
  • Gierl L., Stengel-Rutkowski S.: Integrating Consultation and Semi-automatic Knowledge Acquisition in a Prototype-based Architecture: Experiences with Dysmorphic Syndromes, Artificial Intelligence in Medicine, Vol. 6, 1994, 29-49
  • Hayes-Roth B.: PROTEAN: Deriving Protein Structure from Constraints, in: Proceedings of AAAI 86, 1986, 904-909
  • King R.D.: An Inductive Learning Approach to the Problem of Predicting a Protein's Secondary Structure from its Amino Acid Sequence, in: Bratko I., Lavrac N. (Eds.): Progress in Machine Learning, Wilmslow, 1987, 230-250
  • Kolodner J.: Case-Based Reasoning, Morgan Kaufmann Publishers, San Mateo, 1993
  • Kondrakhin Y.V., Shamin V.V., Kolchanov N.A.: Construction of generalized consensus matrix for recognition of vertebrate pre-mRNA 3'-terminal processing sites, Comput. Applic. Biosci., 1993
  • Lathrop R.H., Webster T.A., Smith T.F.: ARIADNE: Pattern-directed Inference and Hierarchical Abstraction in Protein Structure Recognition, Communications of the ACM, Vol. 30, 1987, 909-921
  • Manago M. et al.: CASUEL: A Common Case Representation Language, ESPRIT project 6322, Task 1.1, Deliverable D1, Version 2.01, 1994, INRECA Consortium
  • McAdams H.H., Shapiro L.: Circuit Simulation of Genetic Networks, Science, Vol. 269, 1995, 650-656
  • Müller H., Modrow S., Gierl L., Wolf H.: Rechnerunterstütztes Ebenenmodell zur Analyse und Prädiktion von Makromolekülen. In: Giani G., Repges R. (Hrsg.): Biometrie und Informatik - neue Wege zur Erkenntnisgewinnung in der Medizin, Springer-Verlag, Berlin, 1990, 123-126
  • Napoli A., Lieber J.: A First Study on Case-Based Planning in Organic Synthesis, in: Wess S. et al. (Eds.): Topics in Case-Based Reasoning, Berlin, 1993, 458-469
  • Ritter O., Kocab P., Senger M., Wolf D., Suhai S.: Prototype Implementation of the Integrated Genomic Database, Computers and Biomedical Research, Vol. 27, 1994, 97-105
  • Schroeder J.L., Blattner F.R.: Formal Description of a DNA oriented computer language, Nucleic Acids Research, Vol. 10, 1982, 69-84
  • Schmidt R., Boscher L., Heindl B., Schmid G., Pollwein B., Gierl L.: Adaptation and Abstraction in a Case-based Antibiotics Therapy Advisor, AIME 1995, Berlin, 1995
  • Schulze-Kremer S., King R.D.: IPSA-Inductive Protein Structure Analysis, Protein Engineering, Vol. 5, 1992, 377-390
  • Sutcliffe J.P.: Concept, class, and category in the tradition of Aristotle, in: van Mechelen I. et al. (Eds.): Categories and Concepts, London, 1993, 35-65
  • Seifert C.M.: The Role of Goals in Retrieving Analogical Cases, in: Barnden J.A., Holyoak K.J.: Analogy, Metaphor and Reminding, Norwood, 1994, 95-125
  • Stormo G.D.: Consensus Patterns in DNA, Methods in Enzymology, Vol. 183, 1990, 211-221
  • Suzuki M., Brenner S.E., Gerstein M., Yagi N.: DNA recognition code of transcription factors, Protein Engineering, Vol. 8, 1995, 319-328
  • Swoboda W., Zwiebel F. M., Spitz R., Gierl L.: A case-based consultation system for postoperative management of liver-transplanted patients, in: Barahona P., Veloso M., Bryant J. (eds.): Medical Informatics Europe 1994, 1994, 530-534
  • Teach R.L., Shortliffe E.H.: An Analysis of Physician Attitudes Regarding Computer-Based Clinical Consultation Systems, Computers and Biomedical Research, Vol. 14, 1981, 542-558
  • Thiesen H.-J.: Multiple Genes Encoding Zinc Finger Domains Are Expressed in Human T Cells, The New Biologist, Vol. 2, 1990, 363-374
  • Tversky A.: Features of Similarity, in: Psychological Review 84, 1977, 327-352
  • Waterman D.A.: A Guide to Expert Systems, Reading, 1986


Expert system for the analysis of human zinc finger transcription factors

Our research initiative is linked to the HGF-Concept by the aim of developing a powerful expert system for the structural and functional analysis of eventually several hundred zinc finger proteins (see HGF-Concept, p. 11). This expert system, dedicated to handling information obtained from the analysis of human zinc finger proteins, might lead to an integrated intelligent system that could serve as a nucleus for structuring sequencing information and for describing functional networks of gene regulation. We would like to develop intelligent tools for handling the large body of information already available on zinc finger genes and their products. In particular, an expert system (TF-EXPERT) will be established to determine DNA binding preferences of Krüppel-type zinc finger proteins. Furthermore, TF-EXPERT will include the current knowledge on zinc finger protein functions, supplemented by incoming results from Projects I and II. It will, moreover, serve as a general research tool for scientists working on topics concerning transcriptional gene regulation. In particular, the design and engineering of synthetic zinc fingers might be modelled with the help of TF-EXPERT. The local integration of experimental work with intelligent systems of human genome computing might lead to novel concepts and models essential for understanding regulatory circuits, as exemplified by the regulation of gene expression in human organisms.

Within the human genome sequencing effort it has recently been shown convincingly that analytical and design tasks in modern molecular biology cannot be accomplished without appropriate hardware and software resources and special expertise in biomedical computing (e.g. the generation of ESTs [Adams et al.]). Despite the putative simplicity of zinc finger protein binding, a large number of information processing problems remain that are related to the complexity and the amount of biological data, e.g. the problem of determining the DNA target site specificities of zinc finger proteins harboring more than 3 or 4 zinc fingers. [Suzuki et al.] state that "the communication between DNA and protein can be described with specificity, from chemical, to the stereochemical, to the spacing, to the superspacing levels."
In a first step, an expert system will be designed to support research on zinc finger problems up to the previously determined clusters of 37 zinc finger motifs.

Therefore, in supporting Project I and II our aim is (see figure Overview of TF-EXPERT):

  • to exploit public data bases in our research on zinc finger DNA-binding,
  • to combine these facts with our experimental data in a knowledge base using the CASUEL structure and incremental machine learning on zinc finger DNA binding sites, applying the case-based reasoning paradigm,
  • to implement an expert system for research on zinc finger structure-function relationships,
  • to establish a public resource for knowledge on the analysis of zinc fingers and the design of synthetic zinc fingers for gene therapy,
  • to use, as far as possible, public domain or commercially available software on our topic,
  • to investigate our approach in order to port some methods to a parallel system and to enhance the visualisation concept of our user interface so that it can handle all transcription factors of the genome in a publicly available knowledge base.
The tremendous amount of data in genome research (DNA databases, protein sequence databases and 3D databases) requires effective and flexible data access methods as a basic technology. Beyond that, researchers need tools that support the comparison of new motifs, consensus sequences and structures with known biological items in databases, i.e. search-and-compare techniques in bioinformatics. These methods and the respective programs will be used in our project as far as they are available. Other families of methods (see above 2. State of the art Project III (4) - (5) ) reflect the need for intelligent searches using expert molecular biological knowledge as well as for the presentation of similarities (homologies) in order to derive new knowledge about the human genome from known facts. A further family of methods comprises machine learning to automatically detect higher-level features that could be used as proposals for further experiments. In a further step TF-EXPERT could be integrated into IGD.
Moreover, we will provide a service (TFkb, Transcription Factors Knowledge Base) for the genome research community that offers decision support in the detection of novel knowledge related to transcription factors, e.g. target genes.

Figure: Overview of TF-EXPERT


Intelligent analysis of human transcription factors

In particular, an expert system for the analysis of human transcription factors (figure: TF-EXPERT - Analysis of zinc fingers) will be established and made available to the genome research community as the Transcription Factors Knowledge Base (TFkb). This service will be maintained with resources of the human genome initiatives and will place strong emphasis on the characterization of human transcription factors, in particular of zinc finger gene families. Structures and functions of human zinc finger gene clusters, such as regulatory sequences, intronic structures and cis-acting elements, will be determined. In addition, knowledge of DNA-protein interactions will be implemented in this system. Furthermore, by comparing the sequences of zinc finger genes from separate but related clusters, evolutionary trees might be derived. This knowledge will be integrated into TFkb automatically and incrementally in a process of abstracting knowledge on transcription factors. We will use parts of our expert system ICONS to implement these functions in Common Lisp.

Figure: TF-EXPERT - Analysis of zinc fingers


Target site prediction as matching the knowledge base

Our contribution aims at the development of an essential bioinformatics technology for the functional analysis of transcription factors and their target genes. In particular, in the case of human zinc finger proteins, our expert system on DNA-protein interactions might be used for predicting target sites (figure: TF-EXPERT - Analysis of zinc fingers). Instead of using only consensus sequences or matrix patterns, we will integrate all available facts on target sites into the TFkb knowledge base, forming an abstracted prototype/case tree (see below, knowledge representation). Prediction in TF-EXPERT means matching a new sequence, together with the binding features related to it, against the prototype/case tree. The goal is to find the one or more most similar known target sites, or abstract prototypes of target sites, and to present them to the user. The new sequence and its binding features are then integrated into TFkb. The more sequences are presented to TF-EXPERT, the more precise further predictions will be. This is one of the main advantages of our CBR methods.
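
The match-then-integrate cycle described above can be sketched as follows. This is only an illustration under strong simplifications, not the ICONS or TF-EXPERT implementation: the case base is a flat list rather than a prototype/case tree, sequence similarity is the fraction of identical positions, feature similarity is set overlap, and both are weighted equally. All class, function and case names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    sequence: str          # DNA target site
    features: frozenset    # binding features attached to the site
    label: str             # identifier of the case


def identity(a, b):
    """Fraction of identical positions; 0.0 if lengths differ."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def similarity(query, case):
    """Equal-weight mix of sequence identity and feature overlap (assumed)."""
    union = query.features | case.features
    overlap = len(query.features & case.features) / len(union) if union else 0.0
    return 0.5 * identity(query.sequence, case.sequence) + 0.5 * overlap


class CaseBase:
    def __init__(self):
        self.cases = []

    def retrieve(self, query, k=1):
        """Return the k stored cases most similar to the query."""
        ranked = sorted(self.cases, key=lambda c: similarity(query, c), reverse=True)
        return ranked[:k]

    def retain(self, case):
        """Integrate a new case so that later queries can match it."""
        self.cases.append(case)


kb = CaseBase()
kb.retain(Case("GCGTGGGCG", frozenset({"high-affinity"}), "site_1"))
kb.retain(Case("ATATATATA", frozenset({"low-affinity"}), "site_2"))

query = Case("GCGTGGGCA", frozenset({"high-affinity"}), "new_site")
proposal = kb.retrieve(query)[0]   # most similar known target site
kb.retain(query)                   # incremental learning step
```

Every retained case enlarges the set of candidates for later retrievals, which is the sense in which predictions can become more precise as more sequences are presented.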


Design of synthetic zinc fingers

Even more exciting is the interactive design of synthetic transcription factors (figure TF-EXPERT - Design of zinc fingers) with predicted DNA binding specificities. In terms of in vivo functions, we might be able to determine a regulatory program for the function of human zinc finger proteins. In particular, once DNA fragments have been selected by zinc finger proteins, the expert system might be useful in identifying individual contact residues at the nucleic acid level. The required functions of a protein will be matched against the TFkb knowledge base. The most similar zinc finger gene cluster, or a more abstract zinc finger gene cluster prototype, will be presented to the researcher as a first zinc finger proposal. In interaction with the researcher, TF-EXPERT then proposes modifications of zinc finger gene clusters using CBR adaptation methods. Thus the initial zinc finger gene cluster is modified under the constraints of the background knowledge on transcription factors.

Figure: TF-EXPERT - Design of zinc fingers


Detection of zinc finger binding site preferences

We will establish a method for automatically detecting zinc finger binding site preferences (target genes) as one important example of general transcriptional protein binding sites (figure TF-EXPERT - Analysis of zinc fingers). To this end, we will adopt methods developed in our previous expert systems, particularly in ICONS.


Knowledge representation in case-based structure

The knowledge representation is a case-based structure of prototypes of classes of zinc fingers and of single consensus sequences (figure Prototype/case knowledge base), adopted from ICONS. The advantage of this approach is that consensus binding sequences in the IUPAC code can be generalized in an abstraction tree of prototypes and simple consensus sequences. The sequences come from the literature, from public databases and from our own work (see above, Projects I and II). They form a local, but public, base of transcription factors.
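
One way to picture the generalization of consensus sequences in the IUPAC code is to merge two aligned sequences position by position into the smallest ambiguity code that covers both. The sketch below uses the standard IUPAC nucleotide ambiguity codes; the function names and the example sequences are hypothetical, and sequences are assumed to be aligned and of equal length.

```python
# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
BASES_TO_CODE = {frozenset(bases): code for code, bases in IUPAC.items()}


def generalize(seq_a, seq_b):
    """Merge two aligned consensus sequences into a more general one."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    merged = []
    for a, b in zip(seq_a, seq_b):
        bases = frozenset(IUPAC[a]) | frozenset(IUPAC[b])
        merged.append(BASES_TO_CODE[bases])
    return "".join(merged)


def matches(consensus, sequence):
    """True if every position of sequence is covered by the consensus."""
    return len(consensus) == len(sequence) and all(
        set(IUPAC[s]) <= set(IUPAC[c]) for c, s in zip(consensus, sequence)
    )


# Two made-up sites that differ at one position.
prototype = generalize("GCGTGGGCG", "GCGGGGGCG")
```

Applying `generalize` repeatedly to the children of a node would yield progressively more abstract prototypes toward the root of such a tree, at the price of matching more sequences.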
A second source of knowledge is a base of background knowledge. While the case-based part of the knowledge base is highly variable, this part comprises stable knowledge on proteins and DNA as well as specific knowledge from the literature, for instance chemical and stereochemical rules on zinc finger DNA binding [Suzuki et al. 1994].
Connected with the consensus binding sequences are the regulatory functions. Where any are known, they will be extracted from the TIGR and PROSITE databases. The PROSITE database (A. Bairoch) contains amino acid consensus motifs including rules like "At least one Pro or Gly from -7 to -2 and from +1 to +7 or at least two or three Asp, Ser or Asn from -7 to +7".
These rules will be parsed and automatically integrated into TFkb. Further rules will be added from our work described in Projects I and II.
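
Besides free-text rules of the kind quoted above, PROSITE also describes motifs in a formal pattern syntax, which is easier to parse automatically. As a hedged sketch of what such parsing could look like, the code below translates the basic pattern syntax (`x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` for repetition, `<` and `>` as sequence anchors) into a regular expression. It does not handle the free-text rules and ignores less common syntax; the pattern string, which follows the style of a C2H2 zinc finger pattern, and the test sequence are illustrative assumptions.

```python
import re


def prosite_to_regex(pattern):
    """Translate a basic PROSITE-style pattern into a Python regex string."""
    pattern = pattern.rstrip(".")          # patterns may end with a period
    parts = []
    for element in pattern.split("-"):
        # {ABC} means "any residue except A, B, C".
        element = element.replace("{", "[^").replace("}", "]")
        # (n) or (n,m) is a repetition count.
        element = element.replace("(", "{").replace(")", "}")
        # Lowercase x is the wildcard; residues are uppercase.
        element = element.replace("x", ".")
        # < and > anchor the pattern to the sequence ends.
        element = element.replace("<", "^").replace(">", "$")
        parts.append(element)
    return "".join(parts)


def find_motifs(pattern, protein):
    """Return (start, matched text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(protein)]


# Illustrative pattern and a hypothetical finger-like sequence.
ZF_PATTERN = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H"
hits = find_motifs(ZF_PATTERN, "MKCPECGKSFSRSDELTRHQRIHTG")
```

Once a rule is available as a compiled expression, it can be stored with a prototype and applied to new sequences during matching.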
3-D databases like PDB play an important role in detecting binding sites and in designing synthetic zinc fingers. Transcription factor specific facts from these databases will be integrated into the knowledge base.
The prototype/case knowledge base of zinc finger proteins and DNA sites includes knowledge on the following attributes:

  • consensus sequences,
  • consensus rules,
  • genes,
  • DNA-binding specificity,
  • protein conformation,
  • regulatory features,
  • affinity,
  • enhancer, silencer,
  • promoter elements,
  • etc.
This knowledge will be stored at various levels of abstraction in the prototype/case knowledge base. In the figure below a prototype is, for instance, a zinc finger gene cluster. All of the above attributes that are common to several zinc finger motifs are stored at this zinc finger gene cluster prototype. More general zinc finger gene clusters are stored at higher levels of the knowledge base. More specific zinc finger gene clusters hold only those attributes, or those attribute values, that are specific to the prototype in question. The knowledge base will be indexed so that knowledge can be retrieved in both directions, deriving proteins from DNA and DNA from proteins. Researchers using the expert system can thus support their work with functions that use the knowledge base in either direction.
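
The storage scheme described here, in which a specific prototype holds only what distinguishes it from its more general parent, and the two-way index can be sketched as follows. This is an illustrative data structure under simple assumptions, not the CASUEL or ICONS representation; the class names, attribute names and values are hypothetical.

```python
from collections import defaultdict


class Prototype:
    """Node in a prototype/case tree that stores only its own attributes."""

    def __init__(self, name, parent=None, **attributes):
        self.name = name
        self.parent = parent
        self.attributes = attributes

    def get(self, key):
        """Look up an attribute, falling back to more general prototypes."""
        node = self
        while node is not None:
            if key in node.attributes:
                return node.attributes[key]
            node = node.parent
        raise KeyError(key)

    def all_attributes(self):
        """Inherited attributes, overridden by more specific values."""
        merged = self.parent.all_attributes() if self.parent else {}
        merged.update(self.attributes)
        return merged


class BindingIndex:
    """Two-way index: from protein to DNA sites and from site to proteins."""

    def __init__(self):
        self.sites_by_protein = defaultdict(set)
        self.proteins_by_site = defaultdict(set)

    def add(self, protein, site):
        self.sites_by_protein[protein].add(site)
        self.proteins_by_site[site].add(protein)


# Hypothetical general and specific zinc finger gene cluster prototypes.
general = Prototype("zinc_finger_cluster", fold="C2H2", affinity="unknown")
specific = Prototype("cluster_A", parent=general, affinity="high")

index = BindingIndex()
index.add("protein_X", "GCGTGGGCG")
```

Here `specific.get("fold")` is answered by the general prototype, while `specific.get("affinity")` returns the value stored at the specific one.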

The knowledge base on background knowledge will cover

  • chemical rules on zinc finger DNA binding preferences
  • stereochemical rules on zinc finger DNA binding
  • general genomic knowledge
  • etc.

Figure: Prototype/case knowledge base


User oriented tasks of the TF-EXPERT system

The expert system should support researchers working on tasks like

A. Prediction of optimal zinc finger DNA binding site preferences and especially

  • detecting binding sites with more than 3 zinc fingers,
  • intelligent alignment of experimentally selected genomic DNA fragments,
  • assessment of the affinity for zinc finger clusters,
  • prediction of the orientation of a zinc finger protein with respect to its binding site,
  • visual presentation of family resemblance of the current zinc finger cluster under investigation showing
    • specificity
    • affinity
    • evolutionary tree for the zinc finger gene clusters on chr. 10p11.2 and 10q11.2
    • etc.
    matching it with the knowledge base.
  • visual presentation of background knowledge (e.g. "rules" for DNA binding site preferences)
B. Design of synthetic zinc fingers and especially

  • supporting the mutagenesis of selected zinc fingers,
  • visual presentation of background knowledge for design (e.g. "rules" for zinc finger clusters).

Evaluation

TF-EXPERT will be evaluated using known zinc finger structure/function relationships. The user interface will be tested for acceptability.

The TF-EXPERT system should have the following properties:

  • A database collecting all information on Krüppel-type zinc finger genes and proteins should be present.
  • An expert system for predicting putative target sites for zinc finger proteins should be available via INTERNET.
  • The knowledge base should be enhanced and extended automatically as the expert system is used.
  • A service giving the human genome research community access to knowledge for analysing and designing zinc fingers should be offered.
  • There should be a prospect of a large-scale presentation of all knowledge on transcription factors related to the human genome.

References

  • Adams M.D. et al.: Initial assessment of human gene diversity and expression patterns based upon 83 million nucleotides of cDNA sequence, The Genome Directory, Supplement to Nature 28 September 1995, Vol. 377, 3-174
  • Klug A., Schwabe J.W.R.: Zinc fingers, The FASEB Journal, Vol. 9, 1995, 597-604
  • Pabo C.O., Sauer R.T.: Ann. Rev. Biochem, Vol. 53, 1984, 293-321
  • Suzuki M., Brenner S.E., Gerstein M., Yagi N.: DNA recognition code of transcription factors, Protein Engineering, Vol. 8, 1995, 319-328
  • Suzuki M., Gerstein M., Yagi N.: Stereochemical basis of DNA recognition by Zn fingers, Nucleic Acids Research, Vol. 22, 1994, 3397-3405

 

Last modified: 30.10.2005