The web site for Grail is http://avalon.epm.ornl.gov

			  User's Guide to

		      GRAIL    and     GENQUEST

		Sequence Analysis, Gene Assembly And
		     Sequence Comparison Systems

			  E-mail Servers

				 &

	 XGRAIL		     GRAILCLNT		   XGENQUEST
     (Version  1.3c)       (Version 1.3)	 (Version 1.1)
			(Command line interface)

		      Client-Server  Systems

			   (May, 1996)


			Informatics  Group
		   Oak Ridge National Laboratory
			Oak Ridge, Tennessee
			      U.S.A.


			   ----------
			   HIGHLIGHTS
                           ----------

GRAIL

	ORGANISMS			Human, Mouse,
					Arabidopsis, Drosophila,
					E. coli

	GRAILs (1, 1a, 2)		Protein Coding Regions

	GAP				Gene Modeling

	PROTEIN TRANSLATIONS

	FUNCTIONAL 			PolyA sites
	SITES				Pol II Promoters
					CpG Islands
					Simple Repeats

	REPETITIVE			Human, Mouse
	DNA ELEMENTS

			
	GRAIL ANNOTATION REPORT


GENQUEST

	Database Searches 		
	and Alignments
					DATABASES

					Swiss-Prot
					PDB
					Prosite
					GSDB
					BLOCKS
					dbEST
					Human & Mouse Repetitive DNA
					Genpept


					METHODS

					Fasta
					Blast
					Prosite
					Smith-Waterman
					BLIMPS
									      
			TABLE OF CONTENTS
			-----------------


	GRAIL OVERVIEW



	GRAIL E-MAIL SERVER USER MANUAL



	GENQUEST OVERVIEW



	GENQUEST (Q) E-MAIL SERVER USER MANUAL



	XGRAIL CLIENT-SERVER SYSTEM USER MANUAL



	XGENQUEST CLIENT-SERVER SYSTEM USER MANUAL



	GRAILCLNT COMMAND LINE INTERFACE TO GRAIL AND GENQUEST



	WORLD WIDE WEB INTERFACE TO GRAIL AND GENQUEST



	ACKNOWLEDGEMENTS



	SOFTWARE SUPPORT



	PUBLIC DOMAIN NOTICE AND DISCLAIMER



	GRAIL PUBLICATIONS



	REFERENCES							      
									      
--------------
GRAIL OVERVIEW
--------------

GRAIL is a suite of tools designed to provide analysis and putative annotation
of DNA sequences both interactively and through the use of automated 
computation. The capabilities of GRAIL are available by several methods. These 
include: (a) an e-mail server at ORNL, which processes DNA sequence(s) contained
in e-mail messages; (b) an interactive graphical X-based client-server system 
called XGRAIL, which supports a wide range of analysis tools, including gene 
modeling; (c) GRAILCLNT, a command line interface to all GRAIL analysis modules
and GENQUEST; and (d) a World Wide Web interface to GRAIL and GENQUEST.

The current e-mail implementation of GRAIL provides analysis of protein coding
potential of a DNA sequence, and an option for protein sequence database 
searches of putative coding regions. 

GRAIL VERSIONS:

The coding recognition portion of the system uses a neural network which 
combines a series of coding prediction algorithms. There are three basic
versions of this neural network, GRAIL 1, GRAIL 1a and GRAIL 2. 

GRAIL 1 has been in place for about five years. It uses a neural network 
described in PNAS 88, 11261-11265, which recognizes coding potential within a 
fixed size (100 base) window. It evaluates coding potential without looking for
additional features (information such as splice junctions, etc).

GRAIL 1a is an updated version of GRAIL 1. It uses a fixed-length window to
locate the potential coding regions and then evaluates a number of discrete
candidates of different lengths around each potential coding region, using
information from the two 60-base regions adjacent to that coding region, to 
find the "best" boundaries for that coding region.

GRAIL 2 uses variable-length windows tailored to each potential exon candidate,
defined as an open reading frame bounded by a pair of start/donor, 
acceptor/donor or acceptor/stop sites. This scheme facilitates the use of more
genomic context information (splice junctions, translation starts, non-coding
scores of 60-base regions on either side of a putative exon) in the exon 
recognition process. GRAIL 2 is therefore not appropriate for sequences without
genomic context (when the regions adjacent to an exon are not present).

These changes have improved the overall performance compared to GRAIL 1,
particularly for short exons. 

All three systems have been trained to recognize coding regions in human DNA
sequences, although they also work well on a number of other organisms, 
particularly other mammals. 

GRAIL 1 has been implemented for human, mouse and E. coli.

GRAIL 1a has been implemented for human and mouse.

GRAIL 2 versions for several organisms have been implemented: namely, human,
mouse, arabidopsis and drosophila.

[For convenience we use the term "exon" to refer to coding regions. Note that
non-coding exons, and non-coding portions of exons, will not be recognized by
the system.]
									      
GRAIL PERFORMANCE STATISTICS

GRAIL 1 typically finds about 90% of coding regions greater than 100 bases with 
performance falling off for shorter exons. GRAIL 1 has been tested on a set of
human genes containing 102kb of sequence. This set contained 70 coding exons and
the system identified 62 (89%) and assigned them all to the correct strand. Of
the eight missed, six were less than 100 bases long. In a larger test set strand 
assignment was 90-95% correct. The preferred reading frame assignment was
correct for 60 (95%) of these exons while the frame assignment for the other 
two had some ambiguity.

Of the predicted exons with a quality  score of "excellent" all were actual
coding exons. Of predicted exons scoring "good" 69% were real and of the
predicted exons with a score of "marginal" only 16% were real. Though this is a 
rather limited test set, the results of this analysis give some guidance for 
interpreting GRAIL 1 output.

GRAIL 1a performs much better than GRAIL 1 in finding true exons and eliminating
false positives. It is also better than GRAIL 1 in terms of finding the 
boundaries (edges) of coding regions. GRAIL 1a has been tested on a set of 137
sequences containing 954 exons. The system recognized 82% (787) of the exons in 
the set, with a false positive rate of 11%. 

Of the 954 exons in the set, 711 exons were greater than 100 bases long. The 
system recognized 95% (675) of these exons. The frame assignment was correct
virtually always (greater than 98% of the time).
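
The GRAIL 1a percentages quoted above can be cross-checked with a few lines of
arithmetic. The Python sketch below is illustrative only: the function and
constant names are ours, and the counts for exons of 100 bases or fewer are
derived here by subtraction from the quoted totals; they are not reported
separately in the test results.

```python
def recognition_rate(found, total):
    """Return the percentage of exons that were recognized."""
    return 100.0 * found / total

# Counts quoted in the text for the GRAIL 1a test set.
ALL_EXONS, ALL_FOUND = 954, 787
LONG_EXONS, LONG_FOUND = 711, 675        # exons longer than 100 bases

# Derived by subtraction (an inference, not a reported figure).
SHORT_EXONS = ALL_EXONS - LONG_EXONS     # exons of 100 bases or fewer
SHORT_FOUND = ALL_FOUND - LONG_FOUND

print("all exons:   %d of %d (%.1f%%)"
      % (ALL_FOUND, ALL_EXONS, recognition_rate(ALL_FOUND, ALL_EXONS)))
print("long exons:  %d of %d (%.1f%%)"
      % (LONG_FOUND, LONG_EXONS, recognition_rate(LONG_FOUND, LONG_EXONS)))
print("short exons: %d of %d (%.1f%%)"
      % (SHORT_FOUND, SHORT_EXONS, recognition_rate(SHORT_FOUND, SHORT_EXONS)))
```

Under these assumptions most of the missed exons fall in the short-exon group,
which is where a fixed-length search window is expected to be weakest.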

GRAIL 2 finds about 91% of all coding regions, with a performance that is close
to being independent of exon size. Its false positive level is similar to or
even slightly better than that of GRAIL 1. GRAIL 2 has been tested on a set of
137 sequences 
containing 954 exons. The system recognized 91% (857) of the exons in the set, 
with an apparent false positive rate of 8.6% (most of these were outside the 
domain of the known genes and some may actually be real). 

Of exons less than 100 bases long GRAIL 2 found 102 out of 122 or 84%. GRAIL 2
provides the best candidate for a given coding region in a manner which includes
splice junctions (or translation start/stop) at the candidate's edges, so the 
user will note that the edges of the candidates in the initial and summary
tables correspond to putative edge signals. In the test set, about 61% of the 
recognized exons had both edges exactly correct (the right splice junctions
picked) and about 96% had at least one edge correct. GRAIL 2 is perhaps better
at estimating the true extent of an exon compared to GRAIL 1 and this additional
accuracy may help in experimental protocols such as those involving PCR.      
									      
-------------------------------
GRAIL E-MAIL SERVER USER MANUAL
-------------------------------

The GRAIL e-mail server finds potential protein coding regions in anonymous DNA
sequences and provides a means of searching the translations of these regions
against protein and motif databases.

To have sequences analyzed by e-mail, send e-mail to: GRAIL@ornl.gov

Please note that:

(i)   GRAIL is case-insensitive,

(ii)  More than one sequence can be sent in an e-mail message,

(iii) The length of a sequence must be at least 100 bases (for GRAIL 1) and at
      most 100 kilo-bases, and

(iv)  The sequence must consist of letters A, C, G, T or U. U is converted to T.
      Any other character is converted to C. Blanks are ignored. Digits are
      also ignored.
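
The character rules in notes (i), (iii) and (iv) can be mirrored on the client
side before a message is sent. The Python sketch below is not the server's
code; the function names are hypothetical, and it assumes that "blanks" covers
all whitespace (including line breaks) and that 100 kilo-bases means 100,000
bases.

```python
def normalize_grail_sequence(raw):
    """Apply the documented character rules to a raw sequence string."""
    out = []
    for ch in raw.upper():                 # GRAIL is case-insensitive
        if ch.isspace() or ch.isdigit():
            continue                       # blanks and digits are ignored
        if ch == "U":
            out.append("T")                # U is converted to T
        elif ch in "ACGT":
            out.append(ch)
        else:
            out.append("C")                # any other character becomes C
    return "".join(out)


def length_ok(seq, min_len=100, max_len=100000):
    """Check the stated limits: at least 100 bases, at most 100 kilo-bases."""
    return min_len <= len(seq) <= max_len
```

Running a sequence through such a check first shows how many ambiguous
characters would be turned into C, which can matter when interpreting coding
scores in low-quality regions.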


The first line of the message MUST be in the following format: 

Sequences NUM_SEQ [-1.0 / -1a / -1.2 / -1.3] [-S / -E / -P / -p / -B / -b]

The word Sequences, followed by the number of sequences in the message, followed
by OPTIONAL switches:

   (a) one of -1.0, -1a, -1.2, or -1.3 and 

   (b) one of -S, -E, -P or -p or -B or -b.


The first line is followed by the sequences in the following format:

>sequence_name
sequence

A typical message is shown below:

Sequences  3  -1.3  -E

>seq1name
AAAATTTCGGG........


>seq2name
GGCTGTTCATG........


>seq3name
ATTGCAGACAG
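
A message in this layout can also be assembled by a small script. The Python
sketch below builds the message body from the first-line format and the
>sequence_name layout described above; the function name and argument names
are hypothetical, and mailing the result to the server address is left to the
user's own mail tools.

```python
VERSION_SWITCHES = ("-1.0", "-1a", "-1.2", "-1.3")
SEARCH_SWITCHES = ("-S", "-E", "-P", "-p", "-B", "-b")


def build_grail_message(sequences, version=None, search=None):
    """Build a GRAIL e-mail body from a list of (name, sequence) pairs."""
    if not sequences:
        raise ValueError("at least one sequence is required")
    if version is not None and version not in VERSION_SWITCHES:
        raise ValueError("unknown version switch: %r" % (version,))
    if search is not None and search not in SEARCH_SWITCHES:
        raise ValueError("unknown search switch: %r" % (search,))

    # First line: the word Sequences, the sequence count, optional switches.
    header = ["Sequences", str(len(sequences))]
    header += [s for s in (version, search) if s is not None]

    lines = ["  ".join(header), ""]
    for name, seq in sequences:
        lines += [">" + name, seq, ""]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_grail_message([("seq1name", "AAAATTTCGGG")], "-1.3", "-E"))
```

Counting the sequences in code rather than by hand avoids a mismatch between
NUM_SEQ and the number of sequences actually present in the message.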
									      
OPTIONAL SWITCHES
-----------------

One of the following four:

-1.0 switch specifies that GRAIL 1.0 will be used for the analysis.

-1a  switch specifies that GRAIL 1a  will be used for the analysis.

-1.2 switch specifies that GRAIL 2 (version 1.2) will be used for the analysis. 

-1.3 switch specifies that GRAIL 2 (version 1.3) will be used for the analysis. 

   The default is GRAIL 2 (version 1.3).

and one of the following six:

-S switch specifies that translations of all potential coding regions be
   searched against SwissProt using an implementation of the Smith-Waterman 
   algorithm on an Intel iPSC/860 parallel computer.

-E switch is the same as -S, except that only "excellent" potential coding
   regions are considered for the searches.

-P switch is the same as -E, except that the Prosite database is searched
   instead of Swiss-Prot.

-p switch is the same as -P, except that abbreviated Prosite database search
   output is returned.

-B switch is the same as -E, except that the Blast method is used instead of
   Smith-Waterman. The top 40 database hits are returned.

-b switch is the same as -B, except that the top 10 database hits are returned.

The database search hits provide an indication of homology between recognized
exons and existing proteins.

RETURN MESSAGE
--------------

For each sequence the following information will be returned:

1.  Initial Coding Scores: 

GRAIL 1 reports the score for the coding potential for each position analyzed on
each strand (the f-(forward) strand represents the sequence as received, and the
r-(reverse) strand represents the reverse complement).
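
For readers who want to relate r-strand output to their own sequence, the
r-strand can be reproduced locally. The following Python sketch is a generic
reverse-complement helper, not code taken from GRAIL; it assumes the sequence
has already been reduced to the letters A, C, G and T.

```python
# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("AAAATTTCGGG"))
```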

These scores range from  0.0 to 1.0 and a score greater than 0.5 identifies a
region with protein encoding potential.  Non-coding regions often have a score
of 0.000. To reduce the output, only regions with scores of at least 0.01 are 
reported.
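
The two thresholds just described (0.01 for reporting, 0.5 for coding
potential) can be applied to a list of per-position scores as follows. This
Python sketch only illustrates the thresholds; the function name and the
(position, score) input layout are assumptions and do not reflect the actual
format of the return message.

```python
def classify_scores(scores, report_min=0.01, coding_min=0.5):
    """Filter (position, score) pairs the way the text describes.

    Positions scoring below report_min are dropped; the rest are returned
    as (position, score, is_coding), where is_coding means score > coding_min.
    """
    reported = []
    for position, score in scores:
        if score >= report_min:
            reported.append((position, score, score > coding_min))
    return reported


if __name__ == "__main__":
    example = [(1, 0.0), (2, 0.01), (3, 0.5), (4, 0.75)]
    for row in classify_scores(example):
        print(row)
```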

GRAIL 1a, 1.2 and 1.3 use a somewhat more concise format appropriate for their
design and implementation. Instead of a position by position score, they report 
a table for the forward strand and a table for the reverse strand, which lists
potential exon candidates and their scores. 

Sometimes a single exon is perceived in both the forward and reverse direction,
and the issue of which is the coding strand is resolved in a later step
(described below).							      

2.  Frame:

In calculating the coding potential, the system calculates the reading-frame
which is "preferred" in the window over which the calculation is done
(100 bases for GRAIL 1 and the exon candidate length for GRAIL 1a and GRAIL 1.2
and 1.3). 

In GRAIL 1 this information is  returned for positions with scores over 0.5, 
while in GRAIL 1a, 1.2 and 1.3 each candidate exon has an associated frame.  

In GRAIL 1 the translation frame predicted is true for about 95% of true exons,
while in GRAIL 1a, 1.2 and 1.3, it is true virtually always (greater than 98% of
the time).

3.  ORF:  

The limits between which the preferred frame is open are returned for windows
with scores over 0.5 (GRAIL 1) or exon candidates (GRAIL 1a, 1.2 and 1.3).

4. EXON Summary Table: 

The second part of the output is the system's interpretation of the raw data
(neural net outputs). This summary table provides the estimated limits of the
coding exon, the most likely strand for the exon with a probability for the 
correctness of the strand assignment, the preferred reading frame for the exon 
and a quality assessment.

An interesting phenomenon we have noted is that some exons seem to have coding
character on both strands, so be aware that strand assignments are not always 
correct, and it is sometimes useful to consider both strands as possible. 
Strand assignment is correct about 95% of the time in GRAIL 1 and greater than
98% of the time in GRAIL 1a, 1.2 and 1.3.

Any exon with a quality score of "excellent" is worth further consideration.
									      
-----------------
GENQUEST OVERVIEW
-----------------

GENQUEST is an integrated sequence comparison server which allows users to make
use of a wide variety of sequence comparison methods and target databases,
through either e-mail or an X-based client server system, XGENQUEST. 
GENQUEST can also be transparently accessed from XGRAIL.
The purpose of the system is to allow rapid and sensitive comparison of DNA and
protein sequences to existing DNA and protein sequence databases.

The databases which can be accessed from the GENQUEST server include:

GSDB (Genome Sequence Database: a DNA sequence database satellite maintained at
ORNL and updated daily from the primary database at Los Alamos National 
Laboratory),

SWISSPROT[1],

PROSITE[2] (a library of protein motifs),

PDB[4] (Protein Databank sequences of proteins with solved structures),

BLOCKS[9] (Protein motif database based on conserved blocks),

dbEST (Expressed Sequence Tag database),

a library of repetitive DNA sequences (including human, invertebrate, plant,
  and rodent) (from J. Jurka[3]), and

GENPEPT (a database of protein translations of coding regions from sequences in
Genbank).

GENQUEST uses a specialized parallel computing environment at Oak Ridge National
Laboratory and is supported and curated by a number of groups in the Genome
community.  

As new analysis tools become available, the modular nature of the GENQUEST
server will facilitate their implementation and broaden their accessibility to
the research community.

The GENQUEST server not only allows the user to access multiple databases but
also allows several databases to be queried from the same message. 

The GENQUEST server also supports a number of methods for database searching.
									      
----------------------------------
GENQUEST E-MAIL SERVER USER MANUAL
----------------------------------

GenQuest can be accessed by sending e-mail to: Q@ornl.gov

Messages to GENQUEST begin with a set of keywords which specify the options to
be used in the search. Two key words are mandatory: TYPE and SEQ. The remainder
are optional or have default settings. GENQUEST is case insensitive.

EXAMPLE of a typical query:

TYPE DNA6
TARGET SwissProt
METHOD SW -g 13
MATRIX PAM120
SCORE 50
ALIGN 20
SEQ
ATCTATCGTCGAGCTGGTGTCTGTGCTAGTCCACAGACAGHCTCGCTATATATGCT
CGTTTTAAAGCTCGTATATATGCTCTCGCTAGTCCGATCGATGCTCGATCGCTAGTA
TCGTATGATTCTTG
END

This example translates the given DNA sequence in 6 frames and searches
SwissProt, using Smith-Waterman with gap penalty of 13, PAM120 matrix, and
showing top 50 matches and top 20 alignments.

KEYWORDS: The keywords and options supported by the server are listed below:

1] TYPE ( DNA / DNA6 / PROTEIN ): the type of sequence being submitted. 

PROTEIN specifies that the input is an amino acid sequence. 

DNA6 specifies that the input sequence is DNA and to be translated in all 6
reading frames for search against protein databases.

DNA specifies a DNA input type which can be searched against DNA target 
databases or if a protein database is selected as target, translated only in the
frame of the first base in the sequence and searched against protein databases.

The DNA6 option requires quite a long search time and is not recommended for
DNA sequences of more than 1000 to 2000 bases.
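
The difference between DNA (one frame, starting at the first base) and DNA6
(all six frames) can be illustrated with a short translation routine. The
Python sketch below is not GENQUEST code. It assumes the standard genetic
code, writes stop codons as "*" and unrecognized codons as "X", and uses the
hypothetical frame labels f1-f3 (forward) and r1-r3 (reverse complement).

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
_BASES = "TCAG"
_AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
          "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(_BASES, repeat=3), _AMINO)}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq, offset=0):
    """Translate seq from the given offset; incomplete end codons are dropped."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(offset, len(seq) - 2, 3))


def six_frame(seq):
    """Return all six translations, as a DNA6-style search would need."""
    forward = seq.upper()
    reverse = forward.translate(_COMPLEMENT)[::-1]
    frames = {}
    for label, strand in (("f", forward), ("r", reverse)):
        for offset in range(3):
            frames["%s%d" % (label, offset + 1)] = translate(strand, offset)
    return frames


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))          # TYPE DNA: first frame only
    for name, protein in sorted(six_frame("ATGGCCTAA").items()):
        print(name, protein)               # TYPE DNA6: six frames
```

Because six protein queries are produced instead of one, the search work grows
accordingly, which is consistent with the long search times noted above.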

2] TARGET ( GSDB | REPETITIVE | dbEST  | SWISSPROT | 
	    PDB  | PROSITE    | BLOCKS | GENPEPT ): 
specifies the database to which the sequence will be compared. 

Multiple targets are allowed to specify comparison against more than one 
database.

a) SWISSPROT: Swissprot protein sequence database (updated quarterly).  

b) GSDB: Genome Sequence Database, a daily updated DNA sequence database.

c) PDB[4]: Structure database, (Brookhaven) Protein Databank. Hits represent 
homologous proteins of known structure.
									      
d) PROSITE[2]: Protein motif library which can provide clues as to protein
function or classification.  

e) REPETITIVE: Compares DNA against a library of repetitive DNA 
(including human, invertebrate, plant, and rodent) from J. Jurka and helps
provide annotation of repetitive DNA elements. The default repetitive 
library searched is human repetitive DNA. The key word REPETITIVE can be 
replaced with REPEAT_INV, or REPEAT_PLN, or REPEAT_ROD, to search invertebrate,
plant, or rodent repetitive DNA. 

f) BLOCKS[9]: Protein motif database based on conserved blocks. 
BLIMPS (Blocks IMproved Searcher) search tool is used for BLOCKS database 
searches.

g) DBEST: Expressed Sequence Tag database.

h) GENPEPT: Database of protein translations of coding regions from sequences
in Genbank.

NOTE: The version of the database searched is listed in the results from 
GENQUEST.

3] METHOD ( SW / FASTA / BLAST / FLASH ): specifies the comparison algorithm to 
be used in the search.  The options are Smith-Waterman (SW) [5], FASTA [6], 
BLAST [7], and FLASH [11]. 

The default method is SW.

Exceptions: For the BLOCKS and PROSITE databases, no method needs to be
specified, since special methods are used for searching those databases.

The defaults for FASTA and BLAST are the standard defaults used by these 
programs. Blast and FASTA options can also be set on this line. Descriptions for
these are available by sending "help fasta" or "help blast" e-mail to the
grail@ornl.gov address.

The gap penalty used in the SW program is set on this line using -g.  
For example, SW -g 10 sets the gap penalty to 10. The default is 13.

4] MATRIX ( PAM [n] / Blosum [m] ): specifies the matrix used for protein 
sequence comparison. 

[n] specifies any valid PAM matrix, viz. a multiple of ten, within the range
10 to 250. For example, PAM 250 [8].

[m] can be 62 or 80 [9]. The default is Blosum 62.  

These are not used for DNA-DNA comparison.
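
The matrix rules above are simple enough to check before a query is mailed.
The Python sketch below encodes only what is stated here (PAM: a multiple of
ten from 10 to 250; Blosum: 62 or 80); the function name is hypothetical and
the server may accept or reject values in ways this sketch does not capture.

```python
def valid_matrix(family, number):
    """Return True if (family, number) names a matrix the text allows."""
    family = family.upper()                # GENQUEST is case insensitive
    if family == "PAM":
        return 10 <= number <= 250 and number % 10 == 0
    if family == "BLOSUM":
        return number in (62, 80)
    return False


if __name__ == "__main__":
    for family, number in (("PAM", 120), ("PAM", 125), ("Blosum", 62)):
        print(family, number, valid_matrix(family, number))
```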

5] FILTER: specifies that repetitive DNA elements recognized in the query 
sequence should be masked so as not to lead to unwanted matches against the DNA 
sequence database.  This filtering system uses a library of repetitive DNAs
(including human, invertebrate, plant, and rodent) from J. Jurka.
The default is no filtering.  

Filtering a DNA query before it is translated and searched against the protein
databases avoids spurious hits that can arise from the translation of
repetitive elements. The utility of such a filter is well documented [10].

6] SCORE num_score: specifies the number of hits to be reported. Default is 10.

7] ALIGN num_align [-g]: specifies the number of hits for which alignment should
be performed. 
Default is 10. Usually for proteins 10 to 200 is an appropriate range.	

The SCORE value should be greater than or equal to the ALIGN value.
The program normally does a local alignment; however, a global alignment may be
requested using -g on this line. 

For example, ALIGN 10 -g returns global alignments of the top 10 hits.

The keywords SCORE and ALIGN apply only to method SW (parallel implementation
of Smith-Waterman).

8] COMMENT comment: specifies one line of text to be prepended to the return 
message from GENQUEST. 

9] SEQ 
   sequence
   ......
   END 

SEQ and END are keywords which specify where the sequence starts and ends in
the e-mail message. 

The sequence must begin on the line following the keyword SEQ (not on the same
line as SEQ).

The sequence can be either standard single letter protein or DNA sequence. 
The length of the sequence lines should be less than 512 characters. 

In DNA sequences, any characters other than A, C, G, T and U are converted to X
(and therefore will be filtered out). U is converted to T.

Blanks are ignored in DNA and Protein sequences.
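
The keyword rules above can be collected into a small query builder. The
Python sketch below is an illustration, not part of GENQUEST: the function and
argument names are hypothetical, the keyword order simply follows the example
query, and the 60-character line width is an arbitrary choice that stays well
under the 512-character limit.

```python
def build_genquest_query(seq_type, sequence, targets=(), method=None,
                         matrix=None, score=None, align=None,
                         comment=None, width=60):
    """Assemble a GENQUEST message body from keyword values."""
    if seq_type.upper() not in ("DNA", "DNA6", "PROTEIN"):
        raise ValueError("TYPE must be DNA, DNA6 or PROTEIN")
    if not 0 < width < 512:
        raise ValueError("sequence lines must be shorter than 512 characters")
    if score is not None and align is not None and score < align:
        raise ValueError("SCORE should be >= ALIGN")

    lines = ["TYPE " + seq_type]                      # mandatory keyword
    lines += ["TARGET " + target for target in targets]
    if method:
        lines.append("METHOD " + method)
    if matrix:
        lines.append("MATRIX " + matrix)
    if score is not None:
        lines.append("SCORE %d" % score)
    if align is not None:
        lines.append("ALIGN %d" % align)
    if comment:
        lines.append("COMMENT " + comment)            # one line of text

    # The sequence starts on the line after SEQ and ends before END.
    compact = "".join(sequence.split())               # blanks are ignored
    lines.append("SEQ")
    lines += [compact[i:i + width] for i in range(0, len(compact), width)]
    lines.append("END")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_genquest_query("DNA6", "ATCG" * 20, targets=["SwissProt"],
                               method="SW -g 13", matrix="PAM120",
                               score=50, align=20))
```

Passing several names in targets yields one TARGET line per database, which
matches the way multiple targets are written in the example queries.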

ADDITIONAL EXAMPLES
-------------------
The examples below illustrate typical queries for various types of searches:

i) This example searches the given protein sequence against SwissProt, using
FASTA with default parameters and default BLOSUM 62 matrix; and, also searches
Prosite (using a special method). 

TYPE Protein
TARGET SwissProt
TARGET Prosite
METHOD FASTA
COMMENT this is my protein sequence comparison run
SEQ
LYSEGRTAAGLVPPRTYILGREFWAAGLUTRYTHISPLEASE
END

ii) This example searches the given DNA sequence against GSDB  and the 
Repetitive DNA library (using the SW default).

TYPE DNA
TARGET GSDB
TARGET REPETITIVE
SEQ
ATAGATAAAGGGTGCTGTTTGGCGAAATATTGCTGCTGGCGCCGTAGATATATAG
CTGTGCTGTGATGTCGCTCGTAGATATAGCTAGTCTAGTCGATCG
END									      
									      
---------------------------------------
XGRAIL CLIENT-SERVER SYSTEM USER MANUAL
---------------------------------------

XGRAIL is a client-server implementation of a group of analysis tools for 
sequence exploration and gene discovery.  It allows the user to find protein
coding regions in anonymous DNA sequences, to assemble gene models, translate
part or all of these models, and search these translations against various
databases. 

The latest version of the system incorporates access to analysis for several
organisms: human, mouse, arabidopsis, drosophila and E. coli.

Database searches of a region of a DNA sequence against various databases are
also supported. XGRAIL also provides information about GC content, and the
location of several types of functional sites (splice junctions, polyA sites,
Pol II promoters, CpG Islands and simple repeats) and a variety of human and
mouse repetitive DNA sequences.

All the information generated during the analysis of a DNA sequence can be saved
for future retrieval and further processing. 

Additionally, an annotation tool is provided within XGRAIL, which facilitates 
marking (annotating) items of significance to the user, and generating an 
annotation report which can then be saved to a file or printed.

The current version of XGRAIL (version 1.3c) has been implemented using the
Motif 1.2 toolkit. It has been compiled and tested on Sun, DEC Alpha and Silicon 
Graphics workstations, running the Unix operating system.

Connection of the user's machine to the Internet is required.		      
									      
OBTAINING AND INSTALLING XGRAIL (Version 1.3c) CLIENT SOFTWARE
--------------------------------------------------------------

WARNING FOR SUNOS 4.1.3 (OR EARLIER) USERS: 

Running on Sun Platforms: Sun's xnews, the X server underlying Sun's OpenWindows
prior to Solaris 2.3, requires some patches (available from Sun) in order to run
a Motif client, including xgrail_1.3c.

If the required patches have not been installed, the X server WILL CRASH !!!

Please check with your system administrator before running (or displaying 
remotely) xgrail_1.3c on a Sun platform running any OS prior to Solaris 2.3.

NOTE: In the instructions below, the platform field in path and file names
(shown empty, as in xgrail//ver1.3c) must be filled in with sun, sgi or dec,
depending on the platform you are installing the software on.


1. Create a subdirectory in which you wish to install XGRAIL (Version 1.3c).

     % mkdir XGRAIL_1.3C

   Go to that subdirectory

     % cd XGRAIL_1.3C

2. Obtain the XGRAIL (version 1.3c)  distribution by anonymous ftp, as follows:

     % ftp arthur.epm.ornl.gov (or ftp 128.219.9.76)

     Name: anonymous

     Password: [your internet address]

     ftp> cd pub/xgrail//ver1.3c

     ftp> binary

     ftp> get README

     ftp> get xgrail..ver1.3c.tar.Z

     ftp> quit

3. Extract the files from xgrail..ver1.3c.tar.Z

     % zcat xgrail..ver1.3c.tar.Z | tar xvf -

4. At this point, the following files should be in the XGRAIL_1.3C subdirectory:

     Manual.grail1.3-genquest.May96	(Grail-Genquest User Manual)

     README

     orgseqs                      (Subdirectory containing test sequences)

     xgrail_1.3c

     XGrail			   (XGRAIL_1.3C resource file)

     xgrail.config		   (XGRAIL_1.3C configuration file)

     grail.publ			   (List of Grail-related publications)

     xgrail..ver1.3c.tar.Z       (Can be deleted at this point)

5. Do one of the following:

   A] Copy XGrail, xgrail.config and grail.publ to your home directory

     % cp XGrail ~

     % cp xgrail.config ~

     % cp grail.publ ~

   - OR -

   B] Set the environment variable, XGRAILHOME, to the full path name of
      the directory, XGRAIL_1.3C

6. Include XGRAIL_1.3C (or XGRAILHOME, as the case may be) in your PATH
   environment variable.

7. You can start up the xgrail program:

   From the command line        % xgrail_1.3c &

   OR 

   From the file manager by double-clicking on the xgrail_1.3c icon.
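
Before starting the program for the first time, the result of steps 4 to 6
can be verified with a short script. The Python sketch below is an optional
convenience and is not part of the distribution; the function names are
hypothetical, and the file list is taken from step 4 (the manual and the
tar archive are left out because they are not needed to run the client).

```python
import os

# Files and subdirectory listed in step 4 that the client needs.
EXPECTED_FILES = ("README", "orgseqs", "xgrail_1.3c",
                  "XGrail", "xgrail.config", "grail.publ")


def missing_xgrail_files(install_dir):
    """Return the expected names that are absent from install_dir."""
    return [name for name in EXPECTED_FILES
            if not os.path.exists(os.path.join(install_dir, name))]


def on_path(install_dir, path_value=None):
    """Return True if install_dir is listed in PATH (step 6)."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    target = os.path.abspath(install_dir)
    return any(os.path.abspath(entry) == target
               for entry in path_value.split(os.pathsep) if entry)


if __name__ == "__main__":
    # XGRAILHOME is the variable from step 5B; fall back to the current
    # directory when it is not set.
    home = os.environ.get("XGRAILHOME", os.getcwd())
    print("missing files:", missing_xgrail_files(home))
    print("on PATH:", on_path(home))
```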


NOTE for site-wide installation:
--------------------------------

If there are multiple users of xgrail_1.3c at your site, you should install 
XGRAIL_1.3C in a publicly accessible area (typically /usr/local/XGRAIL_1.3C).
Each user must copy the resource file, XGrail, the configuration file, 
xgrail.config and grail publications file, grail.publ, to one's home directory,
before using xgrail_1.3c for the first time. 

Alternatively, each user must copy the resource file, XGrail, to one's home
directory and set the environment variable, XGRAILHOME, to the directory
containing the XGRAIL_1.3C installation (typically /usr/local/XGRAIL_1.3C).
									      
DESCRIPTION OF XGRAIL (Version 1.3c):
------------------------------------

This section has been organized in the form of a step-by-step tutorial. The best
way to understand the operation and capabilities of XGRAIL is to read the 
following description while running XGRAIL with one of the sample DNA sequences 
provided with the software.

MAIN WINDOWS:

There are four main windows in XGRAIL:  the (top) XGRAIL window, the (middle)
DNA Sequence window and the (lower) Grail Exon and Grail Gene Model windows.

When XGRAIL is started on the client machine, it first contacts the GRAIL server
to check for any informational messages. If there are any, they are retrieved
and displayed in a notice window.

On clearing this window, the empty XGRAIL window is displayed. Across the top of
the window is a menu bar with a number of menus. The left side of the XGRAIL
window has several additional menus and controls.

A menu can be invoked (pops up) by clicking on it with the left mouse button.
A menu option can be selected by clicking the left mouse button on the option.

Initially only the File Menu is enabled, since the only actions allowed at first
are to: (a) configure the grail server host name and port number, and (b) load a
sequence.

MENUS IN XGRAIL WINDOW:

   FILE MENU (CONFIG, LOAD, SAVE and EXIT):

   CONFIG: Select Config option from the FILE menu. A "Configuration Console"
window pops up, with the Grail server host name and port number (read from
xgrail.config file).

The defaults are galahad.epm.ornl.gov (or 128.219.9.184) and 2050, respectively.
In the near future, grail servers will be running at several sites around the
world. The user can, at that point, modify the configuration to access the
closest server.

   EXIT:  Ends client-server interaction after allowing the user the 
option of saving new analysis and changes made during the session. Basically the
current state of analysis, including database searches, can be saved in a 
.xgr.1.3c file.								      

   SAVE: Saves the current sequence analysis to the corresponding analysis file.

   LOAD: The first step in using XGRAIL is to load a DNA sequence file.
Selecting the menu option "Load" pops up a sequence directory window which
displays subdirectories, sequence files (.seq) and XGRAIL (Version 1.3c)
analysis files (.xgr.1.3c). Just under the directory and file lists are menu
options to select the organism and frameshift error correction. If the user
selects the option to perform frame shift error correction, the user can also
specify whether the system should look for frameshift errors on the forward
strand, on the reverse strand, or determine by itself which strand the errors
(if any) are on.

A file or subdirectory can be selected by double clicking (left mouse) on the 
name. Alternatively, clicking (left mouse) on the file name and then clicking
on the OK button at the bottom of the directory window loads the file.  

If (a) an analysis file (.xgr.1.3c) is selected or (b) a sequence (.seq) file is
selected and an analysis file (.xgr.1.3c) exists for it, then the information
from the analysis file is read in and displayed.

If a .seq file is selected and no analysis file (.xgr.1.3c) exists for it, then
the sequence is read from the file, sent to the GRAIL server for calculation of
coding probability and exon prediction.
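
The choice between reading a saved analysis and contacting the server can be
summarized as a small decision function. The Python sketch below is only a
model of the behavior described above, not XGRAIL source code. In particular,
it ASSUMES that the analysis file for name.seq is called name.xgr.1.3c in the
same directory; the manual does not spell out the naming rule.

```python
import os

ANALYSIS_EXT = ".xgr.1.3c"
SEQUENCE_EXT = ".seq"


def analysis_path_for(seq_path):
    """ASSUMPTION: the analysis file shares the sequence file's base name."""
    return seq_path[:-len(SEQUENCE_EXT)] + ANALYSIS_EXT


def load_action(path, exists=os.path.exists):
    """Return (action, file) for a file picked in the Load window."""
    if path.endswith(ANALYSIS_EXT):
        return ("read_analysis", path)          # case (a)
    if path.endswith(SEQUENCE_EXT):
        candidate = analysis_path_for(path)
        if exists(candidate):
            return ("read_analysis", candidate) # case (b)
        return ("send_to_server", path)         # no saved analysis yet
    raise ValueError("not a .seq or %s file: %s" % (ANALYSIS_EXT, path))


if __name__ == "__main__":
    print(load_action("example.seq"))
```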

The program ignores digits in a sequence file, and can therefore read sequence
files with sequence base numbers.

[For purposes of this discussion and on the XGRAIL display, the term exon is 
used interchangeably with coding region. Non-coding exons and non-coding
portions of exons are not currently recognized by the system.]

Depending on the size of the sequence and the load on the GRAIL server, it may 
take a few seconds to a few minutes for the results to come back from the GRAIL 
server.

At this point, the other menus and controls are enabled and the GRAIL analysis
is displayed in several windows:

XGRAIL WINDOW displays the GRAIL analysis of the query sequence, identifying
potential coding exons on the forward and reverse strands which are color coded
for quality with green = "excellent" (about 90% probable), blue = "good" (about
60% probable) and red = "marginal" (about 20% probable). Gene models are also 
represented in this window by a set of linked cyan bars. Several other features 
which will be described below are also displayed in this window. This window is 
initially 10kb wide and longer sequences can be fit into the window by using the
zoom feature. Dragging the zoom indicator with left mouse changes the zoom.

DNA SEQUENCE WINDOW displays 100 bases of DNA sequence from both strands. The
position of this sequence is indicated by the double vertical green lines in the
central regions of the XGRAIL window.  The position of this blow-up region can
be moved by clicking at the desired location on the central horizontal band
(the gray-scale band showing GC content) of the XGRAIL window or by clicking the
arrows on either side of the DNA Sequence window (left mouse). This window also 
displays exons from the Exon Table as color coded horizontal bars and exons from 
gene models similarly in cyan. Translations of exons are also shown in the 
central region of this window (described later). Other features (PolyA sites, 
Promoter regions, CpG Islands, Repetitive DNA elements and Simple repeats) are
displayed as color-coded sequence characters.

GRAIL EXONS WINDOW: This window displays information about exons found in the 
sequence by GRAIL: Strand (Forward or Reverse), reading frame, position of the
exon on the sequence, limits between which the preferred reading frame is open,
quality score, and the number of database searches done. 

GRAIL GENE MODELS WINDOW: This window displays information about gene models
found in the sequence, in two subwindows: 

Model Exon Table subwindow displays information for each of the exons in the
currently selected gene model, assembled by GRAIL: reading frame, position of
the exon on the sequence, quality scores of translation start, acceptor and
donor splice junctions used in building the gene model, and the number of
database searches done.

A * in front of the first model exon score indicates that this score is for 
translation start, not acceptor junction, and an absence of * means that the 
assembly program did not find a suitable start site. A * and blank score after 
the last exon indicates a suitable stop codon has been found, while a numerical
score and absence of a * indicates that this is a donor junction, and no stop 
codon has been found. 

Gene Model Table subwindow displays information for each of the gene models
assembled by GRAIL: date of assembly, strand (Forward or Reverse), region of
the sequence considered in assembling the model, score, number of exons in
each model and the number of database searches done.

At any time during the session, the user can save the current state of the
analysis to the analysis file by selecting the Save option from the File menu
(PLEASE NOTE that the previous analysis file for that sequence is overwritten).

GRAIL MENU:  This menu lets the user select the Grail version (1, 1a or 2) 
about which GRAIL analysis information should be displayed.
(GRAIL 2 is the default).

The difference between the three versions is as follows: 

GRAIL 1 recognizes coding potential without using other signals and is perhaps 
best suited for those cases when small fragments are to be evaluated or when 
genomic context is considered to be inappropriate (as in cDNA sequences). 

GRAIL 1a is an updated version of GRAIL 1. It first uses a fixed-length window 
to locate the potential coding regions and then evaluates a number of discrete
candidates of different lengths around each potential coding region, using
information from the two 60-base regions adjacent to that coding region, to find
the "best" boundaries for each such region. 

GRAIL 1a, like GRAIL 1, is more useful for non-genomic sequences (like cDNA 
sequences).

GRAIL 2 identifies exons by using signals such as splice junctions and other 
genomic context.  It is therefore best suited for analysis of genomic sequences.

Please note that: 
(a) Models of genes can be constructed only from GRAIL 2 exons.
(b) Database searches and protein translations can be done from any version.

   Fit-Win BUTTON: Click on this button to fit the display of the entire
sequence within the XGRAIL window.

   Zoom:  The Zoom slider allows for rescaling of the loaded sequence in the 
XGRAIL window. The default zoom value is 1 and corresponds to 10 kb per screen 
width.  The zoom can be changed by dragging the zoom slider.

   WINDOWS MENU: Clicking the left mouse button on Windows menu displays the
list of several additional windows: DNA Sequence, Grail Exons, Grail Gene
Models, Features, Annotations, Range Markers, Sketch and Grail Publ windows.

Clicking the left mouse button on one of the options results in the display of
the corresponding popup window.

FEATURES WINDOW: This window displays the list of features (of the currently
selected feature type) found in the sequence by GRAIL: PolyA sites, Promoters, 
CpG Islands, Repetitive DNA elements, Simple Repeats and Frame Shift errors.
The feature type to be displayed can be selected from a selection menu found at
the top of the window. A specific feature item can be highlighted by clicking
on its entry in the list. The item is highlighted in the XGRAIL, DNA sequence
and Features Windows. A feature can also be highlighted by clicking on its
graphical representation (in XGRAIL or DNA SEQUENCE window) or on its entry in
FEATURES window.

All functional features supported by GRAIL are described later in the manual.

ANNOTATIONS WINDOW: This window displays items selected by the user for 
inclusion in an annotation report. An item can be selected for annotation by
double-clicking with the left mouse button on its entry in the relevant Table: 
Exon Table, Gene Model Exon Table or Gene Model Table; Feature Table(PolyA,
Promoter, CpG Island, Repetitive DNA or Simple Repeats) in Features Window; and,
Database Search Table in Database Search Info Window. 

The Annotation Tool is described in detail later in the manual.

RANGE MARKERS WINDOW: This window displays positions of the markers which
set the limits for various operations, viz. constructing a single gene model,
performing a database search for a region of the DNA sequence. The markers are
the blue arrows at the ends of the central region of the XGRAIL window which can
be pulled to any position along the sequence using the sliders on this window.
Alternatively the arrows can themselves be dragged on the main XGRAIL window. 

SKETCH WINDOW:  This window is overlaid on XGRAIL window and displays the 
coding probability over the entire sequence and provides a reference for the 
user's location in the whole sequence. The red horizontal marker in the Sketch
window corresponds to the portion of the sequence displayed in the larger 
XGRAIL window.

GRAIL PUBL WINDOW: This window displays some GRAIL-related publications.

   FEATURES MENU: This menu toggles on and off the display of any of the feature
types in the XGRAIL and DNA Sequence windows, viz. PolyA sites, Promoters, CpG 
Islands, Repetitive DNA elements, Simple Repeats, Frame Shift errors, GC Content
or Shadow Exons.

The algorithm to determine CpG Islands is based on the definition of CpG Islands
by Gardiner-Garden and Frommer (J. Mol. Biol. 196:261-282, 1987). 
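
The Gardiner-Garden and Frommer definition is commonly stated as a stretch of
at least 200 bases with a G+C content above 50% and an observed/expected CpG
ratio above 0.6. The Python sketch below tests one window against those
published thresholds. It is an illustration only: this manual does not describe
how GRAIL scans the sequence or merges windows, and the function name is
hypothetical.

```python
def is_cpg_island_window(window, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Test one window against the Gardiner-Garden and Frommer thresholds."""
    seq = window.upper()
    n = len(seq)
    if n < min_len:
        return False
    c = seq.count("C")
    g = seq.count("G")
    if c == 0 or g == 0:
        return False  # the observed/expected ratio is undefined
    cpg = seq.count("CG")  # number of CpG dinucleotides
    gc_fraction = (c + g) / n
    # Observed/expected CpG = (CpG count * window length) / (C count * G count)
    obs_exp = cpg * n / (c * g)
    return gc_fraction > min_gc and obs_exp > min_obs_exp
```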

Shadow Exons: In cases where predictions on both strands overlap, GRAIL 1 and 2
use a strand-determination algorithm to determine the more likely strand. The 
rejected exon's coding probability is, nonetheless, displayed (by default). It
can be toggled on or off, as desired. Toggling shadow exons off presents a more
easily interpretable display.

Clicking on an individual feature item in XGRAIL Window highlights it and the 
corresponding entry in the Features Window. 

   ASSEMBLE MENU: is used to construct gene models within specified regions of 
the sequence. The region for assembly is defined using the Range Markers
window (described earlier). 

There are three options for Assembly:  Auto Select which allows the program to
pick the "best" model, Forward Strand which assembles exons on the forward
strand and Reverse Strand, which assembles exons on the reverse strand.

This version of the gene assembly program, GAP III, uses dynamic programming 
and heuristics, and takes only a few seconds to run.
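
The details of GAP III are not given in this manual. As a generic illustration
of how dynamic programming can assemble exons, the Python sketch below picks
the highest-scoring chain of non-overlapping candidate exons on one strand
(textbook weighted interval scheduling). It is NOT the GAP III algorithm: it
ignores reading frame consistency, splice junction scores and all heuristics,
and the function name and (start, end, score) tuples are hypothetical.

```python
from bisect import bisect_right

def best_chain(exons):
    """exons: list of (start, end, score). Returns (total_score, chosen_exons)."""
    exons = sorted(exons, key=lambda e: e[1])  # order by end position
    ends = [e[1] for e in exons]
    # best[i] holds the best (score, chain) using only the first i exons
    best = [(0.0, [])]
    for i, (start, end, score) in enumerate(exons):
        # j = number of earlier exons that end before this exon starts
        j = bisect_right(ends, start - 1, 0, i)
        take = best[j][0] + score
        if take > best[i][0]:
            best.append((take, best[j][1] + [(start, end, score)]))
        else:
            best.append(best[i])  # skipping this exon is at least as good
    return best[-1]
```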

The results of model construction can be viewed in the XGRAIL window as a 
series of linked cyan bars and in the DNA Sequence window as cyan bars. The 
details of the model are listed in the Model Exon Table and Gene Model Table.

Selection of Exons and Models: For a number of operations including translation
of individual exons or models, and database searches for individual exons or 
models, a particular exon or model must first be selected. 

Exon selection is done by clicking on the desired exon bar in the XGRAIL or DNA
Sequence window or by clicking on the corresponding row in the Exon Table, or 
Model Exon Table.  

A particular gene model can be selected by clicking on the corresponding row of
the Gene Model table.

   TRANSLATION MENU: This displays the translation of exons in the exon table, 
gene model exons, or entire gene models based on a choice in the Translation
submenu.

For exons in the exon table, a translation is provided in only the statistically
preferred reading frame (one frame for a given exon). This frame is listed in 
the Exon Table window. 

In GRAIL 2 and GRAIL 1a, the chosen translation frame is correct more than 98%
of the time, while in GRAIL 1 it is correct about 95% of the time.

For gene models, the frame appropriate to the exon and model is used (frames 
listed in Model Exon table). Since the gene model is constructed in a manner 
which is reading frame consistent with the initial statistical estimates of 
frame, the frame used here is virtually always the same as in the original exon
table.									      

The resulting translation appears in a Translation pop-up window.

The extent of the exons and their translations can also be viewed in the DNA
Sequence window, in the central horizontal area between the two DNA sequence 
strands. Yellow single letter protein translation is displayed when an exon in 
the exon table is selected. 

Selecting a gene model exon results in display of the translation in cyan, 
overlying the yellow translation from the same exon in the exon table. 

If there is a frame discrepancy at a given location both translations will 
appear simultaneously.
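
To make the idea of a frame-dependent translation concrete, the Python sketch
below translates a DNA segment with the standard genetic code, starting at a
given offset (0, 1 or 2) into the segment. How GRAIL numbers its reading frames
is not specified here, so the offset convention and the names are assumptions;
incomplete or ambiguous codons are reported as 'X'.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna, offset=0):
    """Translate dna from the given offset; '*' marks a stop codon."""
    seq = dna.upper()
    protein = []
    for pos in range(offset, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[pos:pos + 3], "X"))
    return "".join(protein)
```

Translating the same segment at all three offsets and comparing the results is
a quick way to see why a single frame discrepancy changes every downstream
residue.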

   SEARCH DATABASE MENU: allows the user to access the GENQUEST (Q) sequence
comparison server. A GENQUEST Search Options window comes up and displays all
the available options.

Here, a multitude of options are possible including search of exons, gene model
exons, and gene models, as well as other selected parts of the DNA sequence 
against SwissProt, Prosite, PDB (protein structure database), the Genome 
Sequence Database (GSDB), BLOCKS, dbEST, genpept and the repetitive DNA library,
using a number of algorithms.

Other details for these options are described in the GENQUEST manual.  The 
results of GENQUEST searches are displayed in a pop-up window.

   DBSEARCHINFO MENU: Tracks database searches and allows one to find and 
display previous database search results. The submenu allows selection of the 
GRAIL Exon, Model Exon, Gene Model or DNA Sequence search list. The selected
list is displayed in the Search Info pop-up window from which the results of a
given search may be chosen for display. These results appear in a pop-up window
which lists matches and the target database used for the search. Search results
can be deleted from the Search Info window. 

DESCRIPTION OF FEATURES: GRAIL can find the following functional sites in a DNA
sequence:

   POLYA Site: The vertical cyan bars above and below the GC band of the XGRAIL 
window mark the positions of potential poly-A addition signals.

   PROMOTERS: Pol II Promoter regions are displayed as hollow, yellow rectangles
with a red vertical bar (representing 'TATA' location) above or below the GC 
band, in the XGRAIL window. The Pol II Promoter recognition system [7] is a
prototype. The current version is trained to recognize only Pol II promoter
regions with TATA-like elements containing the subsequence ATA. The system
detects about 60% of Pol II promoter regions with TATA-like elements, with a
false positive rate of approximately 1 per 35000 bases of DNA sequence. These
statistics were calculated from annotated GSDB sequences.

   CpG ISLANDS: CpG Islands are displayed as hollow, purple rectangles with 
vertical tabs superimposed over the GC band, in the XGRAIL window.

   REPTTV DNA: An option for locating various repetitive DNA elements is 
provided and these elements are indicated by centrally located yellow hollow 
rectangles with vertical tabs and cyan arrow-heads indicating their orientation.
Analysis of repetitive DNAs requires detailed sequence comparison using Smith-
Waterman and may take some time especially for very long sequences or those 
with many repetitive elements (about 6 minutes for a 21kb sequence with 23 
hits).

Once the repetitive analysis is done this feature can be toggled on and off the
display like any other feature type using the Features menu. The human 
repetitive DNA annotations come from a library of 65 elements provided by 
J. Jurka. 

   SIMPLE REPEATS: This option finds all tandem repeats (microsatellites) and
other types of simple perfect and imperfect repeats in the sequence.

   FRAME SHIFT ERRORS: This option finds likely frame-shift errors in exons in
the sequence. Some of the reported errors will be false positives, so the
results should be treated as suggestive.

   GC CONTENT: is represented by gray shading in a central horizontal band in
the XGRAIL window. This band reflects the GC content of a sliding 50 base region
with white being high GC and black low GC.

   SHADOW EXONS: In cases where exons overlap on both strands, GRAIL 1 & 2 
incorporate a strand-determination algorithm to determine the more likely 
strand. The rejected exon's coding probability is, nonetheless, displayed 
(by default). It can, however,  be toggled off, using this option.
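
The GC CONTENT band described above can be approximated as follows. This
Python sketch computes the G+C fraction of every 50-base window and maps it to
a gray level, with white for high GC and black for low GC as stated above. The
linear 0-255 gray scale and the function names are assumptions; the manual does
not say how XGRAIL scales the shading.

```python
def gc_profile(dna, window=50):
    """G+C fraction of each window of the given length, sliding one base at a time."""
    seq = dna.upper()
    if len(seq) < window:
        return []
    is_gc = [1 if base in "GC" else 0 for base in seq]
    count = sum(is_gc[:window])
    profile = [count / window]
    for i in range(window, len(seq)):
        # slide by one base: add the entering base, drop the leaving base
        count += is_gc[i] - is_gc[i - window]
        profile.append(count / window)
    return profile

def gray_level(gc_fraction):
    """Map a G+C fraction to 0 (black, low GC) .. 255 (white, high GC)."""
    return round(255 * gc_fraction)
```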
									      
DESCRIPTION OF ANNOTATION TOOL:

Select Menu: Only one type of annotated item is displayed at a time. The type
of item to be displayed can be selected from this menu in the Annotation
window. 

User-Input: Selecting "User-Input" option from this menu displays a window with 
fields which can be filled by the user. Unlike all other items, User-Input is 
always a part of the annotation.

Sequence: Selecting "Sequence" option from this menu displays (the first and 
last 250 bases of) the DNA sequence itself. The entire sequence will be 
included in the annotation report. 

Grail Publications: Selecting "Grail Publ" option from this menu displays the 
list of GRAIL-related publications.

Grab, Ungrab: All features of the currently selected (in annotation menu) 
feature type can be brought into the annotation report by clicking on the "Grab"
button. Similarly, they can all be "deannotated" by clicking on the "Ungrab" 
button, in this window.

Protein Translations: To include the protein translation of an Exon, Gene Model 
Exon or Gene Model, double-click on its entry in the appropriate table
(Grail Exons, Model Exons or Gene Models). If the item is not yet annotated,
the first double-click marks it for annotation (an 'X' appears to the left
of the entry). On the second double-click, a 'T' is displayed instead of 'X',
indicating that the item and its protein translation are marked for annotation.

All: The user can select the item types to be included in the annotation 
report by checking the boxes next to the corresponding Menu options (under the 
Incl column). Selecting "All" overrides the checkmarks, and includes all the
annotated items from all item types in the annotation report.

Print/Save: The annotation report can be printed by clicking on the "Print" 
button or saved to a file by clicking on the "Save" button.

Annotation File: The annotation report is saved in a file, which is stored in 
the same directory as the sequence file. The annotation report file name 
consists of the sequence file name, appended with .subset.anno.1.3c (for "Incl" 
option) or .full.anno.1.3c (for "All" option), followed by the current date
(e.g. humactga.seq.subset.anno.1.3c.07_26_1994).
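
The naming rule above can be written out as a small Python sketch. The date
layout (month_day_year) is read off the example name above; the function and
parameter names are hypothetical and not part of XGRAIL.

```python
from datetime import date

def annotation_report_name(sequence_file, include_all, when, version="1.3c"):
    """Build the report file name for the "Incl" (subset) or "All" (full) option."""
    kind = "full" if include_all else "subset"
    return f"{sequence_file}.{kind}.anno.{version}.{when.strftime('%m_%d_%Y')}"
```

For instance, annotation_report_name("humactga.seq", False, date(1994, 7, 26))
reproduces the example name given above.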

CHANGES IN XGRAIL 1.3c (from XGRAIL 1.3b):
------------------------------------------

1. The maximum sequence length has been increased from 100000 to 200000 bases.

2. In genquest window, an additional option has been provided to perform a
database search for the DNA sequence corresponding to an exon, a gene model or
a model exon. Also, the query is displayed before launching any database query.
The user has the option to cancel the query, at that point.

3. In the Features menu, a * after the feature name indicates that the
feature has already been obtained. When the user clicks on the "Repttv DNA"
feature option (and the query has not already been performed for that feature),
a message is displayed to remind the user that the query could take a long time.
The user has the option to launch the query or cancel it, at that point.

4. The annotation report can now be saved and/or printed. User-specified
annotation is saved to the analysis file & reloaded on loading the analysis
file.

5. INCOMPATIBILITY with previous XGRAIL (Versions 1.3b & earlier) analysis
files: Analysis files generated by previous versions of XGRAIL cannot be loaded
into xgrail_1.3c.
									      
------------------------------------------
XGENQUEST CLIENT-SERVER SYSTEM USER MANUAL
------------------------------------------

XGENQUEST is a client-server implementation of the integrated sequence
comparison system.

Currently the client software has been tested on Sparc stations running Open
Windows 3.0 and SunOS 4.1.3.

Connection of the user's machine to the Internet is required.


Differences between XGENQUEST & GENQUEST E-mail Access
------------------------------------------------------

The GENQUEST server expects the query to be in a specified format, described in
the GENQUEST (Q) E-MAIL SERVER USER MANUAL above.

XGENQUEST client software formats the query based on the options selected by the
user, thus relieving the user from that responsibility.

XGENQUEST allows only a single database target to be specified in a query,
whereas e-mail query can specify multiple database targets.

XGENQUEST does not support the IBM FLASH method of sequence comparison.


File Management in XGENQUEST
----------------------------

XGENQUEST allows the user to browse the filesystem and displays all filenames
with extensions .seq (for DNA sequences) and .prt (for protein sequences), and
subdirectory names. When the user selects a file for loading (by double-
clicking on the filename), the sequence is displayed in a pop up window.

File Format: XGENQUEST expects the .seq and .prt files to be in FASTA format;
please refer to examples included in the software distribution.
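
For readers unfamiliar with FASTA format, a record is a header line starting
with '>' followed by one or more lines of sequence. The Python sketch below
reads such records from text. It is a minimal illustration with a hypothetical
function name; the example files in the software distribution remain the
reference for what XGENQUEST accepts.

```python
def read_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```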

The user can save searches to the disk. The searches are saved in individual
files (the filename for a search file is the name of the sequence file,
appended with .gqr, and search number, e.g. humvpnp.seq.gqr1). The user can
select a search file (using the browser), to be displayed in a popup window.
The user can also delete a search file, using the Delete Search button in this
popup window.
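
Scripts that manage saved searches can rely on the naming rule above. The
Python sketch below splits a search file name into the sequence file name and
the search number; the function name is hypothetical.

```python
import re

# <sequence file name>.gqr<search number>, e.g. humvpnp.seq.gqr1
SEARCH_FILE = re.compile(r"^(?P<sequence>.+)\.gqr(?P<number>\d+)$")

def parse_search_file(name):
    """Return (sequence_file, search_number), or None if name is not a search file."""
    m = SEARCH_FILE.match(name)
    if m is None:
        return None
    return m.group("sequence"), int(m.group("number"))
```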
									      
OBTAINING AND INSTALLING XGENQUEST (Version 1.1) CLIENT SOFTWARE
----------------------------------------------------------------

1. Create a subdirectory in which you wish to install XGENQUEST (Version 1.1).

     % mkdir XGENQUEST

   Go to that subdirectory

     % cd XGENQUEST

2. Obtain the XGENQUEST (ver 1.1) distribution by anonymous ftp, as follows:

     % ftp arthur.epm.ornl.gov (or ftp 128.219.9.76)

     Name: anonymous
     Password: [your internet address]

     ftp> cd pub/xgenQuest/sun/ver1.1
     ftp> binary
     ftp> get README
     ftp> get xgenQuest.sun.ver1.1.tar.Z
     ftp> quit

3. Extract the files from xgenQuest.sun.ver1.1.tar.Z

     % zcat xgenQuest.sun.ver1.1.tar.Z | tar xvf -


4. At this point, the following files should be in the XGENQUEST subdirectory:

     Manual.grail1.3-genquest.May96	(Grail-Genquest User Manual)
     README
     testseqs                       (Subdirectory containing test sequences)
     xgenQuest_1.1
     xgenQuest.sun.ver1.1.tar.Z     (Can be deleted at this point)

5. You can start up the xgenQuest_1.1 program:

   From the command line        % xgenQuest_1.1

   OR

   From the file manager by double-clicking on the xgenQuest_1.1 icon.
									      
--------------------------------------------------------
GRAILCLNT (Command line interface to grail and genquest)
--------------------------------------------------------

GRAILCLNT is a command line interface to all grail and genquest analysis
modules. It allows the user to perform specific analysis of sequences,
independently of XGRAIL or XGENQUEST. GRAILCLNT can take input from stdin and
write output to stdout. It can therefore be conveniently integrated into a
larger system. The following paragraphs explain how to obtain and use GRAILCLNT.


OBTAINING AND INSTALLING GRAILCLNT SOFTWARE
-------------------------------------------

1. Create a directory in which you wish to install grailclnt:

   % mkdir GRAILCLNT

   Go to that directory

   % cd GRAILCLNT

2. Obtain the grailclnt distribution by anonymous ftp:

   % ftp arthur.epm.ornl.gov (or ftp 128.219.9.76)

     Name: anonymous

     Password: 

     ftp> cd pub/grailclnt/	(followed by the platform name, one of: sun, dec, sgi)
	
     ftp> binary

     ftp> get README

     ftp> get grailclnt..tar.Z

3. Extract the files from grailclnt..tar.Z

   % zcat grailclnt..tar.Z | tar xvf -

4. At this point, there should be the following files in GRAILCLNT directory:

   Manual.grail1.3-genquest.May96	(Grail-Genquest User Manual)

   README

   grail_svc_table

   grailclnt

   dir.Humadag

   grailclnt..tar.Z	(which can be deleted at this point)


Using grailclnt:
----------------

grailclnt is a command line interface to all grail services, which are listed
in the accompanying file, grail_svc_table.

The syntax for using grailclnt is as follows:

% grailclnt -host  -port  \
		[ < input_file ] [ > output_file ]

The grail server currently runs on the machine galahad.epm.ornl.gov, on port 2050.


A description of input formats for all available grail services can be
found in the accompanying file, grail_svc_table.

Examples of input to and output from several grail services (for the sequence
humadag) can be found in the accompanying directory, dir.Humadag.
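
Because grailclnt reads stdin and writes stdout, it can be driven from a
script. The Python sketch below builds the command line and pipes a request
through it. It assumes that -host and -port are each followed by a value (the
host name and port quoted above), that the grailclnt binary is on the PATH, and
that the server is still reachable; the function names are hypothetical.

```python
import subprocess

def grailclnt_command(host="galahad.epm.ornl.gov", port=2050, program="grailclnt"):
    """Argument list for one grailclnt call (flag usage is an assumption)."""
    return [program, "-host", host, "-port", str(port)]

def run_request(request_text, **kwargs):
    """Send request_text on stdin and return whatever grailclnt writes to stdout."""
    result = subprocess.run(
        grailclnt_command(**kwargs),
        input=request_text,
        capture_output=True,
        text=True,
        check=True,  # raise if grailclnt exits with an error status
    )
    return result.stdout
```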
									      
------------------------------------------------------
Description of grail modules and input-output formats:
------------------------------------------------------

			Grail Version 1.3 Services


Organism --			Human	Mouse	Arabi-	Droso-	E. coli	Options
Feature						dopsis	phila		-Params
   |
		ORGLABEL --	human	mouse	arabd	droso	ecoli

		FTRLABEL
		   |

grail1		grail1exons 	yes	yes	-	-	yes	3 Opts

grail1a 	grail1aexons	yes	yes	-	-	-	3 Opts

grail2		grail2exons	yes	yes	yes	yes	-	4 Opts

gap2		gap2		yes	yes	yes	yes	-	3 Opts
									3 Params

polya		polya		yes	yes	-	-	-	1 Opt

polIIprom	polIIprom	yes	yes	-	-	-	1 Opt

cpg		cpg		yes	yes	-	-	-	1 Opt
(*)

rpttv		rpttv		yes	yes	-	-	-	1 Opt
(*)									1 Param

smprpt		smprpt		yes	yes	yes	yes	yes	1 Opt
(*)									

fserr		fserr		yes	yes	-	-	-	1 Opt
									1 Param

genquest	genquest	yes	yes	yes	yes	yes
(*)


(*) = Same service for all organisms. Do not specify ORGLABEL.
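
The table above can be encoded as a lookup that checks a service/organism
combination before a request is sent. This Python sketch follows the request
labels used in the examples later in this section (for instance
Request_grail1exons_ecoli_1.3 and Request_rpttv_1.3). Services marked (*) are
stored with None and never take an ORGLABEL. The dictionary is transcribed by
hand from the table, and the names are hypothetical.

```python
# FTRLABEL -> set of supported ORGLABELs, or None for (*) services
SERVICES = {
    "grail1exons": {"human", "mouse", "ecoli"},
    "grail1aexons": {"human", "mouse"},
    "grail2exons": {"human", "mouse", "arabd", "droso"},
    "gap2": {"human", "mouse", "arabd", "droso"},
    "polya": {"human", "mouse"},
    "polIIprom": {"human", "mouse"},
    "cpg": None,
    "rpttv": None,
    "smprpt": None,
    "fserr": {"human", "mouse"},
    "genquest": None,
}

def request_label(ftrlabel, orglabel=None, version="1.3"):
    """Return the text that goes inside the opening [Request_...] bracket."""
    if ftrlabel not in SERVICES:
        raise ValueError(f"unknown service: {ftrlabel}")
    organisms = SERVICES[ftrlabel]
    if organisms is None:
        return f"Request_{ftrlabel}_{version}"  # (*) service: no ORGLABEL
    if orglabel not in organisms:
        raise ValueError(f"{ftrlabel} is not available for {orglabel}")
    return f"Request_{ftrlabel}_{orglabel}_{version}"
```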


OPTIONS:

ALL MODULES	header

grail1		strandscores
		translation

grail1a		shadowexons
		translation

grail2		shadowexons
		clusters
		translation

gap2		genemodel_translation
		modelexon_translation

PARAMS:		Param1			Param2		Param3

gap2		AUTO_STRAND/				
		FORWARD_STRAND/
		REVERSE_STRAND

rpttv		

fserr		AUTO_STRAND/
		FORWARD_STRAND/
		REVERSE_STRAND

Grail Service Request:

The current version for all services is 1.3

[Request_FTRLABEL[_ORGLABEL]_VERSION]

[Option= option_name]
::::::::::::::::::::
[Option= option_name]

[Params= Param1 Param2 ..... ]

[Start DNASequence]

ACGTACGACTCGACTAGCATC......

:::::::::::::::::::::::::::

.........GCTAGCTACGAT

[End DNASequence]

	exon table goes here (for polIIprom)

	shadowexon table goes here (for polIIprom)

	cluster tables go here (for gap2 & polIIprom)

[End_Request]


NOTE: 

a) For polIIpromoters, the exon, shadow exon and cluster tables
returned by grail2 must be included after the end of the DNA sequence.

b) For gap2 (gene assembly) & polIIpromoters, the cluster tables
returned by grail2 must be included after the end of the DNA sequence.
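
A request of this shape can be generated by a short script. The Python sketch
below assembles the bracketed sections in the order shown above. The 60-base
line width and the blank lines between sections are assumptions copied from
the layout of the examples, not documented requirements, and the function and
parameter names are hypothetical. Any exon, shadow exon or cluster tables are
passed through unchanged in the 'tables' argument.

```python
def build_request(ftrlabel, sequence, orglabel=None, options=(), params=(),
                  tables="", version="1.3", width=60):
    """Return the text of one grail service request."""
    label = "_".join(p for p in (ftrlabel, orglabel, version) if p)
    lines = [f"[Request_{label}]", ""]
    lines += [f"[Option= {opt}]" for opt in options]
    if params:
        lines.append("[Params= " + " ".join(str(p) for p in params) + "]")
    lines += ["", "[Start DNASequence]", ""]
    # break the sequence into fixed-width lines
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    lines += ["", "[End DNASequence]", ""]
    if tables:
        lines += [tables.strip("\n"), ""]  # tables follow the DNA sequence
    lines.append("[End_Request]")
    return "\n".join(lines) + "\n"
```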

Examples:

1)

	[Request_grail1exons_ecoli_1.3]

	[Option= strandscores]
	[Option= translation]

	[Start DNASequence]

	ACGTACGACTCGACTAGCATC......
	:::::::::::::::::::::::::::
	......GCTAGCTACGATCAGCTQAGC

	[End DNASequence]

	[End_Request]

2)
	[Request_grail1aexons_human_1.3]

	[Option= shadowexons]
	[Option= translation]

	[Start DNASequence]

	ACGTACGACTCGACTAGCATC......
	:::::::::::::::::::::::::::
	......GCTAGCTACGATCAGCTQAGC

	[End DNASequence]

	[End_Request]

3)
	[Request_grail2exons_droso_1.3]

	[Option= shadowexons]
	[Option= clusters]
	[Option= translation]

	[Start DNASequence]

	ACGTACGACTCGACTAGCATC......
	:::::::::::::::::::::::::::
	......GCTAGCTACGATCAGCTQAGC

	[End DNASequence]

	[End_Request]

4)
	[Request_fserr_human_1.3]

	[Params= AUTO_STRAND]

	[Start DNASequence]

	ACGTACGACTCGACTAGCATC......
	:::::::::::::::::::::::::::
	......GCTAGCTACGATCAGCTQAGC

	[End DNASequence]

	[End_Request]

5)
	[Request_rpttv_1.3]

	[Params= human]

	[Start DNASequence]

	ACGTACGACTCGACTAGCATC......
	:::::::::::::::::::::::::::
	......GCTAGCTACGATCAGCTQAGC

	[End DNASequence]

	[End_Request]

6)
	[Request_gap2_arabd_1.3]

	[Option= header]
	[Option= genemodel_translation]
	[Option= modelexon_translation]

	[Params= AUTO_STRAND 1 3583]

	[Start DNASequence]

	ACGTACGACTCGACTAGCATC......
	:::::::::::::::::::::::::::
	......GCTAGCTACGATCAGCTQAGC

	[End DNASequence]

	[Start clusters forward]
	[Num= 2]

	[Start cluster]
	1474	1526	1	25	0
	[End cluster]

	[Start cluster]
  	2738   2798      1     61      2
 	2738   2804      1     34      2
 	2738   2816      1     30      2
  	2751   2798      1     23      2
	[End cluster]

	[End clusters forward]


	[Start clusters reverse]
	[Num= 3]

	[Start cluster]
	160    317      1     46      0
	160    315      1     46      0
	160    347      1     39      0
	160    278      1     36      0
	160    395      1     34      0
	160    222      1     26      0
	[End cluster]

	[Start cluster]
	441    513      0     59      1
	[End cluster]

	[Start cluster]
	626    746      1     21      2
	[End cluster]

	[End clusters reverse]

	[End_Request]

7)
	[Request_polIIprom_human_1.3]

	[Option= header]

	[Start DNASequence]

	ACGTACGACTCGACTAGCATC......
	:::::::::::::::::::::::::::
	......GCTAGCTACGATCAGCTQAGC

	[End DNASequence]

	[Start grail2exons_human_1.3] [Seqlen= 3583]

	[Start exons]
	[Num= 5]

	f 0    916    1044     853    1116   100.000    excellent
	f 2   1134    1416     954    1445   100.000    excellent
	f 0   1651    2130    1594    2148   100.000    excellent
	f 1   2169    2366    2135    2419   100.000    excellent
	f 1   2444    2587    2420    2587    88.000    excellent

	[End exons]

	[Start shadowexons]
	[Num= 2]

	r 0   2443    2542    2419    2637    50.000         good
	r 1   1019    1462    1001    1462    77.000    excellent

	[End shadowexons]

	[Start clusters forward]
	[Num= 2]

	[Start cluster]
	1474	1526	1	25	0
	[End cluster]

	[Start cluster]
  	2738   2798      1     61      2
 	2738   2804      1     34      2
 	2738   2816      1     30      2
  	2751   2798      1     23      2
	[End cluster]

	[End clusters forward]


	[Start clusters reverse]
	[Num= 3]

	[Start cluster]
	160    317      1     46      0
	160    315      1     46      0
	160    347      1     39      0
	160    278      1     36      0
	160    395      1     34      0
	160    222      1     26      0
	[End cluster]

	[Start cluster]
	441    513      0     59      1
	[End cluster]

	[Start cluster]
	626    746      1     21      2
	[End cluster]

	[End clusters reverse]

	[End grail2exons_human_1.3] 

	[End_Request]
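
When chaining services (for example, feeding grail2 output to polIIprom), it
can help to read the exon tables into a program. The Python sketch below
parses the rows between [Start exons] and [End exons] (or the shadowexons
block). The column meanings are an assumption based on the GRAIL EXONS WINDOW
description: strand, reading frame, exon start and end, limits of the open
reading frame, score, and a quality word. All names are hypothetical.

```python
from typing import NamedTuple

class Exon(NamedTuple):
    strand: str      # 'f' or 'r'
    frame: int
    start: int
    end: int
    orf_start: int   # assumed: open reading frame limits
    orf_end: int
    score: float
    quality: str

def parse_exon_line(line):
    f = line.split()
    return Exon(f[0], int(f[1]), int(f[2]), int(f[3]),
                int(f[4]), int(f[5]), float(f[6]), f[7])

def parse_exons(text, block="exons"):
    """Collect the rows between [Start <block>] and [End <block>]."""
    exons, inside = [], False
    for raw in text.splitlines():
        line = raw.strip()
        if line == f"[Start {block}]":
            inside = True
        elif line == f"[End {block}]":
            inside = False
        elif inside and line and not line.startswith("["):
            exons.append(parse_exon_line(line))  # skips [Num= n] and blank lines
    return exons
```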
									      
-----------------------------------------------
WORLD WIDE WEB INTERFACE TO GRAIL AND GENQUEST:
-----------------------------------------------

GRAIL and GENQUEST can be accessed through a WWW browser such as Netscape
Navigator. The URL (Uniform Resource Locator) is: 

http://avalon.epm.ornl.gov/Grail-1.3

There is an online help file at: 

http://avalon.epm.ornl.gov/Grail-1.3/help.html

----------------
ACKNOWLEDGEMENTS
----------------

GRAIL Research and Development is supported by the Office of Health and
Environmental Research, United States Department of Energy under contract No.
DE-AC05-96OR22464 with Lockheed Martin Energy Research Corporation.

DATABASES
---------
We thank the administrators of the following databases:

SWISS-PROT (Bairoch and Boeckmann, 1992)
PDB (Brookhaven National Laboratory)
PROSITE (Bairoch, 1993)
GSDB (Bilofsky and Burks, 1988)
BLOCKS/BLIMPS (Henikoff and Henikoff, 1991)
DBEST (Boguski et al., 1993)
HUMAN REPETITIVE DNA (Jurka, 1990; Jurka, Walichiewicz and Milosavljevic, 1992;
Jurka et al., 1993)

METHODS
-------
We thank the authors of the following methods:

FASTA (Pearson and Lipman, 1988)
BLAST (Altschul et al., 1990)
Smith-Waterman (Smith and Waterman, 1981)

SERVERS
-------
We thank IBM T. J. Watson Research Center for the use of their dFLASH server.

----------------
SOFTWARE SUPPORT
----------------

A copy of this Manual can be obtained by sending a message to GRAIL@ornl.gov or
Q@ornl.gov with the word HELP on the subject line or on the first text line. 

Questions, Suggestions and Help:

If you have any questions or suggestions, or need further help with any GRAIL
system, please send an e-mail to the GRAILMAIL@ornl.gov address.



-----------------------------------
PUBLIC DOMAIN NOTICE AND DISCLAIMER
-----------------------------------

This software is a "United States Government Work" under the terms of the
United States Copyright Act. It was written as part of the authors' official
duties as United States Government employees and the software and its names
(Grail and GenQuest) are the property of the U.S. Department of Energy.

Although all reasonable efforts have been taken to ensure the accuracy and
reliability of the software and data, Oak Ridge National Laboratory and the
U.S. Government do not and cannot warrant the performance or results that may
be obtained by using this software or data. O.R.N.L. and the U.S. Government
disclaim all warranties, express or implied, including warranties of
performance, merchantability or fitness for any particular purpose.

Please cite the authors in any work or product based on these systems.
									      
------------------
GRAIL PUBLICATIONS
------------------

Uberbacher, E. C., Mann, R. C., Hand, R. C., and Mural R. J. (1991) "A Neural
Network-Multiple Sensor Based Method for Recognition of Gene Coding Segments in
Human DNA Sequence Data," ORNL/TM-11741.

Guan, X., Mann, R. C., Mural, R. J., and Uberbacher, E. C. (1991) "On Parallel
Search of DNA Sequence Databases," Proceedings of the 5th SIAM Conference on
Parallel Processing for Scientific Computing, Chap. 46, pp. 332-337.

Uberbacher, E. C., and Mural, R. J. (1991) "Locating Protein Coding Regions in
Human DNA Sequences Using a Multiple Sensor-Neural Network Approach," Proc.
Natl. Acad. Sci. USA, Vol. 88, pp. 11261-11265.

Einstein, J. R., Uberbacher, E. C., Guan, X., Mural, R. J., and Mann, R. C.
(1991) "GAP - A Computer Program for Gene Assembly," ORNL/TM 11924.

Mural, R. J., Einstein, J. R., Guan, X., Mann, R. C., and Uberbacher, E. C.
(1991) "An Artificial Intelligence Approach to DNA Sequence Feature
Recognition," Trends in Biotechnology, Vol. 10, pp. 66-69.

Guan, X., Mural, R. J., Mann, R. C., Uberbacher, E. C. (1991) "Searching
Consensus Patterns on Hypercube" Sixth Distributed Memory Computing Conference,
pp. 470-472, Portland, OR.

Guan, X., Mural, R. J., Einstein, J. R., Mann, R. C., and Uberbacher, E. C.
(1992) "GRAIL:  An Integrated Artificial Intelligence System for Gene
Recognition and Interpretation," Eighth IEEE Conference on AI Applications,
IEEE Computer Society Press, Monterey, CA, March 2-6, pp. 9-13.

Uberbacher, E. C., Einstein, J. R., Guan, X., Mural, R. J. (1992) "Gene
Recognition and Assembly in the GRAIL System:  Progress and Challenges," The
Second International Conference on Bioinformatics, Supercomputing, and Complex
Genome Analysis, pp. 465-476.

Einstein, J. R., Mural, R. J., Guan, X., and Uberbacher, E. C. (1992)
"Computer-based Construction of Gene Models Using the GRAIL Gene Assembly
Program," ORNL/TM-12174.

Xu, Y., Mural, R. J., Shah, M., Uberbacher, E.C. (1994) "Recognizing Exons in
Genomic Sequence Using GRAIL II,"  Genetic Engineering, Principles and Methods,
Plenum Press, Vol. 15.

Xu, Y., Einstein, J. R., Mural, R.J., Shah, M. and Uberbacher, E.C., (1994) "An
Improved System for Exon Recognition and Gene Modeling in Human DNA Sequences,"
Published Presentation: The Second International Conference on Intelligent
Systems for Molecular Biology, Stanford University, San Francisco, CA, August
14-17.

Mural, R.J., Guan, X., and Uberbacher, E.C., (1993) "Computational Methods for
Locating Biological Features in DNA Sequences," Current Protocols in Human
Genetics, Unit 6.5, Supplement 6.

Xu, Y., Mural, R.J., Uberbacher, E.C., (1994) "Constructing Gene Models From
Accurately-Predicted Exons: An Application of Dynamic Programming," Computer
Applications in the Biosciences, Vol. 10, No. 6, 1994, pp. 613-623.

Uberbacher, E.C., "ORNL Announces genQuest and X-GRAIL," (1994) Human Genome
News, Vol. 5, No. 5, pp. 8-9.

M.B. Shah, Ying Xu, J.R. Einstein, Xiaojun Guan, Loren J. Hauser, Sherri A.
Matis, Ronald W. Lee, Richard J. Mural, Edward C. Uberbacher, "Gene Discovery
and Sequence Annotation in GRAIL 1.3," abstract to be presented and published in
The Hilton Head DNA Sequence Conference, Hilton Head, S.C., September 16-20,
1995.

Sherri Matis, Ying Xu, Manesh Shah, Xiaojun Guan, J. R. Einstein, Richard J.
Mural, Edward Uberbacher, "Detection of RNA Polymerase II Promoters and
Polyadenylation Sites in Human DNA Sequence," Computers & Chemistry, Vol. 20,
No. 1, pp. 135-140, 1996.

Edward C. Uberbacher, Ying Xu, and Richard J. Mural, "Discovering and
Understanding Genes in Human DNA Sequence Using GRAIL," Computer
Methods for Macromolecular Sequence Analysis, September, 1995.

X. Guan and E.C. Uberbacher, "A Fast Look-up Algorithm for Detecting Repetitive
DNA Sequences," Abstract in Proceedings of The First Pacific Symposium on
Biocomputing, January 3-6, 1996, pp. 718-719.

Ying Xu, Richard J. Mural, and Edward C. Uberbacher, "An Iterative Algorithm for
Correcting Sequencing Errors in DNA Coding Regions," Presentation to be
published as full article in DIMACS Workshop on Gene-Finding and Gene Structure
Prediction, Philadelphia, PA, October 13-14, 1995.

Edward C. Uberbacher, Ying Xu, Manesh Shah, Sherri Matis, Xiaojun Guan, and
Richard J. Mural, "DNA Sequence Pattern Recognition Methods in GRAIL,"
Presentation to be published as full article in DIMACS Workshop on Gene-Finding
and Gene Structure Prediction, Philadelphia, PA, October 13-14, 1995.

Ying Xu, Richard J. Mural and Edward C. Uberbacher, "Correcting sequencing
errors in DNA coding regions using a dynamic programming approach," Computer
Applications in the Biosciences, Vol. 11, No. 2, 1995, pp. 117-124.

Ying Xu, Richard J. Mural, and Edward C. Uberbacher, (1995) "An Iterative
Algorithm for Correcting Sequencing Errors in DNA Coding Regions," Journal of
Computational Biology (in press).

Mark W. Craven, Richard J. Mural, Loren J. Hauser, and Edward C. Uberbacher,
"Predicting Protein Folding Classes without Overly Relying on Homology,"
Proceedings of the 3rd International Conference on Intelligent Systems for
Molecular Biology, C. Rawlings and D. Clark (eds), AAAI Press, Menlo Park, CA,
1995.

Ed Uberbacher, "Discovering the Intelligence in Molecular Biology," Tibtech,
Vol. 13, pp. 497-500, December 1995.

Ying Xu and Edward C. Uberbacher, "Gene Prediction by Pattern Recognition and
Homology Search," The Fourth International Conference on Intelligent Systems for
Molecular Biology, St. Louis, MO, June 13-15, 1996.

Ying Xu and Edward C. Uberbacher, "A Polynomial-Time Algorithm for a Class of
Protein Threading Problems," Journal of Computer Applications in Biosciences
(in press).

M. B. Shah, X. Guan, J. R. Einstein, S. Matis, Y. Xu, R. J. Mural and 
E. C. Uberbacher, "User's Guide to GRAIL and GENQUEST (Sequence Analysis,
Gene Assembly And Sequence Comparison Systems) E-mail Servers and XGRAIL
(Version 1.3c), GRAILCLNT (Version 1.3) command line interface and XGENQUEST
(Version 1.1) Client-Server Systems," Available by anonymous ftp to 
arthur.epm.ornl.gov (128.219.9.76) from directory pub/xgrail or pub/xgenQuest
or pub/grailclnt as file Manual.grail1.3-genquest.May96 (May 1996).
									     
----------
REFERENCES
----------

[1]     Bairoch, A. and B. Boeckmann. 1992. Nucl. Acids Res., 20: 2019-2022.

[2]     Bairoch, A. 1993. Nucl. Acids Res., 21: 3097-3103.

[3]     Jurka, J., Walichiewicz, J. and A. Milosavljevic. 1992. J. Mol. Evol.
	35: 286-291.

[4]     Abola, E.E., Bernstein, F.C., Bryant, S.H., Koetzle, T.F. and J. Weng.
	1987. Protein data bank.  pp. 107-132 in "Crystallographic Databases-
	Information Content, Software Systems, Scientific Applications," F. H.
	Allen, G. Bergerhoff and R. Sievers, eds. Data Commission of the
	International Union of Crystallography, Cambridge.

[5]     Smith, T. F., and M. Waterman. 1981. Advan. Appl. Math. 2: 482-489.

[6]     Pearson, W. R. and D. J. Lipman. 1988. Proc. Natl. Acad. Sci. USA, 85:
 	2444-2448.

[7]     Altschul, S. F., Gish, W., Miller, W., Myers, E. W. and D. J. Lipman.
	1990. J. Mol. Biol., 215: 403-410.

[8]     Dayhoff, M. O., Schwartz, R. M. and B. C. Orcutt. 1978. In "Atlas of
	Protein Sequences and Structure," (Dayhoff, M. O. ed) Vol. 5, Suppl. 3,
 	pp. 345-352.  Nat. Biomed. Res. Found., Washington, D. C.

[9]     Henikoff, S. and Henikoff, J.G.  1992.  Proc. Natl. Acad. Sci. USA
	89:10915-10919.

[10]    Claverie, J-M. and States, D. J.  1993.  Computers Chem. 17:191-201.

[11]    Califano, A. and Rigoutsos, I.  1993.  In: "Proceedings of the First
	International Conference on Intelligent Systems for Molecular Biology"
 	July, 1993, Bethesda, MD.