Net::Z3950::DBIServer::Intro - An Introduction to zSQLgate
zSQLgate
provides a Z39.50 interface to your relational databases.
That is, it provides a generic gateway between the Z39.50 Information
Retrieval protocol and pretty much any relational database you
care to mention.
The Perl module Net::Z3950::DBIServer provides the guts of the
gateway. Both the zSQLgate
program and the
Net::Z3950::DBIServer
module are included in this distribution.
The ANSI/NISO Z39.50 information retrieval protocol (now ratified as international standard ISO 23950) is a mature and powerful protocol used in application domains as diverse as bibliographic information, geo-spatial mapping, cultural heritage, and structured vocabulary navigation. It's particularly useful in distributed systems that need to provide uniform access to a variety of different information resources, and has therefore provided the technical backbone of many European and international collaborative projects. You can read more at the Z39.50 Maintenance Agency, at http://lcweb.loc.gov/z3950/agency/
Relational database management systems (RDBMSs) have been with us for decades, and show no sign of going away yet :-) The ubiquitous SQL language, which is used to search them, has its roots three decades ago in 1973, when an early version (then called ``Sequel'') was described in R. F. Boyce and D. D. Chamberlin's paper Using a Structured English Query Language as a Data Definition Facility (IBM RJ 1381, December).
Z39.50 and relational databases may seem to belong to separate universes, but in practice they often need to play nicely together. Many, perhaps most, Z39.50 servers are built on top of relational databases; and many IR projects need to add Z39.50 interfaces to existing relational databases.
This has traditionally been an awkward and error-prone process, as
most of the available tools are rather low-level, and require
extensive programming. zSQLgate
changes that by providing a
generic gateway - a Z39.50 server which
serves the data from a relational database. Instead of requiring
programming, setting it up is a matter of writing a relatively
straightforward configuration file.
You may need zSQLgate
if:
You already have an established project built on Oracle, Sybase, PostgreSQL, MySQL or any of a seemingly infinite number of alternatives, and you need to build a Z39.50 interface to it.
You have a wide variety of relational databases, similar in concept but different in the details, possibly running on different machines and using different RDBMSs; and you want to build a single interface that treats them all uniformly.
You want to build a Z39.50 server from scratch, and for some reason dedicated tools such as Index Data's Zebra - http://www.indexdata.com/zebra/ - are not suitable. You may want to use a relational database because you're used to it, or because you need industrial-strength data integrity, or commit/rollback, or some other feature of a particular RDBMS.
Part of the strength, and also part of the weakness, of Z39.50 is the fact that it is not a monolithic standard: an implementation is not required to support all of it, but only those parts which are useful to that implementation. Indeed, there is probably no Z39.50 implementation that supports the whole standard.
In this section, we briefly discuss which parts of the Z39.50 standard
zSQLgate
supports.
zSQLgate
supports the Z39.50 Type-1 query (``RPN''), and the
identical Type-101 query. Queries may include arbitrary combinations
of the AND, OR and ANDNOT boolean operators, nested to any depth.
Proximity operators are not supported.
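As a rough illustration of what arbitrarily nested boolean queries involve for a gateway of this kind, the sketch below renders a query tree into an SQL WHERE clause with bound parameters. It is not zSQLgate's actual code: the Term and Bool classes, the to_where function and the column names are all hypothetical, and every term is treated as a plain equality test.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Term:
    """A leaf of the query: one search term against one column (hypothetical)."""
    column: str
    value: str


@dataclass
class Bool:
    """An internal node: AND, OR or ANDNOT applied to two sub-queries."""
    op: str
    left: "Node"
    right: "Node"


Node = Union[Term, Bool]

# The three supported boolean operators and the SQL each is rendered as here.
SQL_OPS = {"AND": "AND", "OR": "OR", "ANDNOT": "AND NOT"}


def to_where(node: Node) -> Tuple[str, List[str]]:
    """Return (SQL fragment, bound values) for a query tree of any depth."""
    if isinstance(node, Term):
        # Column names are assumed to come from trusted configuration,
        # never from the client; only the term itself is a bound value.
        return f"{node.column} = ?", [node.value]
    if node.op not in SQL_OPS:
        # Proximity, or anything else unsupported, ends up here.
        raise ValueError(f"unsupported operator: {node.op}")
    left_sql, left_args = to_where(node.left)
    right_sql, right_args = to_where(node.right)
    return f"({left_sql} {SQL_OPS[node.op]} {right_sql})", left_args + right_args


# (author = Kernighan OR author = Ritchie) ANDNOT title = Unix
query = Bool("ANDNOT",
             Bool("OR", Term("author", "Kernighan"), Term("author", "Ritchie")),
             Term("title", "Unix"))
sql, args = to_where(query)
print(sql)
print(args)
```

Because the function recurses on both operands, nesting depth is limited only by the query itself, which is the property the paragraph above describes.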
Within individual terms, the following attributes are supported:
Any access points from any attribute sets may be supported, depending
on the attrset
clauses in the configuration file.
The configuration file may specify what access-point is used if none is explicitly indicated by a query.
The ordering relations 1 (less than), 2 (less than or equal), 3 (equal), 4 (greater than or equal), 5 (greater than) and 6 (not equal) are all supported; but not the more esoteric relations 100 (phonetic), 101 (stem), 102 (relevance) and 103 (AlwaysMatches).
If no explicit relation attribute is specified, equality (value = 3) is assumed unless overridden by the configuration file.
Truncation attributes 1 (right-truncation), 2 (left-truncation), 3 (left- and right-truncation), 100 (do not truncate) and 101 (process # in search term) are supported; but not 102 (RegExpr-1) or 103 (RegExpr-2).
If no explicit truncation attribute is specified, no truncation (value = 100) is assumed unless overridden by the configuration file.
Attributes of type 3 (position), 4 (structure) and 6 (completeness) are ignored. All search terms are treated as being of type string - that is, as though an attribute of type 4 (structure) and value 108 (string) had been specified.
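The relation and truncation rules above can be pictured as a mapping from attribute values to SQL conditions. The following is a minimal sketch under stated assumptions, not a description of how zSQLgate itself generates SQL: the function names are hypothetical, the use of LIKE with '!' as the escape character is one choice among several, '#' under truncation 101 is taken to stand for exactly one character, and combining truncation with a relation other than equality is simply rejected.

```python
# SQL comparison operators for the supported relation attribute values.
RELATIONS = {1: "<", 2: "<=", 3: "=", 4: ">=", 5: ">", 6: "<>"}


def like_pattern(term, truncation):
    """Build a LIKE pattern for truncation values 1, 2, 3 and 101."""
    # Escape characters that LIKE would otherwise treat as wildcards.
    escaped = term.replace("!", "!!").replace("%", "!%").replace("_", "!_")
    if truncation == 1:      # right-truncation: the term is a prefix
        return escaped + "%"
    if truncation == 2:      # left-truncation: the term is a suffix
        return "%" + escaped
    if truncation == 3:      # left- and right-truncation: a substring
        return "%" + escaped + "%"
    if truncation == 101:    # assumed: each '#' matches one character
        return escaped.replace("#", "_")
    # 102 (RegExpr-1), 103 (RegExpr-2) and unknown values are rejected.
    raise ValueError(f"unsupported truncation attribute: {truncation}")


def term_condition(column, term, relation=3, truncation=100):
    """Return (SQL fragment, bound value) for a single search term.

    The defaults mirror the text: relation 3 (equal) and truncation 100
    (do not truncate) when the query supplies neither attribute.
    """
    if relation not in RELATIONS:
        raise ValueError(f"unsupported relation attribute: {relation}")
    if truncation == 100:
        return f"{column} {RELATIONS[relation]} ?", term
    if relation != 3:
        # A simplification made by this sketch, not a documented rule.
        raise ValueError("this sketch only truncates equality searches")
    return f"{column} LIKE ? ESCAPE '!'", like_pattern(term, truncation)


print(term_condition("year", "1990", relation=4))
print(term_condition("title", "comput", truncation=1))
print(term_condition("author", "wom#n", truncation=101))
```

A real deployment would take the defaults from the configuration file rather than from function arguments, as the text notes.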
zSQLgate
currently supports retrieval using the following record
syntaxes:
The SUTRS record-syntax is supported natively, with no need for any
configuration. The returned record is extremely raw, consisting only
of a list of elements, one per line, in the format ``field: value'',
sorted in alphabetical order by fieldname. This can be useful for
debugging a configuration because of the minimal munging involved.
This default formatting of SUTRS records may optionally be overridden by the configuration file, for servers that need to support SUTRS-based clients.
zSQLgate
supports the generation of XML in any format, using a
two-stage process. An initial, simple, record is formed using a set
of elements whose names are mapped to expressions from the database;
then that record may be passed through an arbitrary XSLT stylesheet to
transform it into the desired format. In this way, for example,
MarcXML or RDF records can be generated.
The GRS-1 record-syntax is supported, but at present the generated records can include only top-level fields (i.e. tag-paths of a single element). Support for sub-records will be added in a subsequent release if there is demand for it.
MARC records can be generated by a field-mapping specification. The
precise dialect of MARC (MARC21, UKMARC, etc.) supported by a
particular deployment of zSQLgate
is a function of the mapping
specified in the configuration file. MARC records may contain
repeated fields.
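To make the two simplest record formats concrete, here is a small sketch of the default SUTRS layout and of the first stage of XML generation, both applied to a single database row. The function names, the sample row and the root element name "record" are assumptions made for illustration only; the second XML stage (the XSLT transformation) and the GRS-1 and MARC mappings are not shown.

```python
import xml.etree.ElementTree as ET


def to_sutrs(record):
    """Render one record as 'field: value' lines, sorted by field name."""
    return "\n".join(f"{name}: {record[name]}" for name in sorted(record))


def to_simple_xml(record, root_tag="record"):
    """Build the initial, simple XML record: one element per mapped field."""
    root = ET.Element(root_tag)
    for name, value in record.items():
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


# A hypothetical row, as it might come back from the relational database.
row = {"title": "The C Programming Language", "author": "Kernighan", "year": 1988}

print(to_sutrs(row))
print(to_simple_xml(row))
```

The XML produced this way is deliberately plain; in the two-stage process described above, a stylesheet would then reshape it into whatever target format is wanted.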
zSQLgate
does not currently include any support for Z39.50's sort, scan or
extended services. This functionality can be
added if required: contact the author if you wish to build a server
that provides these services.
zSQLgate
is focused on doing one thing well: that is, providing the
means for Z39.50 clients to search in, and fetch data from, relational
databases. So it does not currently address any of the following:
Database update. zSQLgate
provides a read-only interface.
Z39.50 ``extras'' such as inter-library loan (ILL).
``Fan out'', or the provision of ``union catalogues'' (though this could be added in subsequent versions if there's demand for it).
Intended enhancements for zSQLgate
include the following:
Allow specification of semantics for attributes other than access points. (Relation and truncation are handled by hardwired code that knows about the relevant attributes in the BIB-1 and Utility sets.)
Allow elements to be marked for inclusion in the brief record
(element-set name b). More generally, provide the wherewithal for
the configuration file to specify the contents of arbitrary element
sets.
Allow the configuration file to override some global parameters on a per-logical-database basis. For example, the back-end data-source (DBI database, or DSN) and authentication parameters.
Allow the configuration file to set other DBI parameters such as
auto_commit.
Allow the configuration file to specify which port or ports the server should listen on. Maybe also other Z39.50 server options.
Allow the inclusion of sub-files. This will be useful to allow, for example, element sets to be specified in their own files.
Provide the ability to make multiple back-end databases look like a single large data repository (``union catalogue''). This is a much bigger deal than most of the other enhancements mentioned here, and will probably only happen if there's a real need for it in a specific project.
I plan to implement these enhancements more or less in the order that customers need them, so do give me a shout if anything listed here (or indeed anything not listed) is high on your ``must have'' list, and I'll see what I can do.
This module is released under non-free terms - see the Net::Z3950::DBIServer::LICENCE manpage. In a nutshell, you can download, unpack, build and evaluate it for free, but you may not deploy it without first purchasing a deployment licence.
The price of a deployment licence is either £2500 as a one-off fee for perpetual deployment, or £1000 for a one-year deployment licence and an additional £750 per year thereafter. That fee includes limited support and a small amount of development work where appropriate.
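For anyone weighing the two options, the arithmetic is easy to tabulate. The sketch below assumes the fees are read literally: £1000 for the first year plus £750 for each further year, against a single £2500 payment. The function name is hypothetical.

```python
PERPETUAL = 2500  # one-off fee in pounds for perpetual deployment


def annual_cost(years):
    """Cumulative cost in pounds of the yearly option after `years` years."""
    if years < 1:
        raise ValueError("years must be at least 1")
    return 1000 + 750 * (years - 1)


for years in range(1, 6):
    print(years, annual_cost(years), PERPETUAL)
```

On this reading the two options cost the same after three years, and the perpetual licence is the cheaper one from the fourth year onwards.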
If you've not already read the licence, you should: see the Net::Z3950::DBIServer::LICENCE manpage. If you need help installing, you might find the Net::Z3950::DBIServer::Install manpage helpful. Then it's on to the tutorial, the Net::Z3950::DBIServer::Tutorial manpage, after which you may wish to go on to some of the more brutally technical documents, including the configuration file specification, the Net::Z3950::DBIServer::Spec manpage.
Mike Taylor <mike@miketaylor.org.uk>
First version Sunday 24th February 2002.