Zavershennost' zhizni, i kratkoj i dolgoj, opredelyaetsya tol'ko tsel'yu, radi kotoroj ona prozhita. - D. Dzhordan

To, chto stoit sdelat', stoit sdelat' horosho. - N. Pussen

Large, Multilingual Web-site Maintenance: The US-Russian Friends and Partners Database Approach.

Natasha Bulashova ( natasha@www.friends-partners.ru)
Friends and Partners Foundation,
Moscow, Russia

Greg Cole (gcole@friends-partners.org)
Center for International Networking Initiatives,
The University of Tennessee
USA

Table of Contents

Introduction

The US Russian Friends and Partners project, based at the University of Tennessee, Knoxville Center for International Networking Initiatives and at the Friends and Partners Foundation in Moscow, has been active for nearly five years, promoting a grassroots effort to further US-Russian collaboration and exchange using communications and information technologies. Beyond the many information and communications services spawned and supported by "Friends and Partners" (F&P), the project has created a number of successful spin-off activities such as the "Russian Civic Networking Program" (developing community networks in several Russian cities with the support of the Ford and Eurasia Foundations) and the more recent NaukaNet initiative -- a $6.5 million, five year program funded by the US National Science Foundation and the Russian Ministry of Science and Technology, to develop a high-performance, next generation Internet infrastructure between US and Russian scientific communities.

A major responsibility involves maintaining a large information site with several thousand pages and Internet "objects" on which it maintains information -- introducing the well-documented problems associated with large web-site maintenance. During the past two years the US and Russian F&P team members have developed a rather unique approach to web-site maintenance and presentation, driven entirely by an underlying relational database and involving dynamic construction and presentation of all web pages. While such use of databases and dynamic resource construction is not unique among corporate enterprise, it is rather rare in the nonprofit world.

The system described in this paper features extensions to support special requirements of large, multicultural projects. These include support for multilingual material, creation of pages tailored for specific classes of browsers (with special attention to simultaneous support of high-end and low-end users), automated support of multiple character set encodings (important for Cyrillic text) and accommodating some user control of overall graphic design and appearance of entire web-sites through the use a facility we term "Personalities". The integration of overall page layout and design through use of a special "Styles" database enables site administrators to easily modify overall site. The object-oriented method of assembling web pages from discreet page elements also permits the introduction of random features such as random graphics, quotations, news updates, etc., into page construction.

Further, this system integrates the "Dublin Core" metadata model, enabling the system to maintain uniform information about local resources and the several thousand external resources we find relevant to project activities.

Use of the system has greatly simplified our work in maintaining a quality information resource. We feel that knowledge of this approach may be useful to other nonprofit organizations with similar challenges in maintaining information services. We are particularly interested in increasing awareness of these types of capabilities to nonprofit organizations who will find it increasingly difficult to compete with commercial organizations for providing quality information services to their constituencies.

The system is quite inexpensive to maintain and is built exclusively from noncommercial software components that are readily available to anyone running a UNIX server.

The remainder of this paper will discuss the capabilities of this system, the structure of the underlying database and a technical description of the developed software base.

System Capabilities

The new database allows for much more dynamic information content (important for civic networking projects, etc.). The following describes the key capabilities of the database system.

  1. We now have an excellent structure for maintaining a very large information base. The core Friends & Partners site (this is not counting the over 100 "family member" sites) now consists of over 6500 resource objects. Trying to maintain all of this information in a text-based system had become simply impossible and unmanageable. We now use the database for doing a much better job of maintaining the quality of our resources and, just as importantly, growing the site to a much higher level of information content and quality organization.
  2. We can set up an entirely new site or completely change the appearance and structure of an existing site in a matter of minutes now (many changes within a matter of seconds). Prior to the use of this database approach, it was inconceivable to think of changing the appearance of our site because of the work involved in changing so many text files. We can now accommodate rather major changes in overall appearance in less than a minute. More significant structural changes can be made in just a little more time.
  3. Through the "personalities" feature, users can themselves select entirely different appearances for the site, utilizing their own preferences for how the pages are structured, how navigation lists and other page components are presented, etc.
  4. The entire database is maintained entirely from simple (and free) Internet browsing software (such as Netscape and MIE). All of the databases are maintained via applications which are represented on the web &endash; controlled by a fairly extensive system for authenticating users to applications and giving different users different levels of access to the databases.
  5. One of the most fundamentally important features of the new system is that we can now share responsibility with our volunteers from across the globe. We receive quite a few offers each month from people who would like to volunteer and help Friends & Partners. The system permits us to authorize individuals to certain portions of the database so that they can contribute. This is an important feature for any community-based project which relies on volunteers.
  6. The structure of the database and accompanying software allows us to automate many functions such as checking the integrity of resources to which we link (making sure those resources have not moved) that formerly required an human time and talent. We are almost finished with a new facility that will even monitor "trusted" Internet resources for new information that we should be adding to our own database (sort of a "robot" for monitoring and gathering information).
  7. The entire software system is based on tools that run on any Unix platform and upon a database manager which is itself an Internet-developed project (with contributions from people all over the world and every component in the public domain). This allows us to share or to transfer this new system with all of our civic networking sites (http://www.friends-partners.org/civnet/), and with other non-profit community networking projects.

Technical Overview

The "Friends and Partners" system utilizes a surprisingly small number of software components. All of our project activities run on Sun SPARCstations running SOLARIS 2.5.1, 2.6 and 2.7. The entire system has also been built and operated on an Intel/LINUX platform as well.

The four software packages required include.

  1. MySQL. Described at www.mysql.net. This is a free, capable and very fast SQL database package which compiles and runs on almost all UNIX (and NT) platforms. While lacking some features of other SQL packages, we have found the reliability and speed of MySQL to be quite suitable for our projects.
  2. Perl 5. Besides being our tool of choice for most CGI scripting and system administration chores, we wrote the central page generation server in PERL because of its excellent regular expression parsing and text handling capabilities. We utilize the DBI module to transparently handle all SQL calls. The use of DBI means that the central server software would be very simple to rewrite to utilize a different SQL package.
  3. PHP3. While we do not use PHP3 parsing for user-generated pages, we do utilize this quite capable package (described at www.php.net) for handling all of our database maintenance applications. We use the PHPLIB library (available at www.shonline.de) to handle application level user authentication and session management. The imbedded Oohforms component is very useful in handling data entry form creation and user input validation.
  4. Apache 1.3.3. We are running this with PHP compiled as a module.

With these basic software components, we have developed the following software to support our applications.

First, the entire system is based on the MySQL database software (although the system could be easily modified to use almost any other relational/SQL database package). The current system is comprised of 24 individual database tables; all maintained in a single database named "Friends". This database maintains information on all Internet resources and page "objects" utilized by our site.

The maintenance of all databases is handled by surprisingly simple code, written in PHP, which includes good support for MySQL (and other relational packages). Access to these applications is restricted by facilities of the PHPLIB library; access to the different database tables is controlled using standard MySQL authorization tables. Thus, there are two levels of authorization and authentication -- one at the database level; the other at the database maintenance application level. While complex, this yields a great degree of control over user access and user operations. The motivation is to permit as large a number of contributors to our activities as possible -- with careful restrictions on what individual users are allowed to do.

Given all the information residing in these various database tables, the challenge is to provide for the dynamic and fast assembling of information from the database file and appropriate presentation to the user's browser.

A former variant of this system required a Perl-based CGI script to be invoked every time a "Friends and Partners" page was requested from the web server. As the system evolved and became more complex, the cost of invoking Perl, making the necessary database connection, and then assembling the page from what often requires more than twenty database queries was much too high.

The current approach is to maintain a special "server" program (which is written in Perl) and which listens for connections on a specific TCP/IP port, to which it responds with the assembled page. To connect to the server, we use a very small, simple and fast cgi client (a compiled object written in C). When invoked by the calling of certain URLs (using the ScriptAlias configuration directive within Apache), it connects to the server, passes its environment variables (which includes the full URL being requested by the user as well as the other environmental factors such as the user’s location, browser, etc.). After sending its environment variables, it waits on the server to return the HTML, which it, in turn, returns to the user.

This "friends.server" program is roughly 2,600 lines of Perl code. As mentioned previously, Perl is used because of its excellent regular expression parser and its speed and flexibility with basic text manipulation. The use of the DBI module within Perl makes very simple the SQL queries and also enables us to maintain independence from a single database manager.

Upon accepting a connection from the CGI client, the server first determines the record within the central "Friends" database based upon the requested URL. If it is a valid URL for which a record exists, it then begins the assembly process. This process is described in a more detailed section below.

Obviously, the system is built using simple and widely used software components. The two critical elements of the system are the friends.server program and the structure of the underlying databases. The following section will describe the databases, and a later section describes the operation of the central page generation server.

Database Structure

The "Friends and Partners" database system is comprised currently of 24 individual database tables. This number grows as new capabilities are added to the system.

As a preface to describing the individual tables, the following section describes the components of a page generated from the databases which may help the reader better understand the underlying structure.

All of the following comments refer to Figures 1a and 1b. The reader will note, the labeling of specific areas of a "web page" generated from the "Friends and Partners" database; each labeled element is a discreet element drawn from a specific database table. The friends.server program assembles the page based upon the user's request with information gathered from the individual databases. The logos at the top are drawn from definitions using the "Friends" database (label A). The pictures (label B) are drawn randomly from sets of pictures defined in the Friends database. The news items are drawn from another database (label C). The primary content for the web page (label D) is drawn from the central friends database. The Navigation list (label E) is generated from the "Navigation" table. Options for changing to a different display (reflecting different browser classes) (label F) are generated from the Friends server. The options for switching language and character set encoding (label G) are also set by the Friends server. The buttons (label H) are another form of navigation but appear at the bottom of most web pages and are drawn from the "Buttons" database. The trailer (label I) appearing at the bottom of every page is drawn from "Trailers" database. Random quotations and proverbs are not displayed in Figure 1a/b but are generated from corresponding databases for most F&P pages.

Figure 1a. Friends and Partners Page

Figure 1b. Friends and Partners Page

With this as a general preface, a description is offered about the structure of the overall database system.

While comprised of 24 individual tables, the system is driven by two primary tables called "Friends" and "Partners" (obviously named after the parent initiative). The "Partners" database is a relatively simple one and is used to maintain information on the "owners" of all material cited on any of our projects -- i.e., all resources to which we link, as well as our own local material. This database is primarily a bio/demographic profile of all authors and publishers of web content relevant to our activities. The "Partners" database is currently comprised of approximately 3,500 individuals and organizations. While used directly in the assembly of some web material, this database has more value as a means of tracking the individuals and organizations with whom we work. Indeed, the use of this database has become a critically important resource for all of our initiatives. A screen shot of the data maintenance application for "Partners" is included in Figures 2a and 2b to give the reader a better idea of the information maintained in this table.

The "Friends" table is the central database of the entire F&P system. It is used to maintain information on three general types of resources.

  1. "PAGE" records are "HTML pages" maintained on the F&P server itself (thus, these are local resources). A "PAGE" record includes the "URL" of the local page as an identifier. While the URLs assigned appear to reflect a traditional directory structure on a web server, it is not the case with our server since it "points" to a database record from which the page is generated. The assignment of "URLs" is not file structure dependent. Included in page records are attributes pointing to the following page elements.
    1. Navigation List
    2. Button List
    3. Trailer
    4. Categories of quotations to be used on the page (if applicable)
    5. Categories of news items (small text items) to be randomly displayed for the page (if applicable)
    6. The general style of the page. This includes the orientation of the page and the location of placement of each of the elements described above.
    7. Finally, a "PAGE" record includes the HTML itself that is presented as content for that page. Because of our requirement that a record must allow representation in any number of languages, the actual text is maintained in a separate table (called FPtext) which is keyed by the same key for the "Friends" database, plus a language identifier -- for example, "EN" for an English language variant of a page; "RU" for the Russian variant language. As a separate relation, we can maintain any number of languages for a given page.

      Figures, 3, 4 and 5 illustrate the following from the "Friends" database.

    8. Figures 3a, 3b. The basic attributes defining a "PAGE" record.
    9. Figure 4. The attributes defining the elements to be included on that page when the page is generated for the end user.
    10. Figure 5a, 5b. The various text elements stored in the Fptext table including the HTML used for the central content of the page. The HTML contains "placeholders" for items such as random quotations and list structures (described later) in addition to standard HTML elements.

     

  2. "LINK" records are used to maintain external resources (non-local pages) which we find relevant for our activities. Figures 6a and 6b illustrate the elements of a "LINK" record. The most common use of link records is to group them into "LIST" constructs and then present them on local "PAGE" records. This "LIST" construct is described next.
  3. "LIST" records are quite different from "PAGE" and "LINK" records. They are an artificial construct used to group "PAGE" and "LINK" into useful lists for presentation on local pages. A "LIST" record is defined and then "PAGE" and "LINK" records are "attached" to the list via the "Elements" database. In addition to attaching "PAGE" and "LINK" records to a list, the degree of relevance to the list is also included as a label from 01-10. A record that is strongly attached to a "LIST" is given a lower number (with "01" indicating the strongest and "10" indicating the weakest attachment). This label is often used to indicate where to place the item in the list (stronger attachments result in records placed higher in a list).
  4. A list record also maintains elements about how that list is to be displayed when it is included in a "PAGE" record. Figure 7 illustrates these attributes for "LIST" records. Through the use of these attributes, the site administrators can determine whether a list is simply a brief menu of items or whether it provides a detailed description of its constituent "LINKS" and "PAGES" (including, for example, the title, URL and a brief text description). The "sort" attributes also allow the elements of the "LIST" to be sorted in different ways. Some are sorted alphabetically (as an example, lists of Russian cities) and others are sorted by the relevance factor (thus allowing very strongly related items to appear at the top of lists and less strongly items to appear nearer the end of the list).

    The facility is a core feature of the system, enabling the easy maintenance of lists, which we provide on our pages.

    The relationship between "PAGE" records and "LIST" records is a bit complicated. A "PAGE" record can be attached to a list -- causing it to appear wherever that list is included as an element but "PAGE" records also include Lists as an important page of the page construction. At the time a "PAGE" is created, any imbedded lists are compiled and presented to the end user. Use of this facility makes it very simple for us to add new Internet "LINK" records to our site. A newly added "LINK" can be assigned to any number of "Friends and Partners" lists (thus, appearing on any number of our pages).

The following describe the additional tables that are part of the Friends system.

Closely related to the Central Friends database are the Friends Comments table and the FP text table.

  1. The Friends Comments database is a very simple structure allowing us to maintain a history of modifications to the "Friends" database as well as enabling the attachment of brief text comments to each record. It contains only three data elements, the Friends Identifier ("key"), an 80-character text field, and a time stamp field (containing date and time this record was added to the database). While the user can add text comments to the database in this way, this structure is used more commonly by the data maintenance system itself which adds records to this database any time a record in the Friends table is added or modified.
  2. FP Text: This database was briefly mentioned above. It is the table with which textural objects such as object title, object description, object "message" and HTML (for page records) is maintained. The key for this table is the Friends Identifier established in the corresponding "Friends" record, along with language and character set encoding. Structuring the system in this way allows us to maintain any number of languages for a given "friends" object such as a F&P page or an external link record. An English language record is provided for every object in the database and is used if a requested language is not available for that object.
  3. Styles: The styles database is a central database determining how all local pages on the Friends & Partners server appear. As shown in figures 8a and 8b, a "style" is simply an HTML description of a page with "place holders" for various page elements such as navigation lists, headers, trailers, quotations, news items, and primary content - which are substituted at the time a page is generated. The styles database is made more complicated by the fact that we maintain different variants on a style for different browsers. Currently we support four classes of browsers - including "Netscape enhanced", text with graphics, text only, and WebTV. We are currently adding a fifth class for browsers capable of handling cascading style sheets. Thus, for any given style there are generally at least four records for that style - one for each of the browsers. For example, the Netscape enhanced style is generally used to display a more complex table based layout whereas the "text only" style is designed to display text for text only browsers such as Lynx. For more capable browsers, the style is used to specify the basic lay-out of the page -- indicating, for example, which side of the browser the navigation list appears on.
  1. Every "style" is modified by information from the "Personalities" database which is used to provide more detailed design specifications such as type face, size of type, color of design elements, etc. This database is described below.
  1. The style database is comprised currently of about 25 records. Every "Friends" page record "adopts" a style. For the most part, every page record on a given site utilizes the same style, ensuring design consistency across the site and making it trivial to make global changes in site design by the simple modification of a single style record.
  2. Personalities: Mentioned above, the "Personalities" database is used in conjunction with the styles database to handle very detailed elements of page design, including background colors, text size, text color, whether graphic or text headers are to be used, etc. As shown in Figures 9a, 9b, 9c and 9d, a personality record is a set of descriptors for each of these detailed page design elements.
  1. As an example of use of the personalities mechanism, the primary F&P site offers the user eight different personalities allowing the user to select a color and page lay-out scheme of their choice. "Personalities" features are integrated into the style used by a page with substitutions made at the time the page is created. A recent change to this mechanism enables the Personalities database to override the style selection for a page. This enables, for example, a personality to determine whether the page orientation presents the navigation list on the right-hand or left-hand sides of the page (by selecting a style which provides the appropriate layout).
  1. The combination of the "Personalities" and "Styles" databases is a quite powerful and flexible mechanism for site administrators, enabling global changes in site appearance with very little effort. For the end user, this facility provides large control over their browsing experience.
  1. While this provides an interesting feature for our users, we are currently integrating this into our civic networking project to allow different styles for users who may have special needs - such as larger text for sight impaired individuals -- as well as simpler page layout for individuals using equipment to "read" pages to them, etc. This facility integrates quite a bit of flexibility for taking care of such special needs.
  2. Navigation and NavText: A very important component of the web browsing experience is the ability to navigate to other locations on site. The navigation database enables site designers to attach different navigation lists to every page on the various F&P sites. Shown in Figure 10a and 10b, a navigation list is a simple text listing comprised of single line records which are themselves comprised of a label (used on the navigation list) and an accompanying URL (used if the user clicks on the label). If the accompanying URL is missing (i.e., is blank), the label is treated as a header and formatted differently in the navigation list.
  1. For any given navigation list, the system permits the maintenance of any number of different languages for that list. The actual text of a navigation list is maintained in the NavText database. When a page is requested for a specified language, and if that page includes a reference to a navigation list, then the appropriate language record is selected if available (otherwise, English is always defined as back-up).
  2. Buttons and ButtonsText: The "Buttons" database, illustrated in Figures 11a and 11b, is an almost identical database to navigation but determines a shorter navigation list generally included at the bottom of any F&P page which specifies its inclusion. Typical use of the buttons is to point back to the home page of the site, a guest book for the site, and a page for providing comments. The list of buttons elements can include any number of items. As with the navigation database, this database allows for any number of different languages for a "buttons" item, enabling the correct language test for a button element to be provided when that page is generated.
  3. Trailers and TrailersText: The trailers element appears at the very bottom of a site page and normally includes a copyright statement and information about who to contact for more information. A trailer can have any number of language text records associated with it as with the navigation and buttons databases. The text associated with a trailer is normally very simple (and usually very short) HTML text. Trailer data and text elements are illustrated in Figures 12a and 12b.
  4. News and NewsText: The "News" facility is a rather recent addition to the F&P capabilities. It enables us to provide short news items on any page record which specifies its use. As with all other textual elements of F&P, we can maintain any number of different languages for a given news item so that appropriate language content is generated for this element on every page. Every news item can be described with various classifiers, enabling the assignment of a group of "news" records to any page and then the random selection of one of those items at the time a page is requested. Figure 13a illustrates the basic elements and Figures 13b and 13c the English and Russian text text for the record. Figure 4 shows how a page record specifies the random section of a news item.
  5. Elements: This database was mentioned briefly above in the description of the "list" record type of the Friends database. To review, a "list" is established to enable a grouping of page and link records for use within an F&P page record. For example, if we wish to maintain information for a given Russian city, all Internet accessible resources that we are aware of about that city would be established as "link records" for that city and we then attach those records to the list through the use of the Elements database. This is a very simple table containing the Friends identifier (key) and the related Friends identifier (which is the identifier of a list record). As mentioned previously, the database also maintains a "relevance" factor, illustrated in Figure 14, enabling the assignment of the strength of the relationship of the Friends record to that list. When lists are generated for a Friends record, they are generally sorted by the relevance factor thus placing the most strongly related items near the top of the list.
  6. Types, Forms & Key Words: These three databases are all used to maintain metadata on a given Friends record. For example, through the "keywords" database it is possible to assign temporal, spatial, and personage keyword descriptors. The "types" database allows the assignment of different classifiers indicating, for example, whether the object is a web site, a book, an image, etc. The forms database is used to maintain format information about an object (i.e. whether the object is a text based resource, a HTML resource, a Microsoft Word document, a GIF image, etc.) The use of this metadata information will allow for more powerful browsing features enabling, for example, the inclusion of such browser assisting features as "see related records" which when selected will generated listings of related records based on the metadata descriptors. The development of this feature of the system is motivated by the "Dublin Core" metadata project.
  7. Quotations and Proverbs: We have found it useful and interesting to maintain categories of famous quotations and proverbs which we include at random on various F&P pages. Whether or not quotations are included on a page is determined by the quotations text field in the Friends database. Where they are placed on a given page is determined by the associated style record. The quotations text field of the Friends database enables the user to place a series of "category descriptors." When that page is generated, a quotation is selected at random from the database of quotations, selecting on the chosen categories. The quotations database is comprised of two tables - one which provides the text and source for the quotation, and the second various category descriptors (i.e. a given quotation can be assigned to any number of different categories). Figures 15a and 15b illustrate elements and structure of the database; figures 16a and 16b illustrate the similarly structured Proverbs database.
  8. Data Dictionary: Every database field element is described in the "data dictionary" database. While this database is used to help document the various databases and includes such information as "help message" (useful when a user is interacting with that field element of a database), it is also used to indicate how that field element should be presented in a HTML form (for the data maintenance portion of our system), how the field should be validated (against a regular expression), an indication of minimum and maximum length of the element, appropriate error messages for the validation routines, etc. While the end user does not interact directly with the data dictionary, it is a critical and integral component of the accompanying data maintenance system. Figures 17a and 17b illustrate the data dictionary system.
  9. Auth_User and Active_Sessions: These database accompany our use of the PHPLIB system for user authentication and session management. The data maintenance system (described in more detail below) utilizes the PHP language for handling data maintenance transactions for all tables described above. Authentication is handled by records added to the Auth_User table. This authentication permits or denies users to specific data maintenance applications. Basic session management information is maintained in the Active_Sessions database. The PHP LIBrary is a quite powerful sessions management system which should be explored by anyone needing to carry state information throughout a user's session. More information about PHP LIB can be found at http://www.php.net/.

Server Operation

As mentioned earlier in this paper, the core software which generates pages from the database is called the "Friends" server. While this program began as a CGI script that was called for every "Friends" page record, it has evolved into a full server to minimize resource requirements for page requests. The server is always running and listening on a certain TCP/IP port for connections. Upon receiving a connection, it generates HTML for the desired page and then releases the connection and continues listening for the next request. The following describes the operation of this server.

Upon launching the roughly 2,600 line Perl script, it completes a set of initialization procedures, establishing a series of Perl data structures, setting the value of certain key variables and pre-loading a series of Perl hashes with values to accommodate the random selection of news items, quotations and pictures. Upon completion of the initializations, the script enters the central processing loop where it simply waits for connections on the specified TCP/IP port and completes requested transactions.

Upon accepting a connection, the program accepts an incoming stream of environment variables from the connecting client. It fills an array with the environment variables which include the requested URL, the user's location, the user agent information (specifying such information as user platform and browser (if specified)), etc. Once it has received the environment variables, it has all of the information needed to generate the HTML.

The next step is to determine if the requested URL is a valid "PAGE" record in the "Friends" database. The "Friends" database includes a URL index so the look-up for this information is handled quickly. If the URL, or a reasonable variation of the URL, is not found in the database (there is a simple algorithm for determining "reasonable alternatives") it next checks to see if the requested URL is a page from one or two old versions of the "Friends and Partners" site. If it finds that the URL points to an old page, it returns a redirect command to the requesting client and closes the connection. If the URL is not found in the database, nor in the old F&P sites, it sets the Friends "PAGE" record to a "We're Sorry" page and proceeds with HTML compilation.

The next step is to determine the basic characteristics of the page requested -- such as language, browser class, character set encoding (if language is Russian) and "Personality" (governing certain stylistic attributes of the page).

A word about how state information is carried forward in the system is important here. Since there is no good way to maintain state information that works with all browsers, we encode state information in the URL. The following state object is based at the end of the requested URL on Friends pages.

(opt, mozilla, mac, russian, koi8, regal)

This string indicates that the user requests a page for a Netscape Enhanced browser (mozilla); is using a Macintosh as the platform (a former version of this program formatted pages differently for different platforms); the user has requested a Russian language version of the page; has requested the KOI8 encoding and is using the "Regal" personality (which specifies certain colors, fonts, page layout, etc.) When a page is requested, any "local" URLs in the content, the navigation lists, etc. have the current state information placed at the end of those URLs. In this way, state information is carried forward.

If the user visits a page for which there is no state information in the URL, reasonable values must be computed. The user's platform and browser are computed based on information in the user agent environment variable. The default language is chosen from a default for the "Friends Server" program (English for the server in the US, Russian for the server in Moscow). The encoding is similarly adopted from a preset default and the "personality" is likewise set in the browser program. The user can change these values by selecting different URLs on the given page.

The next step is to obtain values for the specified Friends "PAGE" record by issuing a SQL query. Of special interest are the attributes which give the "Style" to be used for page layout and the navigation list, buttons, trailers, headers, random quotations, random news items, etc., which are associated with that page record. Note that selection of a style is determined, not only by the style attribute in the Friends record but also by the browser class carried forward from a previous request or determined by default.

The "Style" record contains all of the information required for the general layout of the page including placeholders for all objects included on the page. Routines are called which, produce the text for each of these items and they are then substituted into the HTML included in the "Style" HTML. In this process, calls are also made to routines to produce text for randomized news items, quotations, and pictures. Finally, a call is made to the FPtext database with the friendsid and the requested language. If the language is not found, the program defaults to English. The HTML associated with the page can itself, have placeholders for which substitutions must be made. For example, the page may have a "LIST" request for which a list must computed and then substituted. The process of parsing the HTML associated with the page is described next.

First, the HTML is split into a series of text records (splitting on carriage return, which is the expected delimiter). If the placeholder marker (##) is found, then further instructions are executed on that line. Otherwise, it is simply added to the text being assembled. Assuming that the markers precede the word "LIST" followed optionally by spaces, followed by an equal sign, followed optionally by spaces, followed by a number (indicating a "LIST" record), the program calls a list assembly routine with the provided number as the parameter. This routine takes the provided number, locates the requested "LIST" records, determines the list attributes regarding the format and sort order of the list and the reads the "Elements" database to find all records in the "Friends" database that have been attached to the list. Every attached record is read. If the requested language for that record is not found, it defaults to the English version of that record. These records are sorted and then the HTML is generated based on the attribute defining whether this is a menu list, a numbered list, an ordered list and whether the list contains title and URL only or whether it also contains a text description of each item. This routine returns the HTML for the requested list, which is then added to the assembled text.

This process continues for all records that were included in the page text, thus assembling the text that is to be substituted for the "content" placeholder of the style record text.

Currently, the only constructs expected within the HTML are the LIST indicators and an indicator for random quotations but we are currently adding SQL calls (and format/sort attributes) so that direct SQL queries can be imbedded within page records.

Once the content has been substituted within the "Style" template, there is one remaining step. A call is made to the "Personalities" database to determine all attributes governing color, page elements, fonts, text size, link colors, etc. A single statement in Perl makes all the necessary substitutions in the assembled HTML so that the page is appropriately formatted with regards to the selected personality.

There is one last step remaining for Russian language page requests. All of the Russian language content on "Friends and Partners" is stored using the KOI8 encoding. There are many users, however, who utilize different Cyrillic encodings or who do not have Cyrillic fonts installed on their system and desire to read the Russian text in a transliterated format (in which Cyrillic characters are transliterated into one or more Latin characters). If a non-KOI8 encoding is requested, the entire assembled text is passed through a conversion routine that converts from the KOI8 to the requested encoding or transliterated text.

Obviously, assembling a page of HTML with the server is a resource intensive task. We maintain a timing log for all requests which shows that almost all requests which do not have to pass through the Russian encoding conversion, are satisfied in .12 - .18 seconds of a 40 MHz Sun SPARCstation 10 server. The Cyrillic conversion, however, requires 1.0 -- 1.5 seconds to complete.

 

Systems Evolution

While we in no way consider the design and implementation of the described system to be finished, we feel that it's evolution over the last five years and our own accompanying intensive experience with the associated information management has given us a fairly stable and well-proven design. The following provides a brief over-view of this evolution and its relevance to today's design.

The Friends & Partners project was launched in early 1994 as a web site and accompanying e-mail list servers designed to attract those interested in U.S./Russian cooperation and exchange and to provide services to facilitate their communications and information sharing. This initial web system was merely a collection of static HTML files. The early interest in the project and the many contributions of information by volunteers around the world quickly led to an increasing problem managing the text files with the embedded links. It was not only difficult in maintaining the integrity of those links, but there were also challenges in adding new material to the site and determining where new resources should be placed. We also had the problem as began to make the site bilingual.

At the beginning of the second year of our activities, we converted to a new model. Instead of using static HTML files we utilized our own SGML encoding which allowed us to include multiple language documents within single files and allowed us to add additional structural information to help us maintain a consistent design throughout our site. Thus, some of the SGML attributes included pointers to key graphic elements, pointers to files to be included on each page, and pointers to navigation lists associated with a given page. Associated with this new storage and organization model were CGI scripts to generate HTML from the underlying SGML base. This model resolved many of our concerns in ensuring site consistency and allowing us to change site design easily in separating discrete components of web pages into more easily managed objects and in dealing with multilingual content. However, we still had the problem with the numerous lists cataloguing resources of interest to the F&P community and embedded on the F&P pages. It was still difficult to determine how and where new material should be added to a site. During the third year of our effort, we began converting to a new model based on an underlying database.

In addition to wanting to resolve the problems remaining from our current system regarding information organization and content maintenance, we wished to begin moving to a model providing metadata about all objects on which we maintained information.

Because we did not have a very stable system definition at this time, we established the system on the FileMaker system (a very simple and easy to use relational database package that runs on the Macintosh and Windows platforms) on a Macintosh server. The FileMaker system allowed us to begin with what we understood about our system specifications and easily change them as we learned more. It was an excellent prototyping environment. The structure of the file maker system was in many ways similar to the system that has replaced it - the SQL database described in this paper.

Of course, the problem with the Filemaker system is that we did not wish to serve web information from a Macintosh platform (although technically it would have been possible). Thus, every day we had to convert the Filemaker Pro data to a CSV format, move it to our UNIX host, and then refresh the database on the web server. Because we did not at that time have knowledge of or experience with the various SQL database platforms and because we were interested in very fast response to user requests, we implemented this database using the Berkley DB database, which accompanies most UNIX operating systems. Using the balanced binary tree structure provided by Berkley DB, we were able to provide very fast access to the database material.

This system was unveiled during our fourth year of activity and has served us until the new system was finished. During the year that we used the system, we made many changes to the underlying database model and we also changed the method by which pages were generated - moving from static CGI Perl scripts to the Perl-based server which has proven much faster in satisfying user requests.

Work began in November 1998 on moving from this system to the SQL-based system. The primary motivation for transitioning to the new system is that we wanted changes in any of the database tables to be immediately reflected on the web site. As our database grew to include over 6,000 resources, it became increasingly time intensive to perform the daily updates and it was both difficult and time consuming to test out changes to the system.

The new system was completed in January of 1999 and everything transitioned in time for the five year anniversary of the first launching of Friends and Partners on January 19th.

Note about inclusion of Dublin Core Metadata Components: As mentioned in the previous section, we decided to adopt the Dublin Core as our own metadata model as it provides a very reasonable and less time consuming model for assembling metadata. We looked at more extensive metadata models and found them simply too "expensive" in terms of personnel resources, but also required a level of expertise that we did not feel capable of committing.

Data Maintenance System

Without question the most difficult component of the new system was the development of the data maintenance applications for the 24 tables comprising the database. To aid with the rather tedious task of designing the HTML forms and other myriad problems associated with data input validation and user authentication, we were very fortunate to find the PHP library which has good facilities for handling user authentication, data input forms, and data validation. We recommend that others involved in similar projects consider the PHP software package (for dynamic HTML page generation) and the PHP library for handling authentication and forms control.

Supplementing our use of PHP LIB (and included oohforms), we created a special table called the "data dictionary" in which we maintain information on all field elements and all tables associated with this database. In addition to providing descriptive information about each field (including its representation in the database) we provide information about how that element should be represented on an HTML form (whether a text box, a text area, a select list, a check box, radio buttons, etc.), a regular expression for determining appropriate input for this field, a minimum and maximum length for the field, a help message associated with the field, and error messages associated with invalid length and a failure to meet the regular expression requirements. The use of this database in conjunction with the oohforms class of the PHP LIB library has yielded a good solution to the tedious and time consuming parts of data input, form creation, and data input validation.

Each of the primary tables in the Friends system has one or more PHP "scripts" associated with its maintenance. All of these maintenance packages are included within one overall system which users find on our web site.

In addition to user authorization using my SQL authorization tables, we additionally authenticate users to use certain data maintenance applications using PHP LIB facilities for that purpose. When a user attempts to "visit" an application URL, the authentication system checks to see if a "cookie" is associated with the user's browser and if that cookie meets a certain time limitation. If so, the user is admitted; if not, the user is presented with a user name and password entry screen. Assuming the user provides a valid user name and password, the user's authentication level (user, author, editor, administrator) is checked against the authentication level required for that application. If everything passes, the user is then presented with instructions on how to proceed. Once the authentication is complete, a cookie is passed to the user's browser which is used on subsequent authorization checks. Every transaction results in the cookie being time stamped. Currently a one hour limit between transactions is enforced (i.e. after an hour since the last transaction the user is required to provide the user name and password again).

More information on the PHP LIB, oohforms class is available at http://www.php.net/.

System Management Features

The system provides a number of tools to help us better manage the database. Almost every data maintenance application provides an on-screen facility for performing common searches to help the user find requested records. The system also provides a full SQL search capability so that users can provide their own SQL based queries which yield a dynamically generated table listing the results of that query.

To help us in the data entry process in maintaining valid data, the system features a number of date fields in which we can indicate when a record expires, when it was last reviewed, when it should next be reviewed, and there are additional attributes by which we can assign individuals who are responsible for reviewing each record (i.e. different individuals can be assigned responsibility for any number of database resources).

Another very important feature of the overall management system is a status field which we use to "pipeline" entry processes. We can permit anyone on the web to add records to the database, but these records are marked as disabled (so that they do not appear on the site) and are also given a status indicating the need for review by an authorized user. Once a user reviews a record, they can assign any one of a number of different status levels indicating a problem with a record, question about its suitability for the site, or the need for a language translation to be provided. The use of the status information helps various members of our staff and a broader community of volunteers to share entry and maintenance responsibilities. Of course, when a record is complete it is assigned a status indicating such.

Finally, the system carries a number of additional attributes which allow staff to indicate that the record should be reviewed for deletion or, alternatively, marked as "disabled" which prevents that resource from being included anywhere on the site. This latter mechanism allows us to quickly "turn off" records which need to be reviewed for some reason.

Planned Enhancements

One of the chief advantages of maintaining all web information in a database is that we can automate much of the maintenance of that information. The following represent two primary enhancements which are being implemented in the near future.

  1. Assignment of link and page records to individuals for distributed system maintenance: One of our first improvements to the current system is to create a list mechanism for indicating ownership of records and groups of records by individuals. Our current authorization scheme allows us to restrict authorization to operations (i.e. select, insert, delete, update) to specific tables and even to specific fields within specific tables (this is all using standard mySQL authorization tables), but there are many cases in which we would like to assign ownership of complete records enabling individuals to perform specified operations on those records, but not necessarily on other records in the same database tables. This will allow us, for example, to assign an entire portion of a given web service to an individual. They can maintain all information associated with that service, but cannot perform the same types of operations necessarily on other records in the database.
  2. Automated link checking system: This application will be a fairly simple Perl script which will be run regularly (probably as a cron process) to check the integrity of all page and link records in the central Friends database. Upon receiving "404" (page not found) error messages from web servers, this system will immediately disable the record (by turning on the disabled flag), will change the status appropriately, and will notify the individual who has been assigned responsibility for that record within the Friends & Partners community. Upon receiving a valid re-direct message from the web server, the system will change the URL, update the title appropriately, set the status to an appropriate value, and notify the F&P person of the changed location. Other error conditions such as time outs will have to be dealt with differently.

[English] [Russian TRANS | KOI8 | ALT | WIN | MAC | ISO5]

Domashnyaya stranitsazh ° Kommentarii ° Kniga gostej


©1996 "Druz'ya i Partnery"
Natasha Bulashova,Greg Koul
Updated: 1999-04-

Please write to us with your comments and suggestions.

F&P Quick Search
Osnovnye razdely
Domashnyaya stranitsa
Bulletin Board
Besedka
Listserver Druz'ya i Partnery

O Druz'yah i Partnerah
Ob'yavleniya
Nagrady/Premii
Gazeta ISOC '95
Nasha istoriya
Nash server
Nashi sponsory
Kniga gostej

Avtory
Nataliya Bulashova
Greg Koul

Kollektiv
Tat'yana Stepanova
Evgenij Mitkovskij
Tat'yana Provorova
Ivan Revyakin
ZHenya Kozlova
Lera Gonchukova

©1996 Friends and Partners
Please write to us with any comments, questions or suggestions -- Natasha Bulashova, Greg Cole