SUMMARY

The market economy of Russia is highly heterogeneous, both in the business attractiveness of various commodity categories and in the degree of entrepreneurial activity across the regions of the country. The information system described below makes it possible to identify the commodity groups most in demand on the Russian market and to determine regional differences in the intensity of business activity. It also supports trend analysis (starting in February 1995) and the tracking of other developments in entrepreneurial activity in the Russian marketplace.

The system draws on daily data for a few indicators that characterize the informational activity of businesses in the Russian market. The analysis covers the portion of this activity that takes the form of messages sent by entrepreneurs through the commercial newsgroups of the Russian national computer network RELCOM. The indicators estimate the intensity of the flows of commercial messages within individual commodity categories and across the major economic regions whose entrepreneurs are active in the Russian market. For each indicator, the system produces (within a given time range): 1) ratings of commodity groups or economic regions; 2) a graphical representation of the dynamics of the indicator. The initial data is updated once a month and is available to outside users (in ASCII format). When creating ratings and time graphs of the indicators, users can change the time range and select the commodity groups and economic regions to appear on the graphs.

Official indicators of business activity are not well developed in Russia. We offer several indirect indicators which can be used to understand the type and the extent of commercial activity, as well as the overall state of the market. One source of such data is the Russian computer network RELCOM which, given the generally poor quality of communications available in Russia, is an indispensable source of information for the majority of Russian business people.

Business messages communicated through the RELCOM network are routine advertised offers to sell or buy a specific good or service. On average, the price of communicating such a message is considerably less than $1.00. In addition, a customer who has neither a computer nor access to the network can still send a message through firms that provide such services. The average cost of a subscription to the network is $20.00. The average cost of the equipment needed to connect to the network does not exceed $600 - $700. The nation-wide RELCOM network covers the entire territory of Russia and is accessible to all Russian-speaking Internet users. In terms of accumulated user base, RELCOM is the unquestioned leader among comparable Russian computer networks. In the first quarter of 1995 alone, more than 140,000 "commercial propositions" were communicated over the network. User addresses are structured to include the name of the region(s) where the users are located, so the biggest advantage of this data source is its regional content, which provides a reasonably accurate profile of local business activity.

The RELCOM network is one of a few nation-wide information technologies available in Russia (in addition to the two national television channels and several newspapers and magazines) that form a unified Russian marketplace across the vast territory of the Eurasian continent (for instance, the time difference between the westernmost and easternmost time zones is 12 hours). Cross-regional (12 Russian regions and 14 ex-Soviet republics) and cross-sectional (21 commodity groups) sets of commercial propositions have been collected on a daily basis since February 1995.



USER'S GUIDE

The system is organized in three tiers (levels). The first (master) "page" of the system lets the user choose the time range for the indicator analysis (see the Section "Time Range"), select the type of indicator to be analyzed, and, by pressing the button Submit Query, activate the next (second) "page" of the system. In addition, from the first "page" a user can obtain commentaries on the system and the initial data that can be used outside the system (for the restrictions on the use of this data, see the Section Copyright).


The time range

is specified as YYMM-YYMM, or as YYMM for a single month. YY is the last two digits of the year (95, for example) and MM is the number of the month, 01--12.

Statistics have been gathered and processed since February 1995, so the earliest start of the range is 9502.

The default value is the whole period for which statistics have been processed.
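
A minimal sketch of how such a time range string could be checked, assuming that all two-digit years fall in the 1900s (so that 95 sorts before 96). The names `parse_time_range` and `FIRST_MONTH` are illustrative and are not part of the system:

```python
import re

# Earliest month with statistics, from the text above (9502 = February 1995).
FIRST_MONTH = (95, 2)


def parse_time_range(text):
    """Parse 'YYMM-YYMM' or 'YYMM' into ((yy, mm), (yy, mm)).

    Assumption: two-digit years are compared as plain numbers,
    which only works while every year is in the same century.
    """
    match = re.fullmatch(r"(\d{2})(\d{2})(?:-(\d{2})(\d{2}))?", text.strip())
    if match is None:
        raise ValueError("expected YYMM or YYMM-YYMM, got %r" % text)
    y1, m1, y2, m2 = match.groups()
    start = (int(y1), int(m1))
    # A single YYMM means a one-month range.
    end = (int(y2), int(m2)) if y2 is not None else start
    for _, month in (start, end):
        if not 1 <= month <= 12:
            raise ValueError("month must be 01-12 in %r" % text)
    if start < FIRST_MONTH:
        raise ValueError("range cannot start before 9502")
    if end < start:
        raise ValueError("range ends before it starts")
    return start, end
```

For example, `parse_time_range("9502-9506")` gives the start and end months as two (year, month) pairs, and `parse_time_range("9503")` gives the same pair twice.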


After selecting one of the index types and pressing the Submit button, you get a page listing the relevant subtypes and their ratings. To get a graph, check one or more subtypes on the second page and submit the query.


The first four of the six types of indicators shown on the first page can be selected in one of two modes: 1) absolute value of the indicator (amount), or 2) daily rate of growth of the indicator (index). This selection affects the graphical representation of the indicators: in the first case, graphs of absolute values are constructed; in the second, graphs of growth rates.


The indicator

Advertisements in group

measures the number of communications, and its change over time, within separate commodity groups. The structure of the commodity groups is almost identical to the structure of the commercial newsgroup sections that currently exist in the RELCOM network. Individual communications are assigned to commodity groups on the basis of the destination newsgroup indicated in the communication. If a communication appears in several newsgroups, only the first destination newsgroup listed is used.

The list of commodity groups:

        audio-video
        chemicals
        computers
        construction
        consume
        energy
        real estate
        food
        food.drinks
        food.sweet
        householding
        info services
        machinery
        medicine
        metals
        money
        orgtech
        software
        stocks
        tobacco
        transport

Individual values of this indicator for a given commodity group correspond to the daily numbers of communications directed to this newsgroup. The index of these values then yields the daily growth rate.


The indicator

Advertisements in region

measures the number of communications, and its change over time, computed for the individual regions where these communications originated. The regional structure is based on the established division of the territory of Russia into 10 major economic regions, the separate regions of Moscow and St. Petersburg, the republics of the former USSR, and the foreign sector.

List of regions:

Russian regions

        Moscow
        St.Petersburg
        Ural
        Center
        West Siberia
        Volga
        Volga-Vyatka
        East Siberia
        Far East
        North Caucasus
        North-West
        Black Earth Zone

Ex-USSR regions

        Armenia
        Azerbaijan
        Belorussia
        Georgia
        Kazakhstan
        Kirgizia
        Latvia
        Lithuania
        Moldavia
        Tajikistan
        Turkmenia
        Uzbekistan
        Ukraine
        Estonia

Individual communications are assigned to regions on the basis of the second-level domain name in the address of the sender. For example, messages with return addresses matching *.msk.su or *.msu.ru are assigned to the region of Moscow. Communications whose region cannot be determined are grouped into the category UNKNOWN. In addition, there are two groups of communications whose return addresses have the first-level domain names "org" and "com" respectively; these usually originate in foreign countries other than the former Soviet republics. To simplify the analysis, the regions of Russia with low information activity are grouped under the heading "rest of Russia"; the corresponding non-Russian regions form the group called "rest of world". Individual values of this indicator for a given economic region correspond to the daily numbers of communications originating in that region. The index of these values measures the daily growth rate.
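
The assignment rule can be sketched as a lookup on the last two labels of the sender's host name. This is only an illustration: the table `DOMAIN_TO_REGION` is hypothetical and contains just the two Moscow examples given in the text, the function name `region_of` is invented here, and the sender is assumed to be a bare address of the form user@host:

```python
# Hypothetical lookup table; only these two entries come from the text.
DOMAIN_TO_REGION = {
    "msk.su": "Moscow",
    "msu.ru": "Moscow",
}

# First-level domains that the text treats as separate foreign groups.
FOREIGN_TOP_LEVEL = {"org", "com"}


def region_of(sender):
    """Return the region label for a sender address such as 'user@host'."""
    host = sender.rsplit("@", 1)[-1].strip().lower().rstrip(".")
    labels = host.split(".")
    if len(labels) >= 2:
        # Match on the second-level domain, e.g. 'msk.su'.
        key = ".".join(labels[-2:])
        if key in DOMAIN_TO_REGION:
            return DOMAIN_TO_REGION[key]
    if labels[-1] in FOREIGN_TOP_LEVEL:
        return labels[-1]
    # Anything that cannot be placed goes into the UNKNOWN category.
    return "UNKNOWN"
```

The grouping of low-activity regions into "rest of Russia" and "rest of world" is left out of the sketch, since the text does not give the threshold used.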


The indicator

Active hosts in group

measures the number of commercial organizations (or computers) that send messages to the commercial newsgroups of the RELCOM network during a 24-hour period. This indicator is computed for each individual group. The index of the indicator's values reflects the daily growth rate.

The indicator

Active hosts in region

measures the number of commercial organizations (or computers) that send messages to the commercial newsgroups of the RELCOM network during a 24-hour period (bear in mind that, technically, the same computer address may be used by several computer sites in a network). This indicator is computed for each 24-hour period and each individual region. The index of the indicator's values reflects the daily growth rate.

The indicator

Sum of active hosts in region

measures the total number of commercial entities that have sent at least one message to the commercial newsgroups since February 1, 1995. Changes in the value of this indicator over time make it possible to estimate the rate of increase in the number of entrepreneurs who use commercial newsgroups to disseminate their propositions. Changes in the indicator during the first months after February 1995 reflect the accumulation of the required information, so this period must be excluded from the analysis. The indicator is computed on a daily basis for each region. The index of the indicator's values is equal to the daily growth rate.
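
A minimal sketch of such a running total, assuming the input is one set of host names per day in chronological order (the function name and the input layout are illustrative):

```python
def cumulative_active_hosts(daily_hosts):
    """Return, for each day, the number of distinct hosts seen so far.

    daily_hosts: iterable of sets of host names, one set per day.
    """
    seen = set()
    totals = []
    for hosts in daily_hosts:
        seen.update(hosts)        # a host is counted once, on first sight
        totals.append(len(seen))
    return totals
```

Because every host is new on the first day, the series climbs steeply at the start, which is why the early months say more about data accumulation than about real growth.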

The indicator

Advertisements per host in group

measures the mean number of communications sent by a commercial organization. This indicator makes it possible to evaluate shifts in the information activity of individual entrepreneurs (bear in mind that several entrepreneurs can use one computer to send their messages). The indicator is computed on a daily and weekly basis for each group. The index of the indicator's values is not computed.

The indicator

Advertisements per host in region

measures the mean number of communications sent by a commercial organization. This indicator makes it possible to evaluate shifts in the information activity of individual entrepreneurs (bear in mind that several entrepreneurs can use one computer to send their messages). The indicator is computed on a daily basis for each region. The index of the indicator's values is not computed.

The indicator

Daily total of advertisements

measures the total quantity of communications sent by all organizations. The indicator is computed on a daily and weekly basis. The index of the values of the indicator is not computed.

The indicator

Avg. traveling time (days)

measures the average time it takes for a message to get from its place of origin to the registration point (see Section Technical Guide). Changes in this indicator characterize the overall quality of message transmission by the network. The indicator is calculated on a daily basis. The index of the indicator's values is not computed.

The indicator

Group hosts' activity distribution

presents the distribution, by number of active days, of the hosts that posted to the specified group during a month. On the plot, Y(X) is the share of hosts that were "active" on 1 to X days. On the second page, the first numeric column is the average number of active days and the second numeric column is the average deviation.

This indicator is evaluated only for one-month time ranges; a query with a wider time range averages the indicators over the monthly sub-ranges.

Neither the daily/weekly nor the volume/index mode applies to this indicator; both are ignored.
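
The cumulative share Y(X) and the two numeric columns can be sketched as follows. The input layout (a mapping from host name to its number of active days in the month) and the function names are illustrative, and the second column is read here as a mean absolute deviation, which is an assumption about what the system actually computes:

```python
from collections import Counter


def activity_distribution(active_days_per_host, days_in_month):
    """Return [Y(1), ..., Y(days_in_month)], the share of hosts
    that were active on 1 to X days."""
    total = len(active_days_per_host)
    if total == 0:
        return [0.0] * days_in_month
    counts = Counter(active_days_per_host.values())
    shares = []
    running = 0
    for x in range(1, days_in_month + 1):
        running += counts.get(x, 0)   # hosts active on exactly x days
        shares.append(running / total)
    return shares


def mean_and_average_deviation(values):
    """Return (mean, mean absolute deviation) of a non-empty list."""
    mean = sum(values) / len(values)
    deviation = sum(abs(v - mean) for v in values) / len(values)
    return mean, deviation
```

The same two functions apply to the heterogeneity indicators if the number of active days is replaced by the number of commodity groups a host posted to.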

The indicator

Regional hosts' activity distribution

presents the distribution, by number of active days, of the hosts from the specified region that were active during a month. On the plot, Y(X) is the share of hosts that were "active" on 1 to X days. On the second page, the first numeric column is the average number of active days and the second numeric column is the average deviation.

This indicator is evaluated only for one-month time ranges; a query with a wider time range averages the indicators over the monthly sub-ranges.

Neither the daily/weekly nor the volume/index mode applies to this indicator; both are ignored.

The indicator

Regional hosts' commodity heterogenity

presents the distribution, by number of commodity groups posted to, of the hosts from the specified region that were active during a month. On the plot, Y(X) is the share of hosts that posted to 1 to X groups. On the second page, the first numeric column is the average number of groups and the second numeric column is the average deviation.

This indicator is evaluated only for one-month time ranges; a query with a wider time range averages the indicators over the monthly sub-ranges.

Neither the daily/weekly nor the volume/index mode applies to this indicator; both are ignored.

The indicator

Group hosts' commodity heterogenity

presents the distribution, by number of commodity groups posted to, of the hosts that posted to the specified group during a month. On the plot, Y(X) is the share of hosts that posted to 1 to X groups. On the second page, the first numeric column is the average number of groups and the second numeric column is the average deviation.

This indicator is evaluated only for one-month time ranges; a query with a wider time range averages the indicators over the monthly sub-ranges.

Neither the daily/weekly nor the volume/index mode applies to this indicator; both are ignored.

USER GUIDE TO SECOND PAGE

Executing Submit Query on the first page opens the second page of the system. The second page displays a table containing the following attributes of the indicator selected by the user:

Column NAME

- the list of commodity groups or regions (depending on the type of the indicator chosen); for the indicator Avg. traveling time, the second page presents its graph;

- in the list, individual groups or regions are placed in the descending order in accordance with their ratings (i.e. commodity groups or regions in the top portion of the list have higher ratings than those placed below them).


Column #ADS

- for each commodity group or region, the value of the parameter used to calculate the rating;

- this parameter is taken to be equal to the total number of communications within each individual commodity group or region during the time range chosen by the user (for changes in the time range chosen, see below).


Column LMI

-- Last Month's Index

- for each commodity group or region, the rate of growth of the indicator selected on the first page during the last month, calculated as (A(t)/A(t-1) - 1) * 100%, where A is the total number of communications in a given month, t is the last month of the selected time range, and t-1 is the month immediately preceding it.
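
A small sketch of this calculation, reading the index as a percentage change between the last two monthly totals. The function name and the input layout (monthly totals in chronological order) are illustrative, and returning None when the index is undefined is a choice made here, not documented behaviour of the system:

```python
def last_month_index(monthly_totals):
    """Return the percentage change between the last two months,
    or None when it cannot be computed."""
    if len(monthly_totals) < 2:
        return None               # a one-month range has no preceding month
    previous, last = monthly_totals[-2], monthly_totals[-1]
    if previous == 0:
        return None               # growth from zero is undefined
    return (last / previous - 1) * 100
```

For monthly totals of 200 and 250 communications the index is 25.0, that is, a 25% increase.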


For some types of data, monthly rates of growth (column "ind") are not determined and, therefore, are absent on the second page (indicators 4 - 6 in the list on the first page of the system).

The second page lets the user: 1) re-rank the list of commodity groups or regions for a newly entered time range; 2) view graphs of the changes in a given indicator for the desired commodity groups or regions.

To re-evaluate the ratings of commodity groups or regions, adjust the time range on the second page and press the button Submit Query. No commodity groups or regions may be highlighted while doing this.

If at least one commodity group or region is highlighted, pressing Submit Query opens the third page of the system, which shows the time graphs of the highlighted entries (for the selected time range). If necessary, the time range for the graphs can be changed directly on the second page (to do this, enter the new value in the window "time range").

The third page only displays graphs and does not contain any commands. Return to the previous page is accomplished by the standard methods of the Internet browser (for example, in Netscape, it is done by pressing the button Back or selecting the appropriate entry in the GO menu).



This is a preliminary draft only!

The data collection

A daily routine scans the new Usenet news in the relcom.commerce hierarchy. The result is a text file of items consisting (roughly) of the fields `Date:', `From:' and `Newsgroups:'. The approximate size of the file is 200 Kb.
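
Since the exact layout of this file is not described, the following is only a sketch under the assumption that items are separated by blank lines and each field sits on its own line. The function names are illustrative. `first_newsgroup` applies the rule stated earlier that only the first destination newsgroup is used:

```python
FIELDS = ("Date", "From", "Newsgroups")


def parse_items(text):
    """Split raw text into items, each a dict of the three header fields.

    Assumption: items are separated by blank lines.
    """
    items = []
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            # Split on the first colon only; dates contain colons too.
            name, sep, value = line.partition(":")
            if sep and name.strip() in FIELDS:
                fields[name.strip()] = value.strip()
        if fields:
            items.append(fields)
    return items


def first_newsgroup(item):
    """Return the first newsgroup listed for a cross-posted item."""
    return item.get("Newsgroups", "").split(",")[0].strip()
```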

This job is being done at the news server of Infoteka Ltd, Novosibirsk, Russia (news.itfs.nsk.su).

Monthly routine (runs on the 7th of the following month):

- `inverts' the raw data files to produce files containing information about the articles (advertisements) issued on the same day;

- factorizes the inverted data by region;

- smooths some of the factorized data over time: the number of ads per day is set equal to the average over the current day and the 6 preceding days;

- archives the processed raw data files.
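
The smoothing step amounts to a trailing 7-day moving average. A minimal sketch, with an illustrative function name; how the first six days of the series are treated is not stated, so averaging over the days available so far is an assumption:

```python
def smooth(daily_counts, window=7):
    """Trailing moving average: each day becomes the mean of itself
    and the (window - 1) preceding days.

    Assumption: at the start of the series, where fewer than `window`
    days exist, the mean is taken over the days available.
    """
    smoothed = []
    for i in range(len(daily_counts)):
        chunk = daily_counts[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed
```

A trailing window of exactly one week has the side effect of removing the weekday/weekend cycle from the daily counts.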

The result is 5 files:

- # of ads. in each group per day (#groups columns, #days rows)

- # of ads. from each region per day (#regions columns, #days rows)

- monthly average of ads. in each group from each region (#regions columns, #groups rows); this file is not used in plotting

- # of active hosts in each region per day (#regions columns, #days rows)

- # of active hosts in each region since the beginning of statistics (#regions columns, #days rows); this data is not smoothed.

The visualization routine is implemented as a shell script that does the following:

- builds a command file and an appropriate source data file for `gnuplot', based on the request from the WWW browser and the files described above

- runs `gnuplot' and `ppmtogif', the converter of `gnuplot' output to a GIF file.

After the script has run, the HTTP daemon sends a link to the GIF file back to the WWW browser.
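
The actual script is not reproduced here. As a rough illustration of the first step, the sketch below builds a `gnuplot' command file as a string, written in Python rather than shell for readability. The function name, file names and column numbers are hypothetical, and the specific commands (the `pbm' terminal, plotting single columns against the row number) are assumptions about what such a command file might contain, using current `gnuplot' syntax:

```python
def build_gnuplot_script(data_file, columns, output_file):
    """Return the text of a gnuplot command file.

    columns: list of (title, column_number) pairs to plot from data_file.
    Each column is plotted against the row number, i.e. the day.
    """
    lines = [
        "set terminal pbm color",            # bitmap output for ppmtogif
        'set output "%s"' % output_file,
        "set style data lines",
    ]
    plots = [
        '"%s" using %d title "%s"' % (data_file, column, title)
        for title, column in columns
    ]
    lines.append("plot " + ", ".join(plots))
    return "\n".join(lines) + "\n"
```

The resulting text would be saved to a file, passed to `gnuplot', and the bitmap it writes would then be fed to `ppmtogif'.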

GIFs that are 24 hours old or older are deleted by a `cron' job every midnight.
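
The clean-up itself is not shown in this guide. A possible equivalent, sketched in Python with a hypothetical function name and assuming that a file's age is measured from its modification time:

```python
import os
import time


def delete_old_gifs(directory, max_age_seconds=24 * 60 * 60, now=None):
    """Delete .gif files in `directory` that are at least
    max_age_seconds old; return the names of the deleted files."""
    now = time.time() if now is None else now
    removed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not (name.lower().endswith(".gif") and os.path.isfile(path)):
            continue                          # leave other files alone
        if now - os.path.getmtime(path) >= max_age_seconds:
            os.remove(path)
            removed.append(name)
    return removed
```

Run once a day, such a job keeps the directory of generated graphs from growing without bound while leaving recent graphs available to the browser.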



ACKNOWLEDGMENTS

This Web page was launched with the support of INFOTEKA (RELCOM regional node in Novosibirsk, Russia), the Institute of Economics and Industrial Engineering (Novosibirsk), the Russian Fund for Humanities Research, the Fulbright Program, and the Krannert School of Management at Purdue University.


COPYRIGHT NOTICE

This Web page as an information system is Copyright 1996 by Sergei Parinov and Victor Lyapunov. Any part of this information system may be freely used for any purpose. If it is published or distributed in part, it must include this copyright notice. It may not be sold, or placed in something for sale, without the permission of the authors.