November 1999 - NT Internet Goodies

Last time, I featured free TCP/IP utilities to enhance your browsing experience. This time, I’ve got a special treat in store—Web databases! Namely, what databases are free (or almost free) and provide the power you’ll need either for simple business Web sites or for your garage Web site.

The Source of our Confusion

I started my investigations with a predilection for only two free databases: miniSQL and mySQL (although mySQL isn’t always free). I’ve actively used both of these databases in Web development, as well as the standard ones (Oracle, Sybase, Microsoft SQL Server, etc.). My primary source for downloading the free databases we’re going to review here is ftp://ftp.idiom.com/pub/free-databases. It’s a text document that features a simple listing of many free databases and their features.

One thing I quickly discovered is the amount of time necessary for properly exploring database. In the interests of research depth, I’ll approach each database in some detail—so look on this article as the first in a series.

A Little Background…

Databases come in many flavors. You can get relational databases, database flat files, or object-oriented databases. For Web development, the most logical choices are either relational databases (Oracle is a relational database) and flat files (such as when you link a database access library into your own application). I’ve concentrated on these two types.

Bearing the above in mind, it’s important to note that most databases support languages other than just C—Java, Perl, TCL, and PHP come to mind. However, since time constraints prevent me from writing test programs in multiple languages, and since C and Java are (usually) the lowest common denominator for interfacing to a database, I’ve focused on databases that allow either a C or a Java interface.

The easiest Web development database runs on as a separate process on a single machine. Multiple applications can interact with the database server simultaneously by establishing a connection using TCP/IP. This type of database requires the least amount of programming to store and retrieve data from the database server, since your application programs (whether they are C programs or Perl scripts or Java applications) simply connect to the database server and start sending SQL statements across and receiving result sets.

On the other hand, flat file databases require you to write a program to interface to the database routines. The main attraction of these types of databases is simplicity and speed—they usually offer only a subset of SQL (at best), and are optimized for fast data access. Generally, this type of application works well when your database client applications send a small set of known commands to your database server application, rather than sending an arbitrarily complex SQL statement.

The Databases

If you go to the free database FTP site I mentioned above, you’ll see a huge number of entries. Actually, I found it tough to get appropriate databases—most were either completely non-Windows, or the reviews used red-flag phrases such as “unstable,” “wouldn’t put my records in it,” and so on. (I figure that even though we want something for free, we also want something that works!)

mSQL (pronounced “mini-sequel”)

Hughes Technology Ltd (http://www.hughes.com.au/, free for personal use) provides this popular free database to the online community. The database features a stand-alone database server, a workable subset of ANSI SQL (including JOIN support), an easy setup, and a relatively easy application interface.

Getting the software. It’s a bit of a turn-off, but Windows platforms aren’t directly supported by mSQL. Instead, the Links page provides a couple of pages where you can get Windows versions of the database. My personal preference (since it’s semi-supported by Hughes Technology) is http://shell.warped.com/~joshua/, where I downloaded the application. The latest current version at the time of this writing is http://shell.warped.com/~joshua/PCMsql/mSQL-2041/index.html, and I downloaded the full install file. Even though I already have a copy of the mSQL database server running on my system, I’ll focus on the latest and greatest version available.

Setting up the system. The installation process is relatively straightforward, if not precisely accurate. You simply execute the downloaded file, and go through a normal install process. Unfortunately, I received a general protection fault from the install program during this process. The good news was that it didn’t matter—all the program files were copied successfully into my target directory.

The situation got a little more confusing when I scanned through the readme.txt file in the installation target directory. This readme file references an older, manual installation process (downloading and unzip’ing a file). The readme file also indicates that using a directory structure other than c:\usr\local\hughes may cause problems when running the database server. This is not the case—you can safely install the database anywhere you like.

Another problem is that the convenience command script “msqlvars.bat” provided with the download isn’t quite correct. This script sets the system path to your installation directory. However, the mSQL binaries are located in the “bin” directory underneath your installation directory. To correct the problem, simply open the msqlvars.bat file in any editor and append “\bin” to the PATH statement.

Running the database server.

1. Open a DOS box.

2. Change directory to your installation directory (d:\msql\msql-2.0.4.1-joshua, in my case).

3. Run the “msqlvars.bat” command script. This sets the system path necessary to run the database server binaries (assuming you made the correction listed above).

4. Type:

msql2d -f ./msql.conf

(The “-f” option tells mSQL where to find the configuration file.) You should see a startup copyright banner, and a few messages ending with “Server mode is Read/Write”.

Administering the database server. mSQL provides these tools (all DOS command-line based) to aid you in database server management:

msqladmin – Create a new database inside the server, drop an existing database, copy information from one database to another, and so on.

msqldump – Extract an ASCII dump of a database.

msqlimport/msqlexport – Extract data from a database in a format suitable for feeding into another mSQL database server, or for feeding into a spreadsheet.

relshow – View a database or table schema.

Lite – A simple programming language that (may) eliminate the need to do your database programming using C. We’ll cover this feature in more detail.

msql - Allows you to execute free form SQL statements, as well as allowing you to feed in multiple SQL statements from an input file. Similar to SQL*Plus (Oracle) or iSQL (Sybase).

Creating tables. mSQL supports only a subset of ANSI SQL, so it’s not surprising to find that the CREATE TABLE command is a little different from what you might expect. First, mSQL doesn’t use the familiar VARCHAR data type (which might have an impact on any existing SQL scripts you have). Instead, mSQL uses CHAR (up to 255 characters) and TEXT (theoretically unlimited, actual limitation is about 32K depending on platform). Second, mSQL requires you to issue separate commands to create sequences (see sidebar) and indexes.

Programming the Database. One good feature about mSQL is the scripting language “Lite.” You can use this language to avoid coding directly in either C or Java. I’ve never actually coded in Lite, so I looked on this as a perfect opportunity to exercise its ability to simplify life for the Web developer.

Note that Lite’s a C-based language. That means you should have a working knowledge of C to get best use from the language. And the User’s Guide doesn’t cover every aspect of the language. (For example, you can use the undocumented “exit” function to leave a program prior to the end of the script.) When writing scripts, remember that you should try accessing a C-language function before assuming that Lite doesn’t support that function! It’s perfectly safe to do this, since you’ll get an error (when you run lite) if you try to use a function that doesn’t exist.

<BEGIN CAUTION>

Lite is not a server-based language—you can’t create stored procedures using Lite! Remember that all your Lite scripts run as separate processes, each attaching to the central database server as necessary.

<END CAUTION>

At first glance, Lite doesn’t appear to fill the need for any serious development. It’s interpreted (s-l-o-o-w), and doesn’t support persistence (which means that any Web page that loads/executes a Lite script doesn’t know about variables from any other Lite script). However, these problems aren’t quite as bad as it might appear. First, you can create pre-compiled libraries that your Lite scripts reference. This can dramatically improve system performance while also easing your programming requirements. As far as persistence goes, since Lite is a database programming language, Hughes assumes that any shared variables/data will exist in tables in the database server. Also, since the language provides a usable set of file manipulation routines, you can store cross-script information in local data files.

Finally, the best reason to use Lite is its ability to run unchanged on every platform that Hughes supports (and there are many of these platforms). This means that properly-designed Web development (that addresses vendor HTML differences) can reference mSQL program scripts on any language that supports an mSQL database.

Performance Testing Lite. I’ve always found that the best way to verify something is to try it. In this spirit, I’ve created a very simple benchmark program that compares two identical functions. One function is in a library (pre-compiled), the other function is not pre-compiled. Here’s the pseudocode:

i <= limit

while i > 0 do begin

j <= i + ( i / 2.0 ) / ( i + 3.0 )

k <= j / i – 73.0

l <= k / 1000.0

i <= i - 1

end

This loop tests the arithmetic and floating-point ability of Lite. I ran the loop with a limit of 100,000, and suspected I’d get identical results. As I suspected, both the pre-compiled and the library form of the functions executed in the same time (25 seconds on a 266mHZ NT box with 128MB of memory). This points out that Lite’s not a strictly interpreted language that can change its logic at run-time (like GW-BASIC, for example). Rather, Lite compiles each function once. So, the big benefit of libraries is to minimize initial load time of the script rather than to get a performance boost.

Just for kicks, I wrote the same benchmark using Java and C++. For both benchmarks, I changed the limit to 1,000,000 (one million). The Java application (without benefit of the JIT compiler) ran in ten milliseconds. And (of course) the C++ program ran in sub-second time as well. (The moral of this story: if you want performance, stick with a compiled language!) However, since (in many cases) the biggest share work done in a database application is on the server, the performance degradation from using a relatively inefficient scripting language to enable access to the server becomes statistically less significant.

<BEGIN NOTE>

Different implementations of mSQL may run faster or slower. The Win32 edition I’m using in this article has a note mentioning its slower performance. According to the porting author, you can get better performance by downloading the Cygwin build of Win32 mSQL, rather than the Visual C++ 5.0 build.

<END NOTE>

You can get all these benchmark examples from <EDITOR: Insert FTP address here.>.

Setting up the mSQL Database. It’s a simple matter to setup your mSQL database. To create a database named ANDY, you must use msqladmin, as in the following:

msqladmin –f ./msql.conf create ANDY

<BEGIN TIP>

Keep in mind you can create as many separate databases as you like within a single mSQL database server.

<END TIP>

The above assumes that msql.conf exists in the current directory. IOnce complete, we create the table. While I could use the msql tool to do this, I created a simple Lite script called CreateTable.lite (on the FTP site). For reference, the code is much like this:

$conn = msqlConnect( “<host>” ) ;

if( $conn < 0 ) ...

$rc = msqlSelectDB( “<db_name>” ) ;

if( $rc < 0 ) ...

$rc = msqlQuery( $conn, “CREATE TABLE Users( Name char(100), <etc> “ ) ;

Now we have the database and the table, all we have to do is insert values. This is really easy, since it uses standard SQL statements. Just start the msql tool with:

msql –f ./msql.conf ANDY

You get a welcome message, followed by the “mSQL >” prompt. At the prompt, type in the INSERT statement. As an example:

INSERT INTO Users VALUES( ‘andy’, ‘1234 Strawberry La’, ‘Alex, VA’ )\g

Note the “\g” at the end. That tells the msql tool to execute the SQL statement. You can batch up multiple SQL statements and submit them all at once by strategic use of “\g”.

Performance Testing the Database. I’ve written a very simple database-performance test you can use to benchmark your database implementation. The suite depends on successful creation of the ANDY database and the Users table, and is contained in the Lite script DatabaseBenchmark.lite on the FTP site. Basically, it inserts 1000 rows into the table, and then does a SELECT for each row. My results are as follows:

INSERT – 7 seconds (initial time); 12 seconds (second time with failures)

SELECT – 11 seconds

The performance could definitely be better, but we could also improve our throughput by using an index on the Users table. Also, running the same test on a different platform (say, Solaris) can dramatically improve overall throughput.

Web Development with mSQL

Hughes provides two distinct ways of accessing the mSQL database from within a Web page. The older and more simplistic method allows you to reference the database interface from within an HTML document. This suffered from numerous drawbacks, not the least of which was controlling HTML output from a CGI script. Hughes’ second generation of Web development provides a program called “W3-mSQL,” which is a WWW interface package. Basically, it allows you feed input HTML documents containing embedded Lite programming commands through the w3-msql program. Within the HTML document, you can control the final result (the output HTML sent to the client) by making programming decisions. Thus, the w3-msql program works much the same way as any other CGI filter (such as perl, or PHP3, or even VBScript).

On the down side, I couldn’t get w3-msql to work on my Windows NT box. However, there’s nothing to prevent you from using msql calls (or even invoking Lite scripts) within your Web scripting language (such as Perl or TCL). Plus, I’ll do some more research on setting up w3-msql with Microsoft’s Personal Web Server for next time.

Coming up next time…

Coming up next time, I’ll provide the results of my research for running w3-msql CGI scripts from my Windows box. Plus, we’ll cover the mySQL (pronounced “my-ess-q-ell”) database to compare it with mSQL. Finally, I’ll scan the hottest Web sites to give you some free goodies on privacy and security. This security issue is now near and dear to my heart: my good friend Joe Berry was recently the victim of a Web attack, and lost lots of time restoring his system. So, providing a good look at various security applications seems nothing if not a propos.

FYI: You might have noticed a distinct pattern in the way I approach each months column. I’m alternating between pure programming tools (as in this issue) and general purpose internet tools (as in last issue). I have a question for you: Does this model work? Do you get the information you need? Should I concentrate more on one or the other (or neither)? Drop me a line at abruce@savant-corp.com letting me know—your feedback counts. And until then, surf safely!

<BEGIN SIDEBAR>

Sequences 101

As you create tables inside your database server, you may find you want a quick and efficient way to indicate relationships between different tables. As an example, consider Figure 1. In this diagram, we have two tables. For each entry in the Host table, we can have zero or more entries in the Application table. The obvious problem here is that the HostName field is repeated for each application. This has two drawbacks: first, we burn a lot of space this way (imagine a host machine with hundreds of applications on it). Second, we make it harder to change the name of a host machine—this has a direct on the Application table!

As an alternative, it’s better to create a simple integer field called a sequence or an identityI field on the Host table, and then use this sequence field in the Application table. The sequence field is simply a monotonically increasing number. Thus, our revised table structure looks like Figure 2.

Note that the Application table has become much smaller. Plus, changing the name of a host machine has no effect on the Application table.

There are a couple of ways to handle sequence number allocation. The less-desirable way is for an application to create a special “sequence table” (or tables) containing only a single integer field. Then, when adding a new record that requires a sequence number, the application SELECTs the current value from the sequence table, adds one to it, and then update the sequence table. A major flaw in this scheme occurs if two or more separate applications both need to get and update a sequence number simultaneously. Since there are two separate transactions here (one to SELECT the current value, and another to UPDATE the value), it’s quite possible for the first application to get the value, do some processing, and then UPDATE the current value after the second application has done the same thing. Net result: a “race” condition where the current value in the sequence table ultimately has the wrong value in it—which destroys the uniqueness of the next value used from the sequence table.

The far-better solution is where the database server itself provides a method to get and increment the current sequence value in a single step. All databases I’ve worked with (including mSQL and MySQL) provide this functionality. The down side is that the functionality is definitely vendor-specific!

<END SIDEBAR>