Databases can be so useful, nowadays even small organizations use them. Everywhere you look, there is a database: mailing lists, customer profiles, organizing a wedding, routing toll free telephone calls all over the country… Some are small and simple. Some are immense and complex.
Each database runs in a database management system (DBMS). If you have dealt with computers long enough, you remember some of the earliest desktop DBMSs. Most of the early desktop DBMSs were shoved aside long ago by Microsoft Access(TM), although several are still available such as Paradox(TM), dBase(TM) and Alpha Five(TM). For larger databases, you are probably aware of MySQL(TM), IBM DB2(TM), Oracle(TM), Microsoft SQL Server(TM).
In my consulting practice, I often encounter people who think those are the only DBMSs worth considering, and they are all just different flavors of the same thing. For more than a decade, I have been helping them past their confusion so they can choose a DBMS intelligently before they build a database. This short primer will give you a basic grounding. You may still need help to choose the right DBMS for your purposes, but at least you will be able to ask smart questions and get the answers you need.
Low-Priced Desktop Databases
Your dictionary may say a spreadsheet qualifies as a database. In the computing world, a spreadsheet does not even come close. Through special interfaces, you can use a spreadsheet as a handy, familiar way to look at a portion of the data in a database. But only one person at a time can open and make data changes in a spreadsheet.
If the data needs to be usable for a group of people and it is in a spreadsheet, eventually somebody will get frustrated about not being able to make changes. They will make a private copy of the file, put their changes into it, and replace the group spreadsheet with their version whenever they have a chance. That wipes out any changes made by someone else while the frustrated person was working in a private version. Valuable information is lost this way.
If more than one person needs to work with information, it does not belong in a spreadsheet. It belongs in a database. Even the cheapest, least capable DBMS will allow everyone to look at the data at the same time. When someone makes a change, only that small portion of the data-the equivalent of a row on a spreadsheet-is “locked” during the change to make sure two people do not make different changes to it at the same time. As soon as the change is finished, that “row” is “unlocked” and made available for other changes by anyone in the team.
However, the adage “you get what you pay for” is frequently just as true in computing as it is anywhere else. Desktop DBMSs do enough to satisfy modest goals. You may be able to start out with one of these. As your database grows larger or your needs become more sophisticated, expect to need to migrate to a more substantial DBMS.
As an example, let’s say you run a small business with 5 employees. You have a few hundred customers. Their orders, your order fulfillment, and your administrative procedures are straightforward. A database in Microsoft Access can handle that with no problem.
Now let’s say your business grows. By the time you hire employee number 50, you will probably either be migrating to another DBMS, or already be using another DBMS. Most desktop DBMSs in the low end of the price range are designed to work well with a few simultaneous users. They struggle as both the number of users and the amount of data grow. You might buy yourself a little time by putting your database on a more powerful computer, but ultimately a serious database needs a serious engine, just like a racecar needs a more serious engine than a commuter’s car.
MySQL as a Step Up
MySQL has become a popular next step up. It is often available free with website hosting services. It runs on several flavors of UNIX as well as on Windows(TM). Many free or cheap web applications (and some more costly ones) run on it. It can handle more data and more users with better, more tunable security than typical desktop engines.
You can do a lot with this, handling a substantial amount of data and many more simultaneous users than a desktop database can handle.
However, the adage about what you pay for will bite you again when you reach the next threshold. Exactly which threshold you reach is not the same for every database. You may need better security or disaster redundancy because your data is sensitive. You may need faster performance. You may need to run on an operating system that is not UNIX or Windows. You may be dissatisfied with the tools available to help you work with your database. Any of these is a good reason not to choose MySQL.
For most people, the best known alternatives are DB2, SQL Server and Oracle. They are designed to be able to cope with large, sophisticated, enterprise class databases and loads.
Microsoft SQL Server
SQL Server is often mistakenly called SQL. SQL stands for Structured Query Language. This is a scripting language that can be used to pull data from and put data into a database. Each SQL-compatible DBMS has its own dialect of the SQL language. SQL Server is a DBMS, and its dialect of SQL is called Transact-SQL or T-SQL. The database administration toolset for SQL Server is heavily oriented toward graphical usage instead of scripts.
As a Microsoft(TM) product, SQL Server is only at home on Windows. Each version is released with several editions ranging from small (oriented toward the desktop and developer) to enterprise (very large and complex with heavy usage). The bigger the edition, the greater the capabilities and the bigger the price. This is often the next step up from Microsoft Access because it runs on the same operating system, and because it is easy to use Access as just an interface between the user and the new database. By using Access to present the user with screens for SQL Server data, a business can migrate upward without needing to retrain database users. The users still get the same look and feel they had in the small Access database.
This is the right solution for some databases, but not for others. Some of my clients run large databases on operating systems other than Windows or UNIX. Some want to move their database to another operating system in the future without needing to change to a new database engine at the same time. Some find that T-SQL does not offer all the scripting capabilities they need for the applications that must run on their databases. Each of those is a viable reason to choose Oracle.
Oracle and SQL Server constantly compete for benchmark test results that outdo each other for performance. Either can be fast, manage large amounts of complex data, and handle substantial usage loads. But Oracle runs on a wide variety of operating systems. If you want the flexibility to move to another operating system, this is obviously desirable.
Oracle is especially entwined with the OpenVMS(TM) operating system to deliver as much performance as possible on that platform. If you cannot tolerate frequent reboots for security patches and turn pale at the very thought of a system crash, you know it is common for OpenVMS to run for years between reboots. You also know OpenVMS can make multiple computers behave as though they are a single computer when they are up to 500 miles apart, offering great protection against disasters. A setup like that is certainly not cheap, but it is easy to see why some organizations demand it.
Work that requires a separate compiled software program in SQL Server can sometimes be done with mere scripting in Oracle. Oracle’s dialect of SQL is called PL/SQL, and has capabilities T-SQL lacks.
Some organizations prefer mainframes instead of Windows or UNIX servers. For these, DB2 is a frequent choice. It is not meant to run on everything from small PCs to big servers. It is intended for heavy duty use with large amounts of data and high availability–extremely serious databases.
The more commonly used a DBMS is, the easier it is to find people who have experience working with it. Because DB2 is so focused on very large databases, fewer people know how to work with it. Part of the price of using DB2 is needing to look harder for the right people to help you set it up and use it.
For some organizations, that is almost as important a factor as the technical merits of a DBMS. When you have demanding requirements, you may need a DBMS that is not commonplace. If so, the experts who work with it are likely to be harder to find and may cost more because of their rare skills.
Not All Databases Live on Disks
The DBMSs mentioned so far keep their information primarily in disk files. The more sophisticated DBMSs may allow you to configure them so the most frequently used data is in computer memory, where performance is much faster, but the data in memory is sent to disk as soon as possible so it will not be lost if the system crashes.
That trick speeds up database performance, but not enough for some purposes. Telephony, spacecraft telemetry, manufacturing automation, and many other applications need to keep up with equipment instead of with comparatively slow human beings. The next move up in performance is a database that lives entirely in memory, which is much faster than disk storage.
Memory resident databases, if well architected and well built, can be blazingly fast. Of course, speed comes with a trade-off. Memory resident databases disappear if the system crashes. As a fallback, this type of database saves a copy of itself to disk occasionally, like a snapshot of its condition at an instant in time. When restarted, the database uses the copy on disk to repopulate itself. If it is gracefully shut down and restarted, the disk copy is up to date and nothing is lost. But if the system crashes, everything that occurs between the last pre-crash snapshot and the restart is lost.
Notice that I said these databases are very fast when well built. I have seen highly proprietary memory resident databases in telecommunications with mind-boggling speed. I have seen another that is commercially available, runs on multiple operating systems, and is so reliable that it is used by nuclear power stations. But I have also seen one that is commercially available, sold for automation purposes, and unable to meet the performance of leading disk resident database engines. The fact that a database is memory resident does not guarantee it will run like greased lightning.
Not All Data is Relational and Sorted Into Lists
Up to now, all the DBMSs mentioned have been relational. They organize data into tables, which you can think of as being similar to spreadsheets. Each table has fields similar to the columns in a spreadsheet, and records similar to the rows in a spreadsheet. Tables can connect with each other through keys. If you have a table of customer contact information and another table of orders placed by customers, the customer identification code that appears in both tables will relate each customer’s contact information with all the orders that customer has placed.
Not all data can be readily organized that way. Sometimes you need a spaghetti bowl of interconnections. I see this most often in memory resident databases with a profound need for speed. Some of them use a maze of interconnections. A relational database may have to trace several relationships to pull together the same data, where a maze of interconnections allows a database to grab related data more directly.
Superfast databases also use a variety of methods to locate desired pieces of data more rapidly. Sorting in alphabetical, numerical or date order is natural for people, but looking up data in that type of list is not the most efficient way for a computer to do it. As examples of faster methods, some databases use hash tables or balanced binary searches for lookups.
However, the rise of web applications has accelerated other needs for less rigidity in the structure of data that goes into databases. Non-relational databases are useful where data structure is in flux. Although you might expect this to be needed mostly for exotic purposes, I know of systems at a major airport and in a large property management operation where non-relational databases are in use.
Some Databases Cast Data in Stone
A variety of other features can make one database stand out from others for a specific purpose. Most people think of information in databases as easy to find and update. But for an example of a special purpose where the data is not updatable, you need not look any farther than your medicine cabinet.
Pharmaceutical manufacturers are required to keep vast amounts of information on file about every batch of medicine they make. If drug regulators notice an alarming pattern and want to check whether something went wrong in a batch, the manufacturer needs to be able to produce that data immediately-and that data must be pristine. It must never have been changed since the batch of medicine was made and the data was captured. For this reason, some database engines can be set so that they will not allow any information in the database to be changed. New data can be inserted, but from that moment on, the data is cast in stone.
A few DBMSs can keep track of the source of everything done to their data. Most databases require that type of tracking to be done by writing special software that users must go through to touch the data, but it is possible to get a DBMS which has that capability built in. This is not necessarily practical in a database that has to keep up with machinery. But in a hospital, it may be crucial to know which person made each change or addition in a patient’s records.
The DBMSs mentioned by name in this article are not the only ones available in their categories, let alone in the entire world of databases. Before you build a database, it is worth your while to think about what you need. You will put a lot of effort and expense into creating your database, so it should be in a DBMS that suits what it has to do.
As an example of good versus bad choices, consider two large companies in regulated industries that need to keep specific data for years and be ready to show it to government auditors on a moment’s notice. Both chose a memory resident DBMS because their systems need to collect data from machinery. As mentioned earlier, if the system crashes, all data between the last “snapshot” and the crash is permanently lost. One of these companies copies the regulated data into a disk resident DBMS right away. The other does not, leaving it exposed to fines from regulatory agencies for losing data whenever the system crashes.
If you are not sure which engine to use, ask a professional. By now, you know just saying you want a new database is not enough. Use this article to help you explain what you need. When you tell an expert that you want a database to run on your favorite operating system, with minimal data loss in the event of a crash, moderate speed, high security and 200 simultaneous users… that is a great start.