Penn Computing

Penn Computing

Computing Menu Computing A-Z
Computing Home Information Systems & Computing Penn

 
  UPDATES
Current Load Status
Regular Availability
 
  INFORMATION
FAQs & Tips
Password Changer
Support services
Security
About the Data Warehouse
Data Administration
 
  DATA COLLECTIONS
Advancement
Assets
BBR
BRIM
Express Mail
Facilities
Faculty
GAR
General Ledger
Infrastructure
ISSS-iOffice
Learning Management
Position Inventory
Research-PennERA Proposals
Salary Management
Student
Travel Expense Management
Tuition Distribution
World Travel
Cross-Collection

This lesson should take about 35 minutes to complete. It will introduce you to basic concepts about relational database management systems, such as tables, primary keys, and indexed columns.

Relational Database System Concepts

After completing this lesson, please answer the following questions:

1.  What element or group of elements uniquely identifies a row in a table?

2.  Describe how one table can be joined to another.

3.  How can you improve the performance of a query?

4.  When should a query include a condition that explicity deals with null values?

5.  Describe the four clauses that make up an SQL (Standard Query Lanaguage) statement. Be sure to identify the function of each clause in the SQL statement.

Overview
The data in the Data Warehouse is stored using a relational database management system (or RDBMS) called Oracle. Relational databases store their data in two-dimensional tables (rather like spreadsheets).

Tables
Each record within a table is stored in a row, and each data element (or field) within a row is stored in a column. (The two dimensions of a table are row and column.) As shown in Figure 1, a database might have a BLOOD_DONORS table and a DONATIONS table.

Primary Keys
A primary key (PK), consisting of one or more columns, uniquely identifies each row in a table.

For example, the primary key of the BLOOD_DONORS table is DONOR_ID, and the primary key of the DONATIONS table is the combination of DONOR_ID and DONATION_DATE. Figure 2 further illustrates the example: In the BLOOD_DONORS table, primary key value 123-45-6789 uniquely identifies a record. In the DONATIONS table, it is the combination of DONOR_ID and DONATION_DATE that uniquely identifies a record.

Joining Tables
Tables may be joined to each other.

For example, the BLOOD_DONORS table may be joined to the DONATIONS table by matching DONOR_ID in the BLOOD_DONORS table to DONOR_ID in the DONATIONS table to get a report as shown in Figure 3.

Indexed Columns
As shown in Figure 4, queries involving selection conditions (to determine which rows are returned) may run faster when indexed columns are used. An index works just like the index in the back of a book. For example, you can do a sequential read and look at every page of a book to find the references to John Doe, or you and find those references faster by doing an indexed read, using the index (which happens to point to just 3 pages out of a 500 page book). Oracle makes the decision as to whether it is quicker to do a sequential read or an indexed read for a given query, but it won't try to do an indexed read at all if your query row selection conditions do not mention an indexed column.

Indexed columns and primary keys. The primary key is indexed. If the primary key consists of more than one column, the index is created from the concatenation of all the columns in the primary key. (For example, the index for the primary key for the DONATIONS table has entries for 123-45-6789 1991/11/25, 234-56-7890 1992/01/21, and 234-56-7890 1993/10/25.) The primary key index may be used if your conditions mention all components of the primary key, or just the first one.

Individual columns may also have indexes created for them.

Null Values
Unless otherwise specified, a column in an Oracle table may have a null value. Any column that is part of the primary key, or that has been specified as being Not Null, may not contain a null value. All other columns may contain null values.

It is important to know about nulls when you set conditions (other than simple = conditions) on your query. For example, Temperature may be null. If you ask for rows with a Temperature equal to 98.6 (WHERE TEMPERATURE = 98.6), the query will not return rows that have a null value for Temperature, and that is probably what you want. But if you ask for rows with a Temperature less than 98.6 (WHERE TEMPERATURE (98.6), the query will also not return rows that have a null value for Temperature, because null means unknown--it might be less than 98.6, or it might not. To be safe, include a condition that explicitly deals with null values. For example, you could say WHERE (TEMPERATURE (98.6) OR (TEMPERATURE IS NULL). Dealing with nulls is most commonly an issue for negative conditions--for example, WHERE (TEMPERATURE <> 98.6) OR (TEMPERATURE IS NULL) would get all the rows where the Temperature is not 98.6, including rows where the Temperature has a null value.

SQL. Structured Query Language (SQL) is the ANSI (American National Standards Institute) standard language for use with relational database management systems. Query tools (such as Business Objects) automatically translate your query specifications into SQL.

The SQL for a query generally has four parts:

  1. The SELECT clause, a required clause that specifies the column(s) to be retrieved
  2. The FROM clause, a required clause that specifies the table(s) from which the data is to be retrieved
  3. The WHERE clause, an optional clause that specifies the conditions for selecting specific rows to be retrieved
  4. The ORDER BY clause, an optional clause that specifies the sort order in which the retrieved rows are to be written to the query output
top

Information Systems and Computing
University of Pennsylvania
Comments & Questions


Penn Computing University of Pennsylvania
Information Systems and Computing, University of Pennsylvania