Guidelines for Writing Queries
Queries often require complex conditions (or "query filters") to return appropriate results.
WHERE clauses (or conditions) reduce the amount of data to be processed
in a SELECT statement (comprised of desired result objects or columns)
by specifying that only those rows meeting the criteria in the WHERE clause
are displayed. Depending upon which tool you use to query the Data Warehouse,
your degree of control over the query language and operators will vary.
- BusinessObjects and Microsoft Access employ graphical interfaces
that construct appropriate SQL behind the scenes based on the tables
and joins the user indicates.
- Oracle SQL*Plus, on the other hand, requires
the user to write his or her own SQL statements.
- The Oracle database system,
which forms the foundation of the Data Warehouse, permits the use of powerful
SQL operators, some of which may also be available as post-query operators
in your desktop tool.
The following guidelines are for all users of the Data Warehouse, regardless
of query tool, and provide good practices for efficient querying.
Refer to the help documentation.
The main difficulty most people have with writing queries is knowing which table to use. If
you are unsure about which table to use, refer to the table help
sections "Common Uses" and "Cautions," which are
part of the data collection documentation on the web. These sections may help you decide. Use the search functionality on the main page of each collection to help find exactly what you need.
Take advantage of indexes.
If possible, include
an indexed data element in your condition statement. A query with a record
selection condition using an indexed data element tells the system to
go directly to the rows in the table that contain the value indicated
and to stop retrieving data when the value is no longer found. If a query
does not select records based on an indexed data element in its record
selection condition, the system starts searching at the first row in the
table and works through every row until it reaches the last row in the
table. Indexed columns are noted in each collection's documentation.
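As a rough illustration of the difference, the sketch below uses Python's built-in sqlite3 module as a stand-in for the Warehouse. SQLite is not Oracle, and its optimizer makes its own decisions, so treat this only as a demonstration of the general idea. The table PAYMENTS, its columns, and the index name are hypothetical. The sketch asks the database to describe its plan for a condition on an indexed column and for a condition on a non-indexed column.

```python
import sqlite3

# Hypothetical table and index, for illustration only (SQLite, not Oracle).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (fund TEXT, payee TEXT, amount REAL)")
conn.execute("CREATE INDEX idx_payments_fund ON payments (fund)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?, ?)",
    [("500001", "A", 10.0), ("600002", "B", 20.0), ("500003", "C", 30.0)],
)


def plan(sql):
    """Return the plan description SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)


# Condition on the indexed column FUND.
indexed_plan = plan("SELECT * FROM payments WHERE fund = '500001'")
# Condition on PAYEE, which has no index.
unindexed_plan = plan("SELECT * FROM payments WHERE payee = 'A'")

print(indexed_plan)    # expected to mention idx_payments_fund
print(unindexed_plan)  # expected to describe a scan of the whole table
```

The exact wording of the plan text varies between SQLite versions. The point is only that the first plan names the index while the second reads through the table.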
Certain operators or query segments are processed by the system without
the use of indexes, even if the column in the condition is indexed. It
may, of course, be necessary for you to construct your query in this manner
to retrieve correct results, but in considering alternatives in query
construction you may wish to keep in mind the following situations
where indexes may not be used:
- Negative comparisons such as Not Equal (represented by
!= or <> in SQL), Different From, or Not In. Avoid negative phrasing of condition
statements as much as possible. In general, it is easier (both for the
system and for you) to interpret a positive phrase than a negative phrase.
For example, instead of the condition statement "If term is not greater
than 1998A," rephrase the statement to "If term is less than or equal
to 1998A." Or, if practical, eliminate the condition from the query and
filter your results on the desktop.
- Nulls such as Is Null or Is Not Null.
- Like or Matches Pattern comparison with a date or number column.
For example, to retrieve employee payments from March (of any year) use
"FISCAL_MONTH_SEQ = '09'" rather than "CHECK_DATE Like 03/%".
- Wildcards at the beginning of a string. Avoid matching
patterns beginning with a wildcard (Like %...). A wildcard at the end
of a pattern is definitely appropriate and can be very efficient (e.g.,
Where Fund Like 5% will retrieve all Funds 500000 - 599999).
- Indexed columns modified by an expression or function (e.g.,
rather than concatenating all Chart of Account segments as COA_CNAC||COA_ORG||COA_BC||COA_FUND||COA_OBJECT||COA_PROGRAM||COA_CREF,
select the column COA_ACCOUNT, which is indexed). Also, comparing
an indexed column to another indexed column using Greater Than (>)
or Less Than (<).
Check the "and/or" qualifiers in the records selection
criteria of the condition statement.
For example, a query coded to get
students with the following conditions statement will actually return
every student in COL for 1998A and every student for 1998C regardless
of the division:
- If division is equal to 'COL' and term is equal to '1998A' or
term is equal to '1998C'
The query coded to get students with the following conditions statement
will return every student in COL for 1998A and 1998C:
- If division is equal to 'COL' and (term is equal to '1998A'
or term is equal to '1998C')
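The difference arises because SQL evaluates AND before OR unless parentheses say otherwise. The sketch below reproduces both conditions against a small in-memory SQLite table using Python's built-in sqlite3 module. The ENROLLMENT table, its columns, the student names, and the division code 'WH' are hypothetical and stand in for whatever student table you actually query.

```python
import sqlite3

# Hypothetical sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrollment (student TEXT, division TEXT, term TEXT)")
conn.executemany(
    "INSERT INTO enrollment VALUES (?, ?, ?)",
    [
        ("Ann", "COL", "1998A"),
        ("Bob", "COL", "1998C"),
        ("Cy", "WH", "1998C"),
        ("Dee", "WH", "1998A"),
    ],
)


def students(where):
    """Return the student names matching a WHERE condition, sorted by name."""
    sql = "SELECT student FROM enrollment WHERE " + where + " ORDER BY student"
    return [row[0] for row in conn.execute(sql)]


# Read as: (division = 'COL' AND term = '1998A') OR term = '1998C'
without_parens = students("division = 'COL' AND term = '1998A' OR term = '1998C'")

# Parentheses group the two term tests, so division applies to both.
with_parens = students("division = 'COL' AND (term = '1998A' OR term = '1998C')")

print(without_parens)  # ['Ann', 'Bob', 'Cy']
print(with_parens)     # ['Ann', 'Bob']
```

Cy is in a different division but appears in the first result because the 1998C test stands alone after the OR.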
If your access to data is restricted, do not force
the security system to select records for you.
For example, if you are
authorized to access data only from a particular department, one of your
record selection conditions should state "If Organization='My Organization',"
where organization is the code for your department.
Review your query before executing it.
Check to make sure that your query is as precise as possible. This includes selecting
the tables that will give the best results, reviewing selection conditions
and sort criteria, and if it makes sense to do so, including at least
one indexed data element in the conditions statement. For example, if
you want to find all undergraduate freshmen and their names, choose the
PERSON table rather than the ADDRESS table. This is because a student
can have multiple addresses, and choosing the ADDRESS table would return
a name for each address the student has listed.
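The effect of the table choice can be sketched with a small in-memory SQLite database using Python's built-in sqlite3 module. The column names and sample rows below are hypothetical and greatly simplified. They only assume, as described above, that one student can have several ADDRESS rows but a single PERSON row.

```python
import sqlite3

# Hypothetical, simplified tables for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (person_id INTEGER, name TEXT)")
conn.execute(
    "CREATE TABLE address (person_id INTEGER, name TEXT, address_type TEXT)"
)
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [(1, "Ann Lee"), (2, "Bob Ray")],
)
conn.executemany(
    "INSERT INTO address VALUES (?, ?, ?)",
    [
        (1, "Ann Lee", "HOME"),
        (1, "Ann Lee", "LOCAL"),  # a second address for the same student
        (2, "Bob Ray", "HOME"),
    ],
)

# One row per student.
names_from_person = [
    row[0] for row in conn.execute("SELECT name FROM person ORDER BY name")
]

# One row per address, so a student with two addresses is listed twice.
names_from_address = [
    row[0] for row in conn.execute("SELECT name FROM address ORDER BY name")
]

print(names_from_person)   # ['Ann Lee', 'Bob Ray']
print(names_from_address)  # ['Ann Lee', 'Ann Lee', 'Bob Ray']
```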
Be aware of data that is subject to change and
its effect on your results.
For example, a grade change can affect a student's
grade point average (GPA). A query executed before and after the grade
change may or may not result in a changed GPA. In addition, keep in mind
that there is a "data delay" between Warehouse collections and their respective
source systems. Refresh schedules are noted in each collection's documentation.
Give the query time to execute.
Queries can take
many minutes to execute; complex queries can take longer. It is not uncommon
for a query to take 5 to 10 minutes to complete. In general, let the query
run until it finishes. If the query takes longer than 1 hour to complete,
contact Enterprise Information & Analytics.
Note the date and time of the query when creating
reports or communicating results to others.