P1: Introduction to SQL

Objectives

By the end of this practical you should be able to:

  1. log into an Oracle database;
  2. perform basic operations on selected tables; and
  3. retreive information from the database using SQL.
This practical should require two weeks to complete.

Overview

This practical will introduce you to the basic elements of the Structured Query Language (SQL) and you will learn how to interact with the Oracle 10g DBMS. The practical will use an Oracle 10g database containing sample data from the US Census Bureau and the Digital Chart of the World (covered in Kothuri et al., 2004). In this tutorial only non-spatial operations will be covered.

Exercise P1.1: Using SQL*Plus

This exercise will cover the basics of connecting to an Oracle database through the SQL*Plus instant client, and will enable you to get an overview of the data available. Interacting directly with the Oracle DBMS through the SQL*Plus command line interface will help to get the grasp of the SQL syntax and is a basic technique we shall be using over coming weeks.
  1. Run Oracle SQL*Plus instant client: In the practical labs, the SQL*Plus client can be started from the Windows Start menu (under "Departments", "Geomatics", "Oracle"). On some other machines to start the sqlplus client, you will need to run the executable file sqlplus.exe. The exact location of this file may vary from machine to machine, but is often located in the c:\oraclient directory. You can run this file using Windows file manager or the command line. You should be presented with a new command line window and the SQL> prompt. If not, and you a "login" prompt, start a new command line prompt (Start menu -> run -> "cmd"), and change to your sqlpus client directory (e.g., cd c:\oraclient) and type sqlplus /nolog.
  2. Login to the course database server: The course uses a remote database server hosted in the geomatics department. To log in to the server you need the following details:
    1. loginname: <information supressed for public version>
    2. password: <information supressed for public version>
    3. databasename: <information supressed for public version>
    4. serveraddress: <information supressed for public version>
    To log in, at the SQL> prompt type the following (substituting the correct information from above). When you log in for the first time, the system will require you to set a new password. Do this, and remember your password for future weeks.
    CONNECT loginname/password@//serveraddress:1521/databasename
  3. Get an overview of the available data: When first logging in a new database, it is good to get the overview of the available data. Execute the statement:
  4. SELECT owner, table_name FROM all_tables; 
    This statement will list all the data tables already available to you in the teaching database, along with the "owner" of these tables. You will start with no tables of your own, but many tables will still be available for you to query. In particular, those owned by "spatial" are the ones we shall be using in this course. Note the pattern of the SELECT statement below (see the lecture notes or http://www.w3schools.com/sql for more information; square brackets below indicate optional clauses). Using this pattern, try to refine the SELECT statement above to show only those tables owned by "spatial". Ask for guidance from the demonstrators if you get stuck at this point. 
    SELECT [ DISTINCT | ALL ]
        column_expression1, column_expression2, ....
        [ FROM from_clause ]
        [ WHERE where_expression ]
        [ GROUP BY expression1, expression2, .... ]
        [ HAVING having_expression ]
        [ ORDER BY order_column_expr1, order_column_expr2, ...];
  5. Table description: To get the description of the data available in a specific table, use the DESCRIBE statement. The DESCRIBE statement has the syntax: 
  6. DESCRIBE owner.tablename
    Get the description of the "us_cities" table (owner "spatial"). What does the output mean?
  7. View records of a table: Issue SELECT statements allowing you to view the records of the "us_cities" table. Note that because this table is not owned by you, you will need to use the form "owner.tablename" whenever you refer to it in any SQL statement, like SELECT. 
    1. Use the * operator to select all the records.
    2. Select the record of the city of Richmond.
    3. Select the population (column pop90) of the city of Richmond (projection operation).
    You must prefix an attribute reference with the table name (for example, us_cities.pop90) whenever the attribute name appears in more than one table in the FROM clause (to resolve any ambiguities). You may use this format even when the attribute name is unique in the tables you are querying, to be on the safe side. 
  8. Disconnnect from the database. Use the command DISCONNECT to disconnect from the database. Re-connect using the command connect and your connection string as specified earlier and try the EXIT command. Note the difference.

Exercise P1.2: Further SQL

In the previous exercise you have aquired familiarity with the Oracle SQL*Plus instant client and the basic SQL command, SELECT. In this exercise you will learn additional commands and perform some basic non-spatial data analysis. We will look at nested SELECT statements and restrictions in SELECT statements. In addition to your lecture notes, you may need to refer to an SQL syntax guide to help you answer some of the following questions. There are many books on the subject, and the web site http://www.w3schools.com/sql provides an excellent, simple, and accessible tutorial for all the SQL functions covered in this practical.
  1. List all the US cities in alphabetical order: HINT: Use the ORDER BY operator.
  2. Count the US cities in California with a population of greater than 150000: HINT: The state abbreviation for California is CA. You need to use the COUNT keyword and the AND keyword to chain conditions.
  3. Get the population of the largest city in Texas: HINT: Try using the MAX operator. The state abbreviation for Texas is TX.
  4. Get the name of the city with the largest population in Texas: HINT: You will probably need to use a nested SELECT query to achieve this. When nesting a select query you enclose the entire subquery in parenthesis (brackets).
  5. List the 100 most populous US cities: HINT: The "most populous" cities means those with the highest population. There are at least two ways to do this. The simplest can be found by looking closely at the results of describing the table for US cities. Another more complex way is to use the special SQL variable "ROWNUM" in the WHERE clause of your SELECT statement to select only those records higher that a particular row number. However, check your answer carefully as you must be certain to order your cities by population before using ROWNUM to restrict your answer to only those most populous cities. Your list should be sorted in descending order (most populous first).
  6. Get the number of different states which contain the 100 most populous US cities: HINT: The answer requires the number of different states, so two or more cities in the same state only count once.
Note that the value of a string (Oracle data type varchar) in a SELECT statement must be always enclosed single quotes, e.g., 'value'.

Exercise P1.3: Data entry and join queries

In this exercise, you will define and create your own table with sample data; add and modify the data entries; and finally delete ("drop") the table. All the queries in the previous exercise concern the values in a single table (us_cities). The queries in this exercise are join queries, where the answers can only be found by combining information across multiple tables within a single SELECT statement.
  1. Create the table "directory": Create a new table containing the name of a company, city, product name, and number of items sold. The items column will be an integer of length 3. Table creation is achieved using the "CREATE" statement. The relation scheme for your table is:
  2. directory(company: VARCHAR2(32), city: VARCHAR2(32), product_name: VARCHAR2(32), no_of_items: NUMBER)
    What are the candidate keys for your new table? What should be the primary key for this table?
  3. Enter 10 data records for your table: HINT: Data entry is achieved using the SQL "INSERT" statement. You can invent whatever data you want, except for the "city" attribute, which must contain only names of cities from the "us_cities" table (at least two different names). You should also have at least two different products sold by at least two different companies. 
  4. List the names of products along with the abbreviations for states they are sold in: HINT: this is a join query requiring you to join your new "directory" table with the "us_cities" table, using the join field "city" on both tables. If you encounter problems, make sure the city names in the "city" column of your new "directory" table are spelt correctly, and are exactly as they appear in the "us_cities" table. 
  5. List the total number of sales by state: HINT: this is another join query requiring the use of the "GROUP BY" SQL keywords. 
  6. List the total number of products sold by city: This is similar to the previous query.
  7. Get the population of the city which has the company that sold most of the products: This requires an extension of the previous query, using a nested SELECT statement. 
  8. Update one of your directory records with new sales data: Suppose it is Christmas and one of your companies has just sold a lot of one of your products in a particular city. Use the "UPDATE" statement to increase the number of items sold for that records.
  9. Delete the company that sold the least amount of the products from the table: Suppose an underperforming company has be sold. Delete all the records in the table for the company that has in total the lowest number of sales. HINT: Use the SQL "DELETE" statement. This query can be achieved using one nested SQL query!
  10. Drop the "directory" table: When you are finished, you should delete the "directory" table using the "DROP" SQL statement.

Assessment A1: SQL Foundations

Assignment

The questions you must answer for your first assignment all concern the tables "us_cities" (familiar from practical 1) and "us_states" (contains demographic data for US states) which may both be accessed from the course database, using the same techniques as in the practical exercises. Your assignment is to provide SQL/Oracle statements to answer to the following 10 questions (see "Submission" section above for more details).
  1. Describe the "us_states" table.
  2. List all the states names and state abbreviations for those US states with a total land area greater than 100,000 square miles ("landsqmi" column). Order the list in descending order of total land area (largest area first).
  3. List the names of states that have a greater percentage of their population at retirement age (from ages 60 and 64, "pct60_64" column) than teenagers (from 14 and 17 years old, "pct14_17" column).
  4. List in alphabetical order the names of states that have a population density of less than 10 people per square mile ("poppsqmi" column). Note that some states have no data for total area or population density (listed with zero "landsqmi" and "poppsqmi"). These states will need to be omitted from your listing.
  5. List the names of US cities that begin with the letters "Sa" along with the name of the state (not the state abbreviation) in which each city is located. Order your list in ascending order of state name (i.e., "A" first).
  6. Get the total number of cities in the US with populations that account for more than 20% of the total population for the state in which each city is located. The "pop90" column gives the populations of cities in the us_cities table, while the "totpop" column gives the total population of states in the us_states table. 
  7. Cities in the us_cities table are ranked according to population (rank90). Display a table that lists each state abbreviation along with the ranks of the highest and lowest ranking cities in that state in terms of population. Order your table in alphabetical order of state abbreviation. 
  8. Create a new table with the following relation scheme: mystates(state: VARCHAR(32), state_abrv: VARCHAR(2), totpop: NUMBER).
  9. Populate your new table with appropriate data from the us_states table for only the 20 largest states in terms of population.
  10. Delete your new mystates table.
Creative Commons License
This work is created by Matt Duckham (http://www.duckham.org) and licensed under a Creative Commons Attribution 3.0 United States License.