This lesson is in the early stages of development (Alpha version)

03 Accessing Data with Queries

Overview

Teaching: 30 min
Exercises: 5 min
Questions
  • What is SQL?

  • How can I write basic queries in SQL?

Objectives
  • Define SQL

  • Create simple SQL queries to return rows and columns from existing tables

  • Filter data given various criteria

  • Sort the results of a query

Definition of SQL

SQL stands for Structured Query Language. SQL allows us to interact with data in a relational database through queries. These queries allow us to perform a number of actions such as inserting, selecting, updating, and deleting information in a database.

In SQL, querying data is performed by a SELECT statement. A select statement has 6 key components;

SELECT colnames
FROM tablename
WHERE conditions
GROUP BY colnames
HAVING conditions
ORDER BY colnames

Writing your first SQL query - the Select statement

Let’s start by using the farms table. Here we have data on every individual farm.

Let’s write an SQL query that selects all of the columns in the farms table. SQL queries can be written in the box located under the “Execute SQL” tab.

If you want to display all of the columns in a table, use the wildcard *.

SELECT *
FROM Farms;

Click on the right arrow above the query box to execute the query. (keyboard shortcut “Cmd-Enter” - Mac or “Ctrl-Enter” - Windows machine to execute a query.)

The results are displayed in the box below your query.

We have capitalized the words SELECT and FROM because they are SQL keywords. SQL is case insensitive, but it helps for readability, and is good style. The ‘*’ character acts as a wildcard meaning all of the columns but you cannot use it as a general wildcard. So for example, the following is not valid.

SELECT A*
FROM Farms;

If you run it you will get an error. When an error does occur you will see an error message displayed in the bottom pane.

If we want to select a single column, we can type the column name instead of the wildcard *.

SELECT Country
FROM Farms;

If we want more information, we can add more columns to the list of fields (such as A06_province, A07_district, A08_ward, A09_village), right after SELECT:

SELECT Country, A06_province, A07_district, A08_ward, A09_village
FROM Farms;

Limiting results

Sometimes you don’t want to see all the results, you just want to get a sense of what’s being returned. In that case, you can use the LIMIT command. In particular, you would want to do this if you were working with large databases. In the example below only the first ten rows of the result of the query will be returned. This is useful if you just want to get a feel for what the data looks like.

SELECT *
FROM Farms
LIMIT 10;

Exercise

Write a query which returns the first 5 rows from the Farms table with only the columns Id, and B16 - B20 (B16_years_liv, B17_parents_liv, B18_sp_parents_liv , B19_grand_liv, B20_sp_grand_liv).

Solution

SELECT  Id
    , B16_years_liv
    , B17_parents_liv
    , B18_sp_parents_liv
    , B19_grand_liv
    , B20_sp_grand_liv
FROM Farms
LIMIT 5;

Because the query uses several columns (with longish names), for readability they have been set out on separate lines. SQL takes of white space to you are free to arrange the text of the query as you like.

Unique values

If we want only the unique values so that we can quickly see what farms have been sampled we can use DISTINCT. Using the farms table we can obtain a list of all the different vlaues of the ‘A06_province’ column contained in the table.

    SELECT DISTINCT A06_province
    FROM Farms;

If we select more than one column, then the distinct pairs of values are returned

SELECT DISTINCT A06_province, A07_district
FROM Farms;

Filtering - using the Where clause

Databases can also filter data – selecting only the data meeting certain criteria i.e. certain values or ranges within one or more columns. For example, let’s say we only want data for the in rows where the value in the B16_years_liv column is greater than 25.

We need to add a WHERE clause to our query:

In this example we are only interested in rows where the value in the B16_years_liv column is greater than 25

SELECT  Id, B16_years_liv
FROM Farms
WHERE B16_years_liv > 25
;

In addition to using the ‘>’ we can use many other operators such as <, <=, =, >=, <>

SELECT  Id, B17_parents_liv
FROM Farms
WHERE B17_parents_liv = 'yes'
;

Using more AND and OR logical expressions

We can also use more sophisticated criteria by combining tests with AND and OR keywords. For example, suppose we want the data where both sets of parents live in the house and both have grandchildren:

SELECT  Id
FROM Farms
WHERE (B17_parents_liv = 'yes') AND (B18_sp_parents_liv = 'yes') AND (B19_grand_liv = 'yes') AND (B20_sp_grand_liv = 'yes') 
;

Notice that the columns being used in the WHERE clause do not need to returned as part of the SELECT clause. You can ensure the precedence of the operators by using brackets and also aid readability

SELECT  Id
FROM Farms
WHERE (B17_parents_liv = 'yes' OR B18_sp_parents_liv = 'yes') AND B16_years_liv > 60
;

Exercise

From the above query, breakdown the Where clause so that each component can be tested individually. Make a note of how many rows are returned in each case.

Solution

To test each of the or clauses

SELECT  Id
FROM Farms
WHERE B17_parents_liv = 'yes'
;
SELECT  Id
FROM Farms
WHERE B18_sp_parents_liv = 'yes'
;
SELECT  Id
FROM Farms
WHERE (B17_parents_liv = 'yes' OR B18_sp_parents_liv = 'yes') 
;
SELECT  Id
FROM Farms
WHERE B16_years_liv > 60
;

OR generally creates a less restrictive condition and AND makes a more restrictive condition.

Building more complex queries

Now let’s combine qeuries to get data from the farms table where the value of B16_years_liv is in the range 51 to 59 inclusive. We could use an AND operator

SELECT Id, B16_years_liv
FROM Farms
WHERE B16_years_liv > 50 AND B16_years_liv < 60
;

The same results could be obtained by using the BETWEEN or IN operators

SELECT Id, B16_years_liv
FROM Farms
WHERE B16_years_liv BETWEEN 51 AND 59
;
SELECT Id, B16_years_liv
FROM Farms
WHERE B16_years_liv IN (51, 52, 53, 54, 55, 56, 57, 58, 59)
;

The list of values in brackets do not have to be contiguous or even in order.

Exercise

Write a query using the Farms table which returns the columns Id, A09_village, A11_years_farm, B16_years_liv. We are only interested in rows where the A09_village value is either ‘God’ or ‘Ruaca’. Additionally we only want A11_years_farm values in the range 20 to 30 exclusive and B16_years_liv values strictly greater than 40. There are many ways of doing this, but try to use an inequality, an IN clause and a BETWEEN clause.

Solution

SELECT Id, A09_village, A11_years_farm, B16_years_liv
FROM Farms
WHERE     A09_village IN ('God', 'Ruaca') 
      AND A11_years_farm BETWEEN 21 AND 29 
      AND B16_years_liv > 40
;

Sorting results

We can also sort the results of our queries by using ORDER BY. For simplicity, let’s go back to the farms table and select the rows where the village is ‘God’ and alphabetize it by years farmed.

SELECT Id, A09_village, A11_years_farm, B16_years_liv
FROM Farms
WHERE A09_village = 'God';

Now let’s order it by years on the farm (A11_years_farm).

SELECT Id, A09_village, A11_years_farm, B16_years_liv
FROM Farms
WHERE A09_village = 'God'
ORDER BY A11_years_farm;

By default the SQL assumes Ascending order. You can make this more explicit by using the ASC or DESC keywords.

SELECT Id, A09_village, A11_years_farm, B16_years_liv
FROM Farms
WHERE A09_village = 'God'
ORDER BY A11_years_farm DESC
;

You can also order by multiple columns

SELECT Id, A09_village, A11_years_farm, B16_years_liv
FROM Farms
WHERE A09_village = 'God'
ORDER BY A11_years_farm DESC , B16_years_liv ASC
;

Key Points

  • Strictly speaking SQL is a standard, not a particular implementation

  • SQL implementation are sufficiently close that you only have to learn SQL once

  • The SELECT statement allows you to ‘slice’ and ‘dice’ the columns and rows of the dataset so that the query only returns the data of interest