This lesson is in the early stages of development (Alpha version)

Data Management with SQL for Social Scientists *alpha*: SQL Cheat Sheet

Key Points

01 What is a relational database?
  • A relational database is data organised as a collection of related tables

  • SQL (Structured Query Language) is used to extract data from the tables. Either a single table or data spread across two or more related tables.

02 Using DB Browser for SQLite
  • The DB Browser for SQLite application allows you to connect to an existing database or create a new database

03 Accessing Data with Queries
  • Strictly speaking SQL is a standard, not a particular implementation

  • SQL implementation are sufficiently close that you only have to learn SQL once

  • The SELECT statement allows you to ‘slice’ and ‘dice’ the columns and rows of the dataset so that the query only returns the data of interest

04 Missing Data
  • You should expect missing data

  • You need to know how missing data is being represented in your dataset

  • Database systems always represent what they consider to be missing data as NULL

  • You can explicitly test for NULL values in your data

  • You may need other tests for different representations of NULL

05 Creating New Columns
  • New result columns can be created using arithmetic operators or builtin functions

  • New columns have to be given names or Alias’

  • The Case coding structure can be used to create new columns

  • The new columns are only in the query results. The original table is not changed

06 Aggregations
  • Builtin functions can be used to produce a variety of summary statistics

  • The DISTINCT keyword can be used to find the unique set of values in a column or columns

  • Data in columns can be summarised by values using the GROUP BY clause

  • Summarised data can be filtered using the HAVING clause

07 Saving queries for future use - creating views
  • A View can be treated just like a table in a query

  • A View does not contain data like a table does, only the instructions on how to get the data

08 Combining Data with Joins
  • Joins are used to combine data from two or more tables.

  • Tables to be joined must have a column in each which represent the same thing

  • There are several different types of joins

  • The Inner join is the most commonly use

  • You may have to use the other join types to discover missing data or gaps in your data

09 EXTRA: Using database tables in other environments
  • ODBC - Open DataBase Connector allows a database to be connected to a program or application

  • Each database system has its own ODBC connectors

  • Programs such as Excel allow you to use ODBC to get data from databases

  • Programming languages such as Python and R provide libraries which facilitate ODBC connections

10 EXTRA: The SQLite command line
  • SQLite databases can be created, managed and queried from the SQLite shell utility

  • You can run the shell interactively from the commandline, typing queries or dot cammands at the prompt

  • You can call the SQLite3 program and specify a database and a set of commands to run. This aids automation

SQL Cheat Sheet


Basic query

SELECT column_names
FROM table_name;
SELECT * 
FROM table_name;
SELECT DISTINCT column_name 
FROM table_name;
SELECT column_names
FROM table_name
WHERE column_name operator value;
SELECT column_names
FROM table_name
WHERE column_name IN (value1, value2, value3);
SELECT column_names
FROM table_name
ORDER BY column_name ASC;

Aggregation

SELECT aggregate_function(column_name)
FROM table_name;
SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE column_name operator value
GROUP BY column_name;
SELECT column_name, aggregate_function(column_name)
FROM table_name
GROUP BY column_name
HAVING aggregate_function(column_name) operator value;

Joins and aliases

SELECT column_names
FROM table_name1
JOIN table_name2 
ON table_name1.column_name = table_name2.column_name;
SELECT alias1.column_name1, alias1.column_name2, alias2.column_name3
FROM table_name1 AS alias1
JOIN table_name2 AS alias2
ON alias1.column_name = alias2.column_name;

Saving queries

CREATE VIEW viewname AS
SELECT column_names
FROM table_name;

Commenting

-- Select all columns
SELECT * 
-- From the table_name
FROM table_name

Operators

Arithmetic operators + - * /

Comparison operators = < > <= >= <>

Logical operators ALL AND ANY BETWEEN EXISTS IN LIKE NOT OR SOME