This is an alpha lesson to teach Data Management with SQL for Social Scientists. We welcome any criticism and error reports, and will take your feedback into account to improve both the presentation and the content.
Databases are useful for both storing and using data effectively. Using a relational database serves several purposes.
- It keeps your data separate from your analysis. This means there’s no risk of accidentally changing data when you analyze it.
- If we get new data, we can rerun the same query to find all the data that meet certain criteria.
- It’s fast, even for large amounts of data.
- It improves quality control of data entry, for example through type constraints and the use of forms in Access, FileMaker, etc.
- The concepts of relational database querying are core to understanding how to do similar things using programming languages such as R or Python.
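As a rough illustration of the points above, the following minimal sketch uses Python's built-in `sqlite3` module with an in-memory database. The `households` table, its columns, and all values are hypothetical and invented for this example; they are not taken from the SAFI dataset. It shows a query stored separately from the data and rerun unchanged after new rows arrive.

```python
import sqlite3

# Hypothetical table and values, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE households ("
    "id INTEGER PRIMARY KEY, village TEXT NOT NULL, members INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO households (village, members) VALUES (?, ?)",
    [("A", 3), ("B", 7), ("A", 9)],
)

# The analysis lives in the query, not in the data: running it never
# modifies the stored rows.
QUERY = (
    "SELECT id, village, members FROM households "
    "WHERE members > 5 ORDER BY id"
)


def large_households(connection):
    """Return all households with more than five members."""
    return connection.execute(QUERY).fetchall()


first_result = large_households(conn)

# New data arrives; the same query is simply rerun.
conn.execute(
    "INSERT INTO households (village, members) VALUES (?, ?)", ("C", 8)
)
second_result = large_households(conn)

print(first_result)
print(second_result)
```

The second call picks up the newly inserted row without any change to the query itself, which is the behavior the list above describes.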
This lesson will teach you what relational databases are, how you can load data into them, and how you can query them to extract just the information that you need.
Prerequisites
We expect you to have learned a bit about the SAFI dataset in the spreadsheet and OpenRefine sessions. This is not strictly necessary, but it will greatly improve your ability to understand the power of SQL and when to use it rather than another tool.