Introduction
Essential Keywords
- SELECT
- DISTINCT
- WHERE
- ORDER BY
- DESC
- NULL Values and Logical Operators
- BETWEEN
- AS
- LIKE and NOT LIKE
- LIMIT and OFFSET
- Disconnecting
Joins and Aggregations
- Joining Multiple Tables
- Outer Joins
- Aggregations
- Grouping
- Set Operations
- Order of Keywords

1. Introduction

SQL (Structured Query Language) is a widely used programming language for storing, managing, and retrieving data in databases. It is especially valuable for analyzing large datasets that cannot be stored on a single machine, a.k.a. big data. One of the key advantages of using SQL for big data analysis is that it does not require data to be directly imported, which in turn keeps the processes of data analysis and data maintenance separate. This separation enhances the efficiency and quality of both processes, ensuring that the insights derived from big data are accurate and actionable.

There are many dialects of SQL, including SQL Server, PostgreSQL, Oracle, MySQL, and SQLite. In this walkthrough of SQL fundamentals, we’ll use SQLite to work with a database stored on our computer. Using SQLite will allow us to execute SQL queries within the RMarkdown environment and having the data local will simplify our connection to it. The database we’ll use is from the company data.world.

To follow along in RStudio, please download the sql-fundamentals-walkthrough repository to your local computer and place the sqlite files in the same folder as this RMarkdown file.

To begin, follow the code chunks below, using the accompanying markdown snippets for insight and guidance. If you see something that can be improved, please do reach out!

library(DBI)# To connect to database management systems
library(RSQLite) # To use SQLite in RMarkdown
library(RMariaDB) # To use MySQL (if needed)
library(dplyr) # Part of tidyverse, has commands for reading data

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Now, let’s set up a connection to the animals database using the dbConnect function. Note that we are not importing data. Instead, we create a connection or pathway that allows us to access it.

db <- dbConnect(
          SQLite(), # Dialect of SQL we will be using
          dbname = "animals.sqlite" # Name of database and where to find it
)
db

## <SQLiteConnection>
##   Path: C:\Users\chira\Documents\Projects\animals.sqlite
##   Extensions: TRUE

Finally, before diving into the essential keywords of SQL, let’s examine the tables in this database.

dbListTables(db)

## [1] "austin_animal_center_age_at_outcome"                      
## [2] "austin_animal_center_intakes"                             
## [3] "austin_animal_center_intakes_by_month"                    
## [4] "austin_animal_center_outcomes"                            
## [5] "austin_animal_center_outcomes_for_animal_type_and_subtype"
## [6] "sqlite_stat1"                                             
## [7] "sqlite_stat4"

2. Essential Keywords

SELECT

Most SQL queries start with the SELECT statement. As the name suggests, SELECT is used to retrieve data from tables in a database.

Here, SELECT is used to retrieve the name and intake_type columns from the table, austin_animal_center_intakes:

SELECT name, intake_type 
FROM austin_animal_center_intakes

Displaying records 1 - 10
name	intake_type
Scamp	Stray
Scamp	Public Assist
Bri-Bri	Stray
Tyson	Public Assist
Jo Jo	Public Assist
Oso	Owner Surrender
Oso	Public Assist
*Dottie	Stray
Manolo	Owner Surrender
Manolo	Owner Surrender

Note: Keywords are case-insensitive, but convention is to capitalize them. It’s also convention to use multiple lines for different parts of the query. This makes the queries more readable.

“*” is a wildcard used to select all columns:

SELECT * FROM austin_animal_center_intakes

Displaying records 1 - 10
animal_id	name	datetime	monthyear	found_location	intake_type	intake_condition	animal_type	sex_upon_intake	age_upon_intake	breed	color
A006100	Scamp	1512655620	POINT(17.0 12.0)	Colony Creek And Hunters Trace in Austin (TX)	Stray	Normal	Dog	Neutered Male	10 years	Spinone Italiano Mix	Yellow/White
A006100	Scamp	1418984460	POINT(14.0 12.0)	8700 Research Blvd in Austin (TX)	Public Assist	Normal	Dog	Neutered Male	7 years	Spinone Italiano Mix	Yellow/White
A191351	Bri-Bri	1447430220	POINT(15.0 11.0)	1912 E William Cannon Rd in Austin (TX)	Stray	Normal	Cat	Intact Female	16 years	Domestic Longhair Mix	Black/White
A322813	Tyson	1425566940	POINT(15.0 3.0)	Austin (TX)	Public Assist	Normal	Dog	Neutered Male	11 years	Rottweiler Mix	Black/Brown
A553074	Jo Jo	1459037040	POINT(16.0 3.0)	3614 Bill Price in Travis (TX)	Public Assist	Normal	Dog	Spayed Female	7 years	Labrador Retriever/German Shepherd	Brown/Black
A672744	Oso	1392657000	POINT(14.0 2.0)	Travis (TX)	Owner Surrender	Normal	Dog	Intact Male	3 years	Rottweiler Mix	Black/Tan
A672744	Oso	1392657000	POINT(14.0 2.0)	Austin (TX)	Public Assist	Normal	Dog	Intact Male	3 years	Rottweiler Mix	Black/Tan
A672745	*Dottie	1392548460	POINT(14.0 2.0)	Mlk And Fm 969 in Austin (TX)	Stray	Normal	Dog	Spayed Female	2 years	American Staffordshire Terrier Mix	White/Red
A672746	Manolo	1400679420	POINT(14.0 5.0)	Austin (TX)	Owner Surrender	Normal	Dog	Neutered Male	9 years	Jack Russell Terrier Mix	White/Black
A672746	Manolo	1392549660	POINT(14.0 2.0)	Travis (TX)	Owner Surrender	Normal	Dog	Neutered Male	9 years	Jack Russell Terrier Mix	White/Black

To see how many animals, count animal_id:

SELECT COUNT(animal_id)
FROM austin_animal_center_intakes

1 records
COUNT(animal_id)
75947

DISTINCT

Suppose we want to collect together all the different types of animals. Add keyword DISTINCT to SELECT:

SELECT DISTINCT animal_type 
  FROM austin_animal_center_intakes

5 records
animal_type
Dog
Cat
Other
Bird
Livestock

DISTINCT applies to all columns listed in SELECT:

SELECT DISTINCT animal_type, 
  sex_upon_intake, 
  age_upon_intake
  FROM austin_animal_center_intakes

Displaying records 1 - 10
animal_type	sex_upon_intake	age_upon_intake
Dog	Neutered Male	10 years
Dog	Neutered Male	7 years
Cat	Intact Female	16 years
Dog	Neutered Male	11 years
Dog	Spayed Female	7 years
Dog	Intact Male	3 years
Dog	Spayed Female	2 years
Dog	Neutered Male	9 years
Dog	Spayed Female	1 year
Other	Unknown	3 years

Note: The output of this query has 539 rows, meaning there are 539 combinations of these three values in the data set.

WHERE

In SQL, we use WHERE to filter data.

To find all cats in the monthly intakes data set:

SELECT year, month, count, animal_type
  FROM austin_animal_center_intakes_by_month
  WHERE animal_type == "Cat"

Displaying records 1 - 10
year	month	count	animal_type
2013	10	542	Cat
2013	11	436	Cat
2013	12	331	Cat
2014	1	335	Cat
2014	2	269	Cat
2014	3	353	Cat
2014	4	566	Cat
2014	5	901	Cat
2014	6	821	Cat
2014	7	881	Cat

Note: Logical operator is =, not ==

ORDER BY

In SQL, we use ORDER BY to sort data in ascending or descending order.

To order by year with ties broken by month:

SELECT year, month, count, animal_type
  FROM austin_animal_center_intakes_by_month
  WHERE animal_type == "Cat"
  ORDER BY year, month

Displaying records 1 - 10
year	month	count	animal_type
2013	10	542	Cat
2013	11	436	Cat
2013	12	331	Cat
2014	1	335	Cat
2014	2	269	Cat
2014	3	353	Cat
2014	4	566	Cat
2014	5	901	Cat
2014	6	821	Cat
2014	7	881	Cat

Note: WHERE must come before ORDER BY.

DESC

To put more recent data first, use DESC. DESC goes after the variable name:

SELECT year, month, count, animal_type
  FROM austin_animal_center_intakes_by_month
  WHERE animal_type == "Cat"
  ORDER BY year DESC, month DESC

Displaying records 1 - 10
year	month	count	animal_type
2017	12	100	Cat
2017	11	427	Cat
2017	10	513	Cat
2017	9	656	Cat
2017	8	565	Cat
2017	7	669	Cat
2017	6	895	Cat
2017	5	914	Cat
2017	4	565	Cat
2017	3	353	Cat

To order by count in descending order:

SELECT year, month, count, animal_type
  FROM austin_animal_center_intakes_by_month
  WHERE animal_type == "Cat"
  ORDER BY count DESC

Displaying records 1 - 10
year	month	count	animal_type
2015	6	1103	Cat
2015	5	1009	Cat
2016	5	921	Cat
2017	5	914	Cat
2014	5	901	Cat
2017	6	895	Cat
2014	7	881	Cat
2014	6	821	Cat
2015	8	812	Cat
2015	10	784	Cat

NULL Values and Logical Operators

By default, ORDER BY puts NULL values at end; to put them first, add NULLS FIRST at the end of the line.

ORDER BY [colname] NULLS FIRST

Logical operators:

& - AND
| - OR
! - NOT

Cats and dogs where intake condition was not normal:

SELECT animal_type, intake_type, 
  Intake_condition, age_upon_intake
  FROM austin_animal_center_intakes
  WHERE (animal_type = "Cat" OR animal_type = "Dog") AND (NOT intake_condition = "Normal")

Displaying records 1 - 10
animal_type	intake_type	intake_condition	age_upon_intake
Cat	Owner Surrender	Sick	15 years
Dog	Stray	Injured	3 years
Cat	Stray	Injured	1 year
Dog	Stray	Injured	5 months
Dog	Stray	Injured	5 years
Cat	Stray	Injured	1 year
Cat	Owner Surrender	Sick	11 years
Dog	Public Assist	Other	3 years
Dog	Stray	Injured	1 year
Dog	Stray	Injured	2 years

To check if data value is NULL, use IS NULL:

SELECT * FROM austin_animal_center_intakes
  WHERE animal_type IS NULL

0 records
animal_id	name	datetime	monthyear	found_location	intake_type	intake_condition	animal_type	sex_upon_intake	age_upon_intake	breed	color

BETWEEN

For numerical data, we can use BETWEEN:

SELECT *
FROM austin_animal_center_intakes_by_month
WHERE count BETWEEN 900 and 2000

Displaying records 1 - 10
year	month	animal_type	count
2013	10	Dog	965
2014	5	Cat	901
2014	5	Dog	966
2014	7	Dog	926
2014	9	Dog	917
2015	5	Cat	1009
2015	5	Dog	988
2015	6	Cat	1103
2015	6	Dog	1014
2016	5	Cat	921

Transforming Data with AS

To rename variables, use AS keyword:

SELECT name AS Name, intake_type AS 'Intake Type'
  FROM austin_animal_center_intakes

Displaying records 1 - 10
Name	Intake Type
Scamp	Stray
Scamp	Public Assist
Bri-Bri	Stray
Tyson	Public Assist
Jo Jo	Public Assist
Oso	Owner Surrender
Oso	Public Assist
*Dottie	Stray
Manolo	Owner Surrender
Manolo	Owner Surrender

To mutate, use AS within select:

SELECT monthyear, 
animal_type, 
outcome_type, 
(age_in_days/365) AS 'Years Old' 
FROM austin_animal_center_age_at_outcome

Displaying records 1 - 10
monthyear	animal_type	outcome_type	Years Old
2014-03	Dog	Return to Owner	6.668493
2014-12	Dog	Return to Owner	7.454795
2015-11	Cat	Return to Owner	16.252055
2015-03	Dog	Return to Owner	11.972603
2015-04	Dog	Return to Owner	7.638356
2014-09	Dog	Return to Owner	2.668493
2014-01	Dog	Euthanasia	2.002740
2014-01	Cat	Euthanasia	15.013699
2014-01	Dog	Return to Owner	3.005479
2014-01	Dog	Return to Owner	2.013699

LIKE and NOT LIKE

The LIKE and NOT LIKE operators are used in a WHERE clause to search for a specified pattern in a column. They are often used with wildcard characters to substitute for one or more characters in a pattern string:

% stands for any number of characters
_ matches a single character.

Examples:

“%test” would match “unfairtest”, “fair test”, or “test” but not “test case”.
“t_st” matches “test” or “tkst”, but not “tests”.

Suppose we want to find all data where the breed includes “wolfhound”:

SELECT sex_upon_outcome,
outcome_type,
outcome_subtype,
breed
FROM austin_animal_center_outcomes
WHERE animal_type = "Dog"
AND breed LIKE "%wolfhound%"
ORDER BY monthyear

Displaying records 1 - 10
sex_upon_outcome	outcome_type	outcome_subtype	breed
Neutered Male	Transfer	Partner	Irish Terrier/Irish Wolfhound
Spayed Female	Adoption	NA	Irish Wolfhound Mix
Neutered Male	Transfer	Partner	Irish Wolfhound/Great Pyrenees
Neutered Male	Adoption	NA	Irish Wolfhound Mix
Neutered Male	Adoption	NA	Catahoula/Irish Wolfhound
Intact Female	Return to Owner	NA	Irish Wolfhound/Great Dane
Neutered Male	Transfer	Partner	Irish Wolfhound/Australian Shepherd
Neutered Male	Return to Owner	NA	Irish Wolfhound Mix
Intact Male	Return to Owner	NA	Irish Wolfhound Mix
Intact Female	Transfer	Partner	Irish Wolfhound Mix

LIMIT and OFFSET

You can select different parts of the data to look at.

LIMIT = How many rows to display (can’t be more than 1000)
OFFSET = How many rows at beginning to skip.

SELECT found_location, intake_type 
FROM austin_animal_center_intakes
LIMIT 100
OFFSET 20000

Displaying records 1 - 10
found_location	intake_type
Barton Creek Square in Austin (TX)	Stray
8411 Ganttcrest Dr in Austin (TX)	Stray
1802 Cherry Orchid in Austin (TX)	Stray
Austin (TX)	Owner Surrender
Austin (TX)	Owner Surrender
Mckinney Rd & William Cannon in Austin (TX)	Stray
Jubilee Tr in Austin (TX)	Stray
4537 S Ih 35 in Austin (TX)	Public Assist
Del Valle (TX)	Owner Surrender
4537 S Ih 35 in Austin (TX)	Public Assist

Disconnecting

When done, it’s good practice to disconnect from the database:

dbDisconnect(db)

3. Joins and Aggregations

In part 2 of this walkthrough, we’ll be using the sales.sqlite database. Let’s set up a connection to it:

db_sales <- dbConnect(
          SQLite(), # Dialect of SQL we will be using
          dbname = "sales.sqlite" # Name of database and where to find it
)
db_sales

## <SQLiteConnection>
##   Path: C:\Users\chira\Documents\Projects\sales.sqlite
##   Extensions: TRUE

dbListTables(db_sales)

## [1] "accounts"       "employees"      "intl_accounts"  "sales_pipeline"
## [5] "sales_pipline"  "sales_teams"    "sqlite_stat1"   "sqlite_stat4"

Joining Multiple Tables

Method 1: Use WHERE

SELECT sales_pipeline.sales_agent, 
       sales_teams.manager, 
       sales_pipeline.account
  FROM sales_teams, sales_pipeline
  WHERE (sales_pipeline.sales_agent = sales_teams.sales_agent)

Displaying records 1 - 10
Sales_Agent	manager	Account
Moses Frase	Dustin Brinkmann	Cancity
Darcel Schlecht	Melvin Marxen	Isdom
Darcel Schlecht	Melvin Marxen	Cancity
Moses Frase	Dustin Brinkmann	Codehow
Zane Levy	Summer Sewald	Hatfan
Anna Snelling	Dustin Brinkmann	Ron-tech
Vicki Laflamme	Celia Rouche	J-Texon
Markita Hansen	Celia Rouche	Cheers
Niesha Huffines	Melvin Marxen	Zumgoity
James Ascencio	Summer Sewald	NA

What’s going on here?

In FROM, draw data from two tables: sales_teams and sales_pipeline
In SELECT, include columns from sales_teams and sales_pipeline using tablename.columnname format
In WHERE, indicate how the join works. Declare which of the variables from the two tables must be equal. (It’s fine to set multiple columns equal.)

This is an inner join. It only keeps agents that appear in both tables.

In WHERE, in addition to defining the join, we can also specify filtering conditions:

SELECT sales_pipeline.sales_agent, 
       sales_teams.manager, 
       sales_pipeline.account
  FROM sales_teams, sales_pipeline
  WHERE (sales_pipeline.sales_agent = sales_teams.sales_agent) AND 
      sales_pipeline.deal_stage = "Won"

Displaying records 1 - 10
Sales_Agent	manager	Account
Moses Frase	Dustin Brinkmann	Cancity
Darcel Schlecht	Melvin Marxen	Isdom
Darcel Schlecht	Melvin Marxen	Cancity
Moses Frase	Dustin Brinkmann	Codehow
Zane Levy	Summer Sewald	Hatfan
Anna Snelling	Dustin Brinkmann	Ron-tech
Vicki Laflamme	Celia Rouche	J-Texon
Markita Hansen	Celia Rouche	Cheers
Niesha Huffines	Melvin Marxen	Zumgoity
Anna Snelling	Dustin Brinkmann	Bioholding

For extra readability, we can relabel variables/columns using AS. We can also rename whole tables:

SELECT pipeline.sales_agent, 
       teams.manager, 
       pipeline.account
  FROM sales_teams AS teams, sales_pipeline AS pipeline
  WHERE (pipeline.sales_agent = teams.sales_agent) AND 
      pipeline.deal_stage = "Won"

Displaying records 1 - 10
Sales_Agent	manager	Account
Moses Frase	Dustin Brinkmann	Cancity
Darcel Schlecht	Melvin Marxen	Isdom
Darcel Schlecht	Melvin Marxen	Cancity
Moses Frase	Dustin Brinkmann	Codehow
Zane Levy	Summer Sewald	Hatfan
Anna Snelling	Dustin Brinkmann	Ron-tech
Vicki Laflamme	Celia Rouche	J-Texon
Markita Hansen	Celia Rouche	Cheers
Niesha Huffines	Melvin Marxen	Zumgoity
Anna Snelling	Dustin Brinkmann	Bioholding

Note: Even when renaming tables in FROM, we can use those new names in SELECT.

Method 2: Use JOIN keyword

When joining columns that have the same name in both tables, we can use USING:

SELECT pipeline.sales_agent AS Agent, 
      Teams.manager, 
      pipeline.account
  FROM sales_teams AS teams
    JOIN sales_pipeline AS pipeline USING (sales_agent)
  WHERE pipeline.deal_stage = "Won"

Displaying records 1 - 10
Agent	manager	Account
Moses Frase	Dustin Brinkmann	Cancity
Darcel Schlecht	Melvin Marxen	Isdom
Darcel Schlecht	Melvin Marxen	Cancity
Moses Frase	Dustin Brinkmann	Codehow
Zane Levy	Summer Sewald	Hatfan
Anna Snelling	Dustin Brinkmann	Ron-tech
Vicki Laflamme	Celia Rouche	J-Texon
Markita Hansen	Celia Rouche	Cheers
Niesha Huffines	Melvin Marxen	Zumgoity
Anna Snelling	Dustin Brinkmann	Bioholding

If the columns we’re joining by have different names, use ON instead of USING:

SELECT pipeline.sales_agent AS Agent, 
      Teams.manager, 
      pipeline.account
  FROM sales_teams AS teams
    JOIN sales_pipeline AS pipeline 
    ON pipeline.sales_agent = teams.sales_agent
  WHERE pipeline.deal_stage = "Won"

Displaying records 1 - 10
Agent	manager	Account
Moses Frase	Dustin Brinkmann	Cancity
Darcel Schlecht	Melvin Marxen	Isdom
Darcel Schlecht	Melvin Marxen	Cancity
Moses Frase	Dustin Brinkmann	Codehow
Zane Levy	Summer Sewald	Hatfan
Anna Snelling	Dustin Brinkmann	Ron-tech
Vicki Laflamme	Celia Rouche	J-Texon
Markita Hansen	Celia Rouche	Cheers
Niesha Huffines	Melvin Marxen	Zumgoity
Anna Snelling	Dustin Brinkmann	Bioholding

Note: This is still an inner join. It only keep agents that appear in both tibbles.

Outer Joins

Right joins, left joins, and full joins are all examples of outer joins.

LEFT OUTER JOIN
RIGHT OUTER JOIN (not supported by SQLite)
FULL OUTER JOIN (not supported by SQLite)

Example of a Left Outer Join:

SELECT sales_teams.sales_agent AS Agent, 
sales_pipeline.deal_stage
FROM sales_teams 
  LEFT OUTER JOIN sales_pipeline 
  ON sales_teams.sales_agent = sales_pipeline.Sales_Agent
  WHERE Agent = "Carol Thompson"

1 records
Agent	Deal_Stage
Carol Thompson	NA

Note: Carol Thompson has no deals in pipeline and, thus, does not appear in the sales_pipeline table. However, she is included when we do the left outer join as above because she’s in sales_teams.

To do a right join:

If you’re using a more robust dialect of SQL, use RIGHT OUTER JOIN.
In SQLite, just switch order of tables.

SELECT sales_teams.sales_agent AS Agent, 
sales_pipeline.deal_stage
FROM sales_pipeline
  LEFT OUTER JOIN sales_teams
  ON sales_teams.sales_agent = sales_pipeline.Sales_Agent
  WHERE Agent = "Carol Thompson"

0 records
Agent	Deal_Stage

To do a full join:

In a more robust dialect of SQL, use FULL OUTER JOIN.
In SQLite, do the union of a left join and a right join.

Aggregations

SUM
COUNT
AVG
MIN
MAX
CONCAT (concatenation)

These go within the SELECT:

SELECT SUM(Revenue) AS "Total Revenue"
  FROM accounts

1 records
Total Revenue
173358

To count the number of rows, use COUNT(*):

SELECT COUNT(*) AS "Number of Deals Won"
  FROM sales_pipeline
  WHERE deal_stage = "Won"

1 records
Number of Deals Won
4238

Note: Counting of rows happens after WHERE.

Find average closing value of each deal:

SELECT COUNT(*) AS "Deals Won", 
    AVG(close_value) AS "Average Closing Value"
  FROM sales_pipeline
  WHERE deal_stage = "Won"

1 records
Deals Won	Average Closing Value
4238	2360.909

Grouping

SELECT sales_agent AS "Agent", 
    COUNT(*) AS "Deals Won", 
    AVG(close_value) AS "Average Closing Value"
  FROM sales_pipeline
  WHERE deal_stage = "Won"
  GROUP BY sales_agent

Displaying records 1 - 10
Agent	Deals Won	Average Closing Value
Anna Snelling	208	1322.385
Boris Faz	101	2590.406
Cassey Cress	163	2763.736
Cecily Lampkin	107	2147.664
Corliss Cosme	150	2806.907
Daniell Hammack	114	3194.991
Darcel Schlecht	349	3304.338
Donn Cantrell	158	2821.899
Elease Gluck	80	3614.938
Garret Kinder	75	2636.973

Works with joins:

SELECT sales_teams.manager, 
    COUNT(*) AS "Number Deals Won", 
    AVG(sales_pipeline.close_value) AS "Average Closing Value" 
  FROM sales_teams 
    JOIN sales_pipeline ON (sales_teams.sales_agent = sales_pipeline.sales_agent)
  WHERE sales_pipeline.deal_stage = "Won"
  GROUP BY sales_teams.manager

6 records
manager	Number Deals Won	Average Closing Value
Cara Losch	480	2354.269
Celia Rouche	610	2629.339
Dustin Brinkmann	747	1465.011
Melvin Marxen	882	2553.209
Rocco Neubert	691	2837.258
Summer Sewald	828	2372.886

To filter observations before an aggregation occurs (or, within an aggregation), use FILTER(WHERE(…)) within the SELECT.

To only count deals with a closing value over 1000:

SELECT sales_agent, 
   COUNT(close_value) AS total, 
   COUNT(close_value) FILTER(WHERE(close_value > 1000)) AS 'over 1000'
  FROM sales_pipeline
  WHERE deal_stage = "Won" 
  GROUP BY sales_agent

Displaying records 1 - 10
Sales_Agent	total	over 1000
Anna Snelling	208	68
Boris Faz	101	70
Cassey Cress	163	112
Cecily Lampkin	107	63
Corliss Cosme	150	97
Daniell Hammack	114	91
Darcel Schlecht	349	272
Donn Cantrell	158	106
Elease Gluck	80	32
Garret Kinder	75	46

To filter observations after an aggregation occurs, use HAVING. Let’s find sales agents with large deals:

SELECT sales_agent, 
   COUNT(close_value) AS total 
  FROM sales_pipeline
  WHERE deal_stage = "Won" 
  GROUP BY sales_agent
  HAVING COUNT(close_value) > 200

4 records
Sales_Agent	total
Anna Snelling	208
Darcel Schlecht	349
Kary Hendrixson	209
Vicki Laflamme	221

Set Operations

UNION
INTERSECT
MINUS/EXCEPT

To find all accounts, including international:

SELECT intl_accounts.account, 
  intl_accounts.office_location AS location
  FROM intl_accounts

Displaying records 1 - 10
account	location
Ganjaflex	Japan
Genco Pura Olive Oil Company	Italy
Globex Corporation	Norway
Betatech	Kenya
Bioholding	Philipines
Hottechi	Korea
Streethex	Belgium
Sumace	Romania
Zencorporation	China
Sunnamplex	Poland

SELECT accounts.account, 
    "USA" AS Location
  FROM accounts

Displaying records 1 - 10
Account	Location
Acme Corporation	USA
Betasoloin	USA
Betatech	USA
Bioholding	USA
Bioplex	USA
Blackzim	USA
Bluth Company	USA
Bubba Gump	USA
Cancity	USA
Cheers	USA

To do the union:

SELECT intl_accounts.account, 
  intl_accounts.office_location AS location
  FROM intl_accounts
    UNION
SELECT accounts.account, 
    "USA" AS Location
  FROM accounts

Displaying records 1 - 10
account	location
AWOLEX	USA
Acme Corporation	USA
Betasoloin	USA
Betatech	Kenya
Betatech	USA
Bioholding	Philipines
Bioholding	USA
Bioplex	USA
Blackzim	USA
Bluth Company	USA

Note: To do a full join, do a left join, do a right join (by swapping the order of the tables in your join), and then take their union.

To find agents in sale_teams but not in sales_pipeline:

SELECT sales_agent
  FROM sales_teams
    EXCEPT
SELECT sales_agent
  FROM sales_pipeline

5 records
sales_agent
Carl Lin
Carol Thompson
Elizabeth Anderson
Mei-Mei Johns
Natalya Ivanova

Note: This is like a set difference.

Order of Keywords

Keywords must appear in the following order (not all have to appear):

SELECT
FROM
WHERE
GROUP BY
HAVING
ORDER BY

dbDisconnect(db_sales)

SQL Fundamentals Walkthrough

Chira Levy

2023-02-01

Contents