3. Query fundamentals
3.1 Write a query to list all the film titles
Table(s) to use: film
Expected Output:title | ---------------------------| ACADEMY DINOSAUR | ACE GOLDFINGER | ADAPTATION HOLES | AFFAIR PREJUDICE | AFRICAN EGG | AGENT TRUMAN | AIRPLANE SIERRA | AIRPORT POLLOCK | ... 1000 rows
Show/Hide Solution
select title from film;
3.2 Write a query to return the actor's first names and last names only (with the column headings "First Name" and "Last Name")
Table(s) to use: actor
Expected Output:First Name |Last Name | -----------|------------| PENELOPE |GUINESS | NICK |WAHLBERG | ED |CHASE | JENNIFER |DAVIS | JOHNNY |LOLLOBRIGIDA| BETTE |NICHOLSON | GRACE |MOSTEL | MATTHEW |JOHANSSON | JOE |SWANK | CHRISTIAN |GABLE | ZERO |CAGE | ... 200 rows
Show/Hide Solution
select first_name as "First Name", last_name as "Last Name" from actor;
3.3 How many rows are in the inventory table?
Table(s) to use: inventory
Show/Hide Solution
select * from inventory;
3.4 Write a query that returns all the columns from the actor table without using the * wildcard in the SELECT clause
Table(s) to use: actor
Expected Output:
actor_id|first_name |last_name |last_update |
--------|-----------|------------|-------------------|
1|PENELOPE |GUINESS |2006-02-15 09:34:33|
2|NICK |WAHLBERG |2006-02-15 09:34:33|
3|ED |CHASE |2006-02-15 09:34:33|
4|JENNIFER |DAVIS |2006-02-15 09:34:33|
5|JOHNNY |LOLLOBRIGIDA|2006-02-15 09:34:33|
6|BETTE |NICHOLSON |2006-02-15 09:34:33|
7|GRACE |MOSTEL |2006-02-15 09:34:33|
8|MATTHEW |JOHANSSON |2006-02-15 09:34:33|
9|JOE |SWANK |2006-02-15 09:34:33|
10|CHRISTIAN |GABLE |2006-02-15 09:34:33|
...
200 rows
Show/Hide Solution
select actor_id, first_name, last_name, last_update from actor;
3.5 Write a query to obtain the length of each customer's first name (*remember to look for string functions in the documentation that can help*)
Table(s) to use: customer
Expected Output:first_name |length| -----------|------| MARY | 4| PATRICIA | 8| LINDA | 5| BARBARA | 7| ELIZABETH | 9| JENNIFER | 8| MARIA | 5| SUSAN | 5| ... 599 rows
Show/Hide Solution
select first_name, length(first_name) from customer;
3.6 Write a query to return the initials for each customer
Table(s) to use: customer
Expected Output:first_name |last_name |initial| -----------|------------|-------| MARY |SMITH |MS | PATRICIA |JOHNSON |PJ | LINDA |WILLIAMS |LW | BARBARA |JONES |BJ | ELIZABETH |BROWN |EB | JENNIFER |DAVIS |JD | MARIA |MILLER |MM | SUSAN |WILSON |SW | ... 599 rows
Show/Hide Solution
select first_name, last_name, left(first_name, 1) || left(last_name, 1) as "initial" from customer;
3.7 Each film has a rental_rate, which is how much money it costs for a customer to rent out the film. Each film also has a replacement_cost, which is how much money the film costs to replace. Write a query to figure out how many times each film must be rented out to cover its replacement cost.
Table(s) to use: film
Expected Output:title |rental_rate|replacement_cost|# rentals to break-even| ---------------------------|-----------|----------------|-----------------------| ACADEMY DINOSAUR | 0.99| 20.99| 22| ACE GOLDFINGER | 4.99| 12.99| 3| ADAPTATION HOLES | 2.99| 18.99| 7| AFFAIR PREJUDICE | 2.99| 26.99| 10| AFRICAN EGG | 2.99| 22.99| 8| AGENT TRUMAN | 2.99| 17.99| 7| AIRPLANE SIERRA | 4.99| 28.99| 6| AIRPORT POLLOCK | 4.99| 15.99| 4| ... 1000 rows
Show/Hide Solution
select title, rental_rate, replacement_cost, ceil(replacement_cost / rental_rate) as "# rentals to break-even" from film;
3.8 Write a query to list all the films with a 'G' rating
Table(s) to use: film
Expected Output:title |rating| -------------------------|------| ACE GOLDFINGER |G | AFFAIR PREJUDICE |G | AFRICAN EGG |G | ALAMO VIDEOTAPE |G | AMISTAD MIDSUMMER |G | ANGELS LIFE |G | ANNIE IDENTITY |G | ARMAGEDDON LOST |G | ... 178 rows
Show/Hide Solution
select title, rating from film where rating = 'G';
3.9 List all the films longer than 2 hours (note each film has a length in minutes)
Table(s) to use: film
Expected Output:title |length| ---------------------------|------| AFRICAN EGG | 130| AGENT TRUMAN | 169| ALAMO VIDEOTAPE | 126| ALASKA PHANTOM | 136| ALI FOREVER | 150| ALLEY EVOLUTION | 180| AMERICAN CIRCUS | 129| ANALYZE HOOSIERS | 181| ... 456 rows
Show/Hide Solution
select title, length from film where length > 120;
3.10 Write a query to list all the rentals made before June, 2005
Table(s) to use: rental
Expected Output:
rental_id|rental_date |
---------|-------------------|
2|2005-05-24 22:54:33|
3|2005-05-24 23:03:39|
4|2005-05-24 23:04:41|
5|2005-05-24 23:05:21|
6|2005-05-24 23:08:07|
7|2005-05-24 23:11:53|
8|2005-05-24 23:31:46|
9|2005-05-25 00:00:40|
...
1156 rows
Show/Hide Solution
select rental_id, rental_date from rental where rental_date < '2005-06-01';
3.11 In Exercise 3.7, you wrote a query to figure out how many times each film must be rented out to cover its replacement cost. Now write a query to return only those films that must be rented out more than 30 times to cover their replacement cost.
Table(s) to use: film
Expected Output:title |rental_rate|replacement_cost|# rentals to break-even| ----------------------|-----------|----------------|-----------------------| ARABIA DOGMA | 0.99| 29.99| 31| BALLROOM MOCKINGBIRD | 0.99| 29.99| 31| BONNIE HOLOCAUST | 0.99| 29.99| 31| CLOCKWORK PARADISE | 0.99| 29.99| 31| CLYDE THEORY | 0.99| 29.99| 31| CRUELTY UNFORGIVEN | 0.99| 29.99| 31| EARTH VISION | 0.99| 29.99| 31| EVERYONE CRAFT | 0.99| 29.99| 31| ... 22 rows
Show/Hide Solution
select title, rental_rate, replacement_cost, ceil(replacement_cost / rental_rate) as "# rentals to break-even" from film where ceil(replacement_cost / rental_rate) > 30;
3.12 Write a query to show all rentals made by the customer with ID 388 in 2005
Table(s) to use: rental
Expected Output:
rental_id|rental_date |
---------|-------------------|
21|2005-05-25 01:59:46|
411|2005-05-27 14:14:14|
1276|2005-06-15 08:00:13|
2145|2005-06-17 22:10:36|
2537|2005-06-19 01:52:21|
2692|2005-06-19 13:08:19|
3159|2005-06-20 21:11:50|
4947|2005-07-08 22:49:37|
...
29 rows
Show/Hide Solution
select rental_id, rental_date from rental where rental_date >= '2005-01-01' and rental_date < '2006-01-01' and customer_id = 388;
3.13 We’re trying to list all films with a length of an hour or less. Show two different ways to fix our query below that isn't working (one using the NOT keyword, and one without)
select title, rental_duration, length from film where length not > 60;
Table(s) to use: film
Expected Output:title |rental_duration|length| ---------------------|---------------|------| ACE GOLDFINGER | 3| 48| ADAPTATION HOLES | 7| 50| AIRPORT POLLOCK | 6| 54| ALIEN CENTER | 5| 46| ALTER VICTORY | 6| 57| APOCALYPSE FLAMINGOS | 6| 0| BENEATH RUSH | 6| 53| BLUES INSTINCT | 5| 50| ... 105 rows
Show/Hide Solution
select title, rental_duration, length from film where not length > 60;A solution not using the NOT keyword.
select title, rental_duration, length from film where length <= 60;
3.14 Explain what each of the two queries below are doing and why they generate different results. Which one is probably a mistake and why?
select title, rating from film where rating != 'G' and rating != 'PG';
select title, rating from film where rating != 'G' or rating != 'PG';
Table(s) to use: film
Show/Hide Solution
3.15 Write a single query to show all rentals where the return date is greater than the rental date, or the return date is equal to the rental date, or the return date is less than the rental date. How many rows are returned? Why doesn't this match the number of rows in the table overall?
Table(s) to use: rental
Expected Output:
rental_id|rental_date |return_date |
---------|-------------------|-------------------|
2|2005-05-24 22:54:33|2005-05-28 19:40:33|
3|2005-05-24 23:03:39|2005-06-01 22:12:39|
4|2005-05-24 23:04:41|2005-06-03 01:43:41|
5|2005-05-24 23:05:21|2005-06-02 04:33:21|
6|2005-05-24 23:08:07|2005-05-27 01:32:07|
7|2005-05-24 23:11:53|2005-05-29 20:34:53|
8|2005-05-24 23:31:46|2005-05-27 23:33:46|
9|2005-05-25 00:00:40|2005-05-28 00:22:40|
...
15861 rows
Show/Hide Solution
select rental_id, rental_date, return_date from rental where return_date > rental_date or return_date = rental_date or return_date < rental_date;
3.16 Write a query to list the rentals that haven't been returned
Table(s) to use: rental
Expected Output:
rental_id|return_date|
---------|-----------|
11496| |
11541| |
12101| |
11563| |
11577| |
11593| |
11611| |
11646| |
...
183 rows
Show/Hide Solution
select rental_id, return_date from rental where return_date is null;
3.17 Write a query to list the films that have a rating that is not 'G' or 'PG'
Table(s) to use: film
Expected Output:title |rating| ---------------------------|------| ADAPTATION HOLES |NC-17 | AIRPLANE SIERRA |PG-13 | AIRPORT POLLOCK |R | ALABAMA DEVIL |PG-13 | ALADDIN CALENDAR |NC-17 | DATE SPEED |R | ALICE FANTASIA |NC-17 | ALIEN CENTER |NC-17 | ... 629 rows
Show/Hide Solution
select title, rating from film where rating != 'PG' and rating != 'G' or rating is null;
3.18 Write a query to return the films with a rating of 'PG', 'G', or 'PG-13'
Table(s) to use: film
Expected Output:title |rating| ---------------------------|------| ACADEMY DINOSAUR |PG | ACE GOLDFINGER |G | AFFAIR PREJUDICE |G | AFRICAN EGG |G | AGENT TRUMAN |PG | AIRPLANE SIERRA |PG-13 | ALABAMA DEVIL |PG-13 | ALAMO VIDEOTAPE |G | ... 593 rows
Show/Hide Solution
select title, rating
from film
where rating in ('PG', 'G', 'PG-13');
3.19 Write a query equivalent to the one below using BETWEEN.
select title, length from film where length >= 90 and length <= 120;
Table(s) to use: film
Expected Output:title |length| ----------------------|------| AFFAIR PREJUDICE | 117| ALABAMA DEVIL | 114| DATE SPEED | 104| ALICE FANTASIA | 94| AMADEUS HOLY | 113| ANACONDA CONFESSIONS | 92| ANTHEM LUKE | 91| APACHE DIVINE | 92| ... 222 rows
Show/Hide Solution
select title, length from film where length between 90 and 120;
3.20 Write a query to return all film titles that end with the word "GRAFFITI"
Table(s) to use: film
Expected Output:title | ------------------| CLUB GRAFFITI | HUMAN GRAFFITI | STRANGERS GRAFFITI| 3 rows
Show/Hide Solution
select title from film where title like '%GRAFFITI';
3.21 In exercise 3.17 you wrote a query to list the films that have a rating that is not 'G' or 'PG'. Re-write this query using NOT IN. Do your results include films with a NULL rating?
Table(s) to use: film
Show/Hide Solution
select title, rating
from film
where rating not in ('G', 'PG');
3.22 Write a query to list all the customers with an email address. Order the customers by last name descending
Table(s) to use: customer
Expected Output:first_name |last_name | -----------|------------| CYNTHIA |YOUNG | MARVIN |YEE | LUIS |YANEZ | BRIAN |WYMAN | BRENDA |WRIGHT | TYLER |WREN | FLORENCE |WOODS | LORI |WOOD | ... 597 rows
Show/Hide Solution
select first_name, last_name from customer where email is not null order by last_name desc;
3.23 Write a query to list the country id's and cities from the city table, first ordered by country id ascending, then by city alphabetically.
Table(s) to use: city
Expected Output:
country_id|city |
----------|--------------------------|
1|Kabul |
2|Batna |
2|Bchar |
2|Skikda |
3|Tafuna |
4|Benguela |
4|Namibe |
5|South Hill |
6|Almirante Brown |
6|Avellaneda |
6|Baha Blanca |
6|Crdoba |
6|Escobar |
6|Ezeiza |
...
600 rows
Show/Hide Solution
select country_id, city from city order by country_id, city;
3.24 Write a query to list actors ordered by the length of their full name ("[first_name] [last_name]") descending.
Table(s) to use: actor
Expected Output:full_name |len| --------------------|---| MICHELLE MCCONAUGHEY| 20| JOHNNY LOLLOBRIGIDA | 19| ANGELA WITHERSPOON | 18| MATTHEW JOHANSSON | 17| CAMERON ZELLWEGER | 17| JAYNE SILVERSTONE | 17| CHRISTOPHER BERRY | 17| FRANCES DAY-LEWIS | 17| ... 200 rows
Show/Hide Solution
select first_name || ' ' || last_name as full_name, length(first_name || ' ' || last_name) as len from actor order by len desc;
3.25 Describe the difference between ORDER BY x, y DESC and ORDER BY x DESC, y DESC (where x and y are columns in some imaginary table you're querying)
Table(s) to use: imaginary!
Show/Hide Solution
3.26 Fix the query below, which we wanted to use to list all the rentals that happened after 10pm at night.
select rental_id, date_part('hour', rental_date) as "rental hour"
from rental
where "rental hour" >= 22;
Table(s) to use: rental
Expected Output:
rental_id|rental hour|
---------|-----------|
2| 22|
3| 23|
4| 23|
5| 23|
6| 23|
7| 23|
8| 23|
136| 22|
...
1252 rows
Show/Hide Solution
select rental_id, date_part('hour', rental_date) as "rental hour"
from rental
where date_part('hour', rental_date) >= 22;
3.27 Write a query to return the 3 most recent payments received
Table(s) to use: payment
Expected Output:
payment_id|payment_date |
----------|-------------------|
31918|2007-05-14 13:44:29|
31917|2007-05-14 13:44:29|
31919|2007-05-14 13:44:29|
3 rows
Show/Hide Solution
select payment_id, payment_date from payment order by payment_date desc limit 3;
3.28 Return the 4 films with the shortest length that are not R rated. For films with the same length, order them alphabetically
Table(s) to use: film
Expected Output:title |length|rating| ----------------|------|------| ALIEN CENTER | 46|NC-17 | IRON MOON | 46|PG | KWAI HOMEWARD | 46|PG-13 | LABYRINTH LEAGUE| 46|PG-13 | 4 rows
Show/Hide Solution
select title, length, rating from film where rating != 'R' or rating is null order by length, title limit 4;
3.29 Write a query to return the last 3 payments made in January, 2007
Table(s) to use: payment
Expected Output:
payment_id|amount|payment_date |
----------|------|-------------------|
16813| 0.99|2007-01-31 21:16:11|
16890| 0.99|2007-01-31 21:06:00|
17173| 2.99|2007-01-31 20:45:37|
3 rows
Show/Hide Solution
select payment_id, amount, payment_date from payment where payment_date < '2007-02-01' and payment_date >= '2007-01-01' order by payment_date desc limit 3;
3.30 Can you think of a way you could, as in the previous exercise, return the last 3 payments made in January, 2007 but have those same 3 output rows ordered by date ascending? (Don't spend too long on this...)
Table(s) to use: payment
Expected Output:
payment_id|amount|payment_date |
----------|------|-------------------|
17173| 2.99|2007-01-31 20:45:37|
16890| 0.99|2007-01-31 21:06:00|
16813| 0.99|2007-01-31 21:16:11|
3 rows
Show/Hide Solution
3.31 Write a query to return all the unique ratings films can have, ordered alphabetically (not including NULL)
Table(s) to use: film
Expected Output:rating| ------| G | PG | PG-13 | R | NC-17 | 5 rows
Show/Hide Solution
select distinct rating from film where rating is not null order by rating;
3.32 Write a query to help us quickly see if there is any hour of the day that we have never rented a film out on (maybe the staff always head out for lunch?)
Table(s) to use: rental
Show/Hide Solution
select distinct date_part('hour', rental_date) as hr
from rental
order by hr;
3.33 Write a query to help quickly check whether the same rental rate is used for each rental duration (for example - is the rental rate always 4.99 when the rental duration is 3?)
Table(s) to use: film
Expected Output:
rental_duration|rental_rate|
---------------|-----------|
3| 4.99|
3| 2.99|
3| 0.99|
4| 2.99|
4| 4.99|
4| 0.99|
5| 2.99|
5| 4.99|
...
15 rows
Show/Hide Solution
select distinct rental_duration, rental_rate from film order by rental_duration;
3.34 Can you explain why the first query below works, but the second one, which simply adds the DISTINCT keyword, doesn't? (this is quite challenging)
select first_name from actor order by last_name;
select distinct first_name from actor order by last_name;
Table(s) to use: actor
Show/Hide Solution
3.35 Write a query to return an ordered list of distinct ratings for films in our films table along with their descriptions (you will have to type in the descriptions yourself)
Table(s) to use: film
Expected Output:rating|rating description | ------|-----------------------------| G |General | PG |Parental Guidance Recommended| PG-13 |Parents Strongly Cautioned | R |Restricted | NC-17 |Adults Only | 5 rows
Show/Hide Solution
select distinct rating,
case rating
when 'G' then 'General'
when 'PG' then 'Parental Guidance Recommended'
when 'PG-13' then 'Parents Strongly Cautioned'
when 'R' then 'Restricted'
when 'NC-17' then 'Adults Only'
end as "rating description"
from film
where rating is not null
order by rating;
3.36 Write a query to output 'Returned' for returned rentals and 'Not Returned' for rentals that haven't been returned. Order the output to show those not returned first.
Table(s) to use: rental
Expected Output:
rental_id|rental_date |return_date |return_status|
---------|-------------------|-------------------|-------------|
14503|2006-02-14 15:16:03| |Not Returned |
14172|2006-02-14 15:16:03| |Not Returned |
12988|2006-02-14 15:16:03| |Not Returned |
14488|2006-02-14 15:16:03| |Not Returned |
15655|2006-02-14 15:16:03| |Not Returned |
14204|2006-02-14 15:16:03| |Not Returned |
12130|2006-02-14 15:16:03| |Not Returned |
12970|2006-02-14 15:16:03| |Not Returned |
...
16044 rows
Show/Hide Solution
select rental_id, rental_date, return_date,
case
when return_date is null then 'Not Returned'
else 'Returned'
end as return_status
from rental
order by return_status;
3.37 Imagine you were asked to write a query to populate a 'country picker' for some internal company dashboard. Write a query to return the countries in alphabetical order, but also with the twist that the first 3 countries in the list must be 1) Australia 2) United Kingdom 3) United States and then normal alphabetical order after that (maybe you want them first because, for example, most of your customers come from these countries)
Table(s) to use: country
Expected Output:country | -------------------------------------| Australia | United Kingdom | United States | Afghanistan | Algeria | American Samoa | Angola | Anguilla | ... 109 rows
Show/Hide Solution
select country
from country
order by
case country
when 'Australia' then 0
when 'United Kingdom' then 1
when 'United States' then 2
else 3
end, country;
3.38 We want to give a prize to 5 random customers each month. Write a query that will return 5 random customers each time it is run (you may find a particular math function helpful - make sure to search the documentation!)
Table(s) to use: customer
Expected Output:Your output should list a different set of customers each time you run the query!first_name|last_name|email | ----------|---------|-----------------------------------| DUANE |TUBBS |DUANE.TUBBS@sakilacustomer.org | OLGA |JIMENEZ |OLGA.JIMENEZ@sakilacustomer.org | ROSEMARY |SCHMIDT |ROSEMARY.SCHMIDT@sakilacustomer.org| PAMELA |BAKER |PAMELA.BAKER@sakilacustomer.org | BILL |GAVIN |BILL.GAVIN@sakilacustomer.org | 5 rows
Show/Hide Solution
select first_name, last_name, email from customer order by random() limit 5;
3.39 Give 3 different solutions to list the rentals made in June, 2005. In one solution, use the date_part function. In another, use the BETWEEN keyword. In the third, don't use either date_part or BETWEEN.
Table(s) to use: rental
Expected Output:
rental_id|rental_date |
---------|-------------------|
1158|2005-06-14 22:53:33|
1159|2005-06-14 22:55:13|
1160|2005-06-14 23:00:34|
1161|2005-06-14 23:07:08|
1162|2005-06-14 23:09:38|
1163|2005-06-14 23:12:46|
1164|2005-06-14 23:16:26|
1165|2005-06-14 23:16:27|
...
2311 rows
Show/Hide Solution
select rental_id, rental_date
from rental
where date_part('year', rental_date) = 2005
and date_part('month', rental_date) = 6;
Using the BETWEEN keyword.
select rental_id, rental_date from rental where rental_date between '2005-06-01 00:00:00' and '2005-06-30 24:00:00';Instead of BETWEEN you can also use the standard greater than and less than operators.
select rental_id, rental_date from rental where rental_date >= '2005-06-01 00:00:00' and rental_date <= '2005-06-30 24:00:00';
3.40 Return the top 5 films for $ per minute (rental_rate / length) of entertainment
Table(s) to use: film
Expected Output:title |rental_rate|length|per_minute | -------------------|-----------|------|----------------------| IRON MOON | 4.99| 46|0.10847826086956521739| HANOVER GALAXY | 4.99| 47|0.10617021276595744681| MIDSUMMER GROUNDHOG| 4.99| 48|0.10395833333333333333| ACE GOLDFINGER | 4.99| 48|0.10395833333333333333| PELICAN COMFORTS | 4.99| 48|0.10395833333333333333| 5 rows
Show/Hide Solution
select title, rental_rate, length, rental_rate / length as per_minute from film where length is not null and length != 0 order by per_minute desc limit 5;
3.41 Write a query to list all customers who have a first name containing the letter 'A' twice or more
Table(s) to use: customer
Expected Output:first_name| ----------| PATRICIA | BARBARA | MARIA | MARGARET | SANDRA | LAURA | SARAH | ANGELA | ... 56 rows
Show/Hide Solution
select first_name from customer where first_name ilike '%a%a%';
3.42 PostgreSQL supports an interesting variation of DISTINCT called DISTINCT ON. Visit the official documentation page and read about DISTINCT ON. See if you can figure out how you would use it in a query to return the most recent rental for each customer
Table(s) to use: rental
Expected Output:
customer_id|rental_date |
-----------|-------------------|
1|2005-08-22 20:03:46|
2|2005-08-23 17:39:35|
3|2005-08-23 07:10:14|
4|2005-08-23 07:43:00|
5|2006-02-14 15:16:03|
6|2005-08-23 06:41:32|
7|2005-08-21 04:49:48|
8|2005-08-23 14:31:19|
9|2006-02-14 15:16:03|
10|2005-08-22 21:59:29|
...
599 rows
Show/Hide Solution
select distinct on (customer_id) customer_id, rental_date from rental order by customer_id asc, rental_date desc;
3.43 Write a query to list all the customers with an email address but not in the format [first_name].[last_name]@sakilacustomer.org
Table(s) to use: customer
Expected Output:first_name|last_name|email | ----------|---------|--------------------------------| JEAN |BELL |JEAN.BELL@sakilacustomer.or | BONNIE |HUGHES |BONNIE.HUGHES@sabilacustomer.org| 2 rows
Show/Hide Solution
select first_name, last_name, email from customer where email != (first_name || '.' || last_name || '@sakilacustomer.org')
4. Aggregate functions and grouping
4.1 Write a query to return the total count of customers in the customer table and the count of how many customers provided an email address
Table(s) to use: customer
Expected Output:
# customers|# customers with email|
-----------|----------------------|
599| 597|
1 row
Show/Hide Solution
select count(*) as "# customers", count(email) as "# customers with email" from customer;
4.2 Building on the previous exercise, now return an additional result showing the percentage of customers with an email address (as a helpful hint, if you're getting 0 try multiplying the fraction by 100.0 - we'll examine why this is necessary in an upcoming chapter on data types)
Table(s) to use: customer
Expected Output:
# customers|# customers with email|% with email |
-----------|----------------------|-------------------|
599| 597|99.6661101836393990|
1 row
Show/Hide Solution
select count(*) as "# customers", count(email) as "# customers with email", 100.0 * count(email) / count(*) as "% with email" from customer;
4.3 Write a query to return the number of distinct customers who have made payments
Table(s) to use: payment
Expected Output:count| -----| 599| 1 row
Show/Hide Solution
select count(distinct customer_id) from payment;
4.4 What is the average length of time films are rented out for
Table(s) to use: rental
Expected Output:avg rental duration | ----------------------| 4 days 24:36:28.541706| 1 row
Show/Hide Solution
select avg(return_date - rental_date) as "avg rental duration" from rental;
4.5 Write a query to return the sum total of all payment amounts received
Table(s) to use: payment
Expected Output:total | --------| 67416.51| 1 row
Show/Hide Solution
select sum(amount) as total from payment;
4.6 List the number of films each actor has appeared in and order the results from most popular to least
Table(s) to use: film_actor
Expected Output:
actor_id|num_films|
--------|---------|
107| 42|
102| 41|
198| 40|
181| 39|
23| 37|
81| 36|
158| 35|
60| 35|
...
200 rows
Show/Hide Solution
select actor_id, count(*) as num_films from film_actor group by actor_id order by num_films desc;
4.7 List the customers who have made over 40 rentals
Table(s) to use: rental
Expected Output:
customer_id|
-----------|
75|
144|
526|
236|
148|
5 rows
Show/Hide Solution
select customer_id from rental group by customer_id having count(*) > 40;
4.8 We want to compare how the staff are performing against each other on a month to month basis. So for each month and year, show for each staff member how many payments they handled, the total amount of money they accepted, and the average payment amount
Table(s) to use: payment
Expected Output:year|month|staff_id|num_payments|payment_total|avg_payment_amount| ----|-----|--------|------------|-------------|------------------| 2007| 1| 1| 617| 2621.83|4.2493192868719611| 2007| 1| 2| 540| 2202.60|4.0788888888888889| 2007| 2| 1| 1164| 4776.36|4.1034020618556701| 2007| 2| 2| 1148| 4855.52|4.2295470383275261| 2007| 3| 1| 2817| 11776.83|4.1806283280085197| 2007| 3| 2| 2827| 12109.73|4.2835974531305271| 2007| 4| 1| 3364| 14080.36|4.1856004756242568| 2007| 4| 2| 3390| 14479.10|4.2711209439528024| 2007| 5| 1| 95| 234.09|2.4641052631578947| 2007| 5| 2| 87| 280.09|3.2194252873563218| 10 rows
Show/Hide Solution
select
date_part('year', payment_date) as year,
date_part('month', payment_date) as month,
staff_id,
count(*) as num_payments,
sum(amount) as payment_total,
avg(amount) as avg_payment_amount
from payment
group by
date_part('year', payment_date),
date_part('month', payment_date),
staff_id
order by year, month, staff_id;
4.9 Write a query to show the number of rentals that were returned within 3 days, the number returned in 3 or more days, and the number never returned (for the logical comparison check you can use the following code snippet to compare against an interval: where return_date - rental_date < interval '3 days')
Table(s) to use: rental
Expected Output:
lt 3 days|gt 3 days|never returned|
---------|---------|--------------|
4388| 11473| 183|
1 row
Show/Hide Solution
select
count(*) filter
(where return_date - rental_date < interval '3 days') as "lt 3 days",
count(*) filter
(where return_date - rental_date >= interval '3 days') as "gt 3 days",
count(*) filter
(where return_date is null) as "never returned"
from rental;
4.10 Write a query to give counts of the films by their length in groups of 0 - 1hrs, 1 - 2hrs, 2 - 3hrs, and 3hrs+ (note: you might get slightly different numbers if doing inclusive or exclusive grouping - but don't sweat it!)
Table(s) to use: film
Expected Output:len |count| ------|-----| 0-1hrs| 97| 1-2hrs| 436| 2-3hrs| 419| 3hrs+ | 48| 4 rows
Show/Hide Solution
select
case
when length between 0 and 59 then '0-1hrs'
when length between 60 and 119 then '1-2hrs'
when length between 120 and 179 then '2-3hrs'
else '3hrs+'
end as len,
count(*)
from film
group by 1
order by 1;
4.11 Explain why in the following query we obtain two different results for the average film length. Which one is correct?
select 1.0 * sum(length) / count(*) as avg1, 1.0 * avg(length) as avg2 from film;
Table(s) to use: film
Expected Output:avg1 |avg2 | --------------------|---------------------| 114.8850000000000000|115.11523046092184370|
Show/Hide Solution
4.12 Write a query to return the average rental duration for each customer in descending order
Table(s) to use: rental
Expected Output:
customer_id|avg_rent_duration |
-----------|----------------------|
315| 6 days 14:13:22.5|
187|5 days 34:58:38.571428|
321|5 days 32:56:32.727273|
539|5 days 31:39:57.272727|
436| 5 days 31:09:46|
532|5 days 30:59:34.838709|
427| 5 days 29:27:05|
555|5 days 26:48:35.294118|
...
599 rows
Show/Hide Solution
select customer_id, avg(return_date - rental_date) as avg_rent_duration from rental group by customer_id order by avg_rent_duration desc;
4.13 Return a list of customer where all payments they’ve made have been over $2 (lookup the bool_and aggregate function which will be useful here)
Table(s) to use: payment
Expected Output:
customer_id|
-----------|
363|
59|
2 rows
Show/Hide Solution
select customer_id from payment group by customer_id having bool_and(amount > 2);
4.14 As a final fun finish to this chapter, run the following query to see a cool way you can generate ascii histogram charts. Look up the repeat function (you'll find it under 'String Functions and Operators') to see how it works and change the output character...and don't worry, I'll explain the ::int bit in the next chapter!
select rating, repeat('*', (count(*) / 10)::int) as "count/10"
from film
where rating is not null
group by rating;
Table(s) to use: film
Expected Output:rating|count/10 | ------|----------------------| PG-13 |**********************| R |******************* | G |***************** | PG |******************* | NC-17 |******************** | 5 rows
Show/Hide Solution
5. Understanding data types
5.1 Write a query to print a description of each film's length as shown in the output below. When a film does not have a length, print: [title] is unknown length
Table(s) to use: film
Expected Output:length_desc | ------------------------------------------| ACADEMY DINOSAUR is 86 minutes | ACE GOLDFINGER is 48 minutes | ADAPTATION HOLES is 50 minutes | AFFAIR PREJUDICE is 117 minutes | AFRICAN EGG is 130 minutes | AGENT TRUMAN is 169 minutes | AIRPLANE SIERRA is 62 minutes | AIRPORT POLLOCK is 54 minutes | ... 1000 rows
Show/Hide Solution
select title || ' is ' || coalesce(length || ' minutes', 'unknown length') as length_desc from film;
5.2 You want to play a movie title guessing game with some friends. Write a query to print only the first 3 letters of each film title and then '*' for the rest (The repeat function may come in handy here...)
Table(s) to use: film
Expected Output:Guess! | ---------------------------| ACA************* | ACE*********** | ADA************* | AFF************* | AFR******** | AGE********* | AIR************ | AIR************ | ... 1000 rows
Show/Hide Solution
select left(title, 3) || repeat('*', length(title) - 3) as "Guess!"
from film;
5.3 Write a query to list the percentage of films that are rated NC-17, G, PG, PG-13, NC-17, and R, rounded to the nearest integer.
Table(s) to use: film
Expected Output:
% NC-17|% PG|% G|% R|% PG-13|
-------|----|---|---|-------|
21| 19| 18| 20| 22|
1 row
Show/Hide Solution
select round(100.0 * count(*) filter(where rating = 'NC-17') / count(*)) as "% NC-17", round(100.0 * count(*) filter(where rating = 'PG') / count(*)) as "% PG", round(100.0 * count(*) filter(where rating = 'G') / count(*)) as "% G", round(100.0 * count(*) filter(where rating = 'R') / count(*)) as "% R", round(100.0 * count(*) filter(where rating = 'PG-13') / count(*)) as "% PG-13" from film;
5.4 Try a few of the different explicit casting operations listed below to get familiar with how casting behaves. Was the behaviour what you expected?
select int '33'; select int '33.3'; select cast(33.3 as int); select cast(33.8 as int); select 33::text; select 'hello'::varchar(2); select cast(35000 as smallint); select 12.1::numeric(1,1);
Show/Hide Solution
5.5 Show 3 different ways to input the timestamptz representing 4th March, 2019 at 3:30pm in New York, USA
select * from pg_timezone_names;
select * from pg_timezone_abbrevs;
Show/Hide Solution
select timestamptz '2019-03-04 15:30 EST', timestamptz '2019-03-04 03:30PM -5', timestamptz '2019-03-04 03:30PM America/New_York';
5.6 The rental duration in the film table is currently stored as an integer, representing the number of days. Write a query to return this as an interval instead and then add one day to the duration
Table(s) to use: film
Expected Output:title |duration|duration + 1| ---------------------------|--------|------------| ACADEMY DINOSAUR | 6 days| 7 days| ACE GOLDFINGER | 3 days| 4 days| ADAPTATION HOLES | 7 days| 8 days| AFFAIR PREJUDICE | 5 days| 6 days| AFRICAN EGG | 6 days| 7 days| AGENT TRUMAN | 3 days| 4 days| AIRPLANE SIERRA | 6 days| 7 days| AIRPORT POLLOCK | 6 days| 7 days| ... 1000 rows
Show/Hide Solution
select title, cast(rental_duration || ' days' as interval) as duration, cast(rental_duration || ' days' as interval) + interval '1 day' as "duration + 1" from film;
5.7 You have a theory that certain hours of the day might be busiest for rentals. To investigate this write a query to list out for all time the the number of rentals made during each hour of the day
Table(s) to use: rental
Expected Output:hr|count| --|-----| 0| 694| 1| 649| 2| 630| 3| 684| 4| 681| 5| 648| 6| 647| 7| 667| ... 24 rows
Show/Hide Solution
select
date_part('hour', rental_date) as hr,
count(*)
from rental
group by hr
order by hr;
5.8 If you wanted to aggregate payments received by year and month you could write a query as below using the date_part function. Try and simplify this query by instead using the date_trunc function to achieve effectively the same result (ignoring the slight difference in date presentation)
select
date_part('year', payment_date) as "year",
date_part('month', payment_date) as "month",
sum(amount) as total
from payment
group by "year", "month"
order by "year", "month";
Table(s) to use: payment
Expected Output:month |total | -------------------|--------| 2007-01-01 00:00:00| 4824.43| 2007-02-01 00:00:00| 9631.88| 2007-03-01 00:00:00|23886.56| 2007-04-01 00:00:00|28559.46| 2007-05-01 00:00:00| 514.18| 5 rows
Show/Hide Solution
select
date_trunc('month', payment_date) as "month",
sum(amount) as total
from payment
group by "month"
order by "month";
5.9 Write a query to return a count of the number of films that were rented out on the last day of a month
Table(s) to use: rental
Expected Output:
total # EOM rentals|
-------------------|
842|
Show/Hide Solution
select count(*) as "total # EOM rentals"
from rental
where date_trunc('month', rental_date) + interval '1 month' - interval '1 day'
= date_trunc('day', rental_date);
5.10 Write a query to list the film titles that have spaces at the beginning or end (we want to flag them so we know to clean them up!)
Table(s) to use: film
Expected Output:0 rows
Show/Hide Solution
select title from film where length(title) - length(trim(title)) > 0;
5.11 Write a query to sum up, for each customer, the total number of hours they have had films rented out for. Return only the top 3 customers with the most hours.
Table(s) to use: rental
Expected Output:
customer_id|hrs_rented|
-----------|----------|
526| 6340|
148| 5834|
144| 5641|
3 rows
Show/Hide Solution
select
customer_id,
round(sum(date_part('epoch', return_date - rental_date)) / 3600) as hrs_rented
from rental
group by customer_id
order by hrs_rented desc
limit 3;
5.12 Postgres has a really useful function called generate_series which will come in handy in a few of the coming chapters. Have a look at the examples how to use it here and then write a query to generate a list of timestamps which represent the first day of every month in 2019, at 5pm UTC
Expected Output:generate_series | -------------------| 2019-01-02 03:00:00| 2019-02-02 03:00:00| 2019-03-02 03:00:00| 2019-04-02 03:00:00| 2019-05-02 03:00:00| 2019-06-02 03:00:00| 2019-07-02 03:00:00| 2019-08-02 03:00:00| 2019-09-02 03:00:00| 2019-10-02 03:00:00| 2019-11-02 03:00:00| 2019-12-02 03:00:00| 12 rows
Show/Hide Solution
select * from generate_series( timestamptz '2019-01-01 17:00 UTC', timestamptz '2019-12-01 17:00 UTC', interval '1 month');
5.13 Return a count of the number of occurrences of the letter 'A' in each customer's first name (this is a common interview question for SQL related jobs!). Order the output by the count descending.
Table(s) to use: customer
Expected Output:first_name |count| -----------|-----| CASSANDRA | 3| BARBARA | 3| SAMANTHA | 3| TAMARA | 3| AMANDA | 3| MARSHALL | 2| MARSHA | 2| MARIAN | 2| ... 599 rows
Show/Hide Solution
select first_name, length(first_name) - length(replace(first_name, 'A', '')) as count from customer order by count desc;
5.14 Write a query to tally up the total amount of money made on weekends (Saturday and Sunday)
Table(s) to use: payment
Expected Output:total $ | --------| 19036.04| 1 row
Show/Hide Solution
select sum(amount) as "total $"
from payment
where date_part('isodow', payment_date) in (6, 7);
6. Joining data from multiple tables
6.1 Write a query to return a list of all the films rented by PETER MENARD showing the most recent first
Table(s) to use: rental, customer, inventory, film
Expected Output:rental_date |title | -------------------|---------------------| 2005-08-23 18:43:31|OCTOBER SUBMARINE | 2005-08-22 09:10:21|PRIDE ALAMO | 2005-08-21 08:40:56|FUGITIVE MAGUIRE | 2005-08-20 12:53:46|BRIGHT ENCOUNTERS | 2005-08-19 10:06:53|REDEMPTION COMFORTS | 2005-08-19 09:42:01|SPOILERS HELLFIGHTERS| 2005-08-17 07:56:22|GO PURPLE | 2005-08-02 15:44:55|GOODFELLAS SALUTE | ... 23 rows
Show/Hide Solution
select r.rental_date, f.title
from rental as r
inner join customer as c
on r.customer_id = c.customer_id
inner join inventory as i
on r.inventory_id = i.inventory_id
inner join film as f
on i.film_id = f.film_id
where c.first_name = 'PETER'
and c.last_name = 'MENARD'
order by r.rental_date desc;
6.2 Write a query to list the full names and contact details for the manager of each store
Table(s) to use: store, staff
Expected Output:
store_id|Manager |email |
--------|------------|----------------------------|
1|Mike Hillyer|Mike.Hillyer@sakilastaff.com|
2|Jon Stephens|Jon.Stephens@sakilastaff.com|
2 rows
Show/Hide Solution
select
store.store_id,
staff.first_name || ' ' || staff.last_name as "Manager",
staff.email
from store
inner join staff
on store.manager_staff_id = staff.staff_id;
6.3 Write a query to return the top 3 most rented out films and how many times they've been rented out
Table(s) to use: rental, inventory, film
Expected Output:
film_id|title |count|
-------|------------------|-----|
103|BUCKET BROTHERHOOD| 34|
738|ROCKETEER MOTHER | 33|
331|FORWARD TEMPLE | 32|
3 rows
Show/Hide Solution
select f.film_id, f.title, count(*)
from rental as r
inner join inventory as i
on r.inventory_id = i.inventory_id
inner join film as f
on i.film_id = f.film_id
group by f.film_id, f.title
order by count(*) desc
limit 3;
6.4 Write a query to show for each customer how many different (unique) films they've rented and how many different (unique) actors they've seen in films they've rented
Table(s) to use: rental, inventory, film, film_actor
Expected Output:
customer_id|num_films|num_actors|
-----------|---------|----------|
1| 30| 108|
2| 27| 118|
3| 26| 103|
4| 22| 89|
5| 38| 119|
6| 26| 110|
7| 32| 117|
8| 23| 109|
...
599 rows
Show/Hide Solution
select r.customer_id, count(distinct f.film_id) as num_films, count(distinct fa.actor_id) as num_actors from rental as r inner join inventory as i using (inventory_id) inner join film as f using (film_id) inner join film_actor as fa using (film_id) group by r.customer_id order by r.customer_id;
6.5 Re-write the query below written in the older style of inner joins (which you still encounter surprisingly often online) using the more modern style. Re-write it once using ON to establish the join condition and the second time with USING.
select film.title, language.name as "language" from film, language where film.language_id = language.language_id;
Table(s) to use: film, language
Expected Output:title |language | ---------------------------|--------------------| ACADEMY DINOSAUR |English | ACE GOLDFINGER |English | ADAPTATION HOLES |English | AFFAIR PREJUDICE |English | AFRICAN EGG |English | AGENT TRUMAN |English | AIRPLANE SIERRA |English | AIRPORT POLLOCK |English | ... 1000 rows
Show/Hide Solution
select film.title, language.name as "language"
from film
inner join language
on film.language_id = language.language_id;
select film.title, language.name as "language" from film inner join language using (language_id);
6.6 Write a query to list the films that are not in stock in any of the stores
Table(s) to use: film, inventory
Expected Output:title | ----------------------| SKY MIRACLE | KILL BROTHERHOOD | SISTER FREDDY | GLADIATOR WESTWARD | FLOATS GARDEN | APOLLO TEEN | CRYSTAL BREAKING | HOCUS FRIDA | ... 42 rows
Show/Hide Solution
select f.title
from film as f
left outer join inventory as i
on f.film_id = i.film_id
where i.store_id is null;
6.7 Write a query to return a count of how many of each film we have in our inventory (include all films). Order the output showing the lowest in-stock first so we know to buy more!
Table(s) to use: film, inventory
Expected Output:title |count| ---------------------------|-----| CHOCOLATE DUCK | 0| FRANKENSTEIN STRANGER | 0| APOLLO TEEN | 0| HOCUS FRIDA | 0| WAKE JAWS | 0| SUICIDES SILENCE | 0| GUMP DATE | 0| BUTCH PANTHER | 0| ... 1000 rows
Show/Hide Solution
select f.title, count(i.inventory_id)
from film as f
left outer join inventory as i
on f.film_id = i.film_id
group by f.film_id, f.title
order by count(i.inventory_id) asc;
6.8 Write a query to return a count of the number of films rented by every customer on the 24th May, 2005. Order the results by number of films rented descending.
Table(s) to use: customer, rental
Expected Output:
customer_id|num_rented|
-----------|----------|
130| 1|
222| 1|
239| 1|
269| 1|
333| 1|
408| 1|
459| 1|
549| 1|
1| 0|
2| 0|
3| 0|
4| 0|
5| 0|
...
599 rows
Show/Hide Solution
select
c.customer_id,
count(r.rental_id) as num_rented
from customer as c
left join rental as r
on c.customer_id = r.customer_id
and date_trunc('day', r.rental_date) = '20050524'
group by c.customer_id
order by num_rented desc, c.customer_id;
6.9 Write a query to return how many copies of each film are available in each store, including zero counts if there are none. Order by count so we can easily see first which films need to be restocked in each store
Table(s) to use: film, store, inventory
Expected Output:
film_id|store_id|stock|
-------|--------|-----|
2| 1| 0|
3| 1| 0|
5| 1| 0|
8| 1| 0|
13| 1| 0|
14| 1| 0|
14| 2| 0|
20| 2| 0|
...
2000 rows
Show/Hide Solution
select f.film_id, s.store_id, count(i.inventory_id) as stock
from film as f
cross join store as s
left join inventory as i
on f.film_id = i.film_id
and s.store_id = i.store_id
group by f.film_id, s.store_id
order by stock, f.film_id, s.store_id;
6.10 Have a look at the documentation for the extremely useful Postgres function generate_series and then using generate_series write a query to return a count of the number of rentals for each and every month in 2005 (don't worry too much about the output date formatting).
Table(s) to use: rental
Expected Output:t |count| -------------------|-----| 2005-01-01 00:00:00| 0| 2005-02-01 00:00:00| 0| 2005-03-01 00:00:00| 0| 2005-04-01 00:00:00| 0| 2005-05-01 00:00:00| 1156| 2005-06-01 00:00:00| 2311| 2005-07-01 00:00:00| 6709| 2005-08-01 00:00:00| 5686| 2005-09-01 00:00:00| 0| 2005-10-01 00:00:00| 0| 2005-11-01 00:00:00| 0| 2005-12-01 00:00:00| 0| 12 rows
Show/Hide Solution
select
m.t,
count(r.rental_id)
from generate_series('2005-01-01'::timestamp, '2005-12-01'::timestamp, '1 month') as m(t)
left join rental as r
on date_trunc('month', r.rental_date) = m.t
group by m.t;
6.11 Write a query to list the customers who rented out the film with ID 97 and then at some later date rented out the film with ID 841
Table(s) to use: rental, inventory
Expected Output:
customer_id|
-----------|
459|
25|
2 rows
Show/Hide Solution
select r.customer_id
from rental as r
inner join inventory as i
on r.inventory_id = i.inventory_id
inner join rental as r2
on r.customer_id = r2.customer_id
and r2.rental_date > r.rental_date
inner join inventory as i2
on r2.inventory_id = i2.inventory_id
where i.film_id = 97 and i2.film_id = 841;
7. Subqueries
7.1 Write a query to return all the customers who made a rental on the first day of rentals (without hardcoding the date for the first day of rentals in your query)
Table(s) to use: rental, customer
Expected Output:first_name|last_name | ----------|-----------| ANDREW |PURDY | CASSANDRA |WALTERS | CHARLOTTE |HUNTER | DELORES |HANSEN | MANUEL |MURRELL | MINNIE |ROMERO | NELSON |CHRISTENSON| TOMMY |COLLAZO | 8 rows
Show/Hide Solution
select distinct c.first_name, c.last_name from rental as r inner join customer as c using (customer_id) where r.rental_date::date = (select min(rental_date)::date from rental);
7.2 Using a subquery, return the films that don't have any actors. Now write the same query using a left join. Which solution do you think is better? Easier to read?
Table(s) to use: film, film_actor
Expected Output:
film_id|title |
-------|----------------|
257|DRUMLINE CYCLONE|
323|FLIGHT LIES |
803|SLACKER LIAISONS|
3 rows
Show/Hide Solution
select film_id, title from film where film_id not in (select film_id from film_actor);
select film_id, title from film as f left join film_actor as fa using (film_id) where fa.film_id is null;
7.3 You intend to write a humorous email to congratulate some customers on their poor taste in films. To that end, write a query to return the customers who rented out the least popular film (that is, the film least rented out - if there is more than one, pick the one with the lowest film ID)
Table(s) to use: rental, inventory, customer
Expected Output:
customer_id|first_name|last_name|
-----------|----------|---------|
257|MARSHA |DOUGLAS |
142|APRIL |BURNS |
564|BOB |PFEIFFER |
89|JULIA |FLORES |
4 rows
Show/Hide Solution
select c.customer_id, c.first_name, c.last_name
from rental as r1
inner join inventory as i1 using (inventory_id)
inner join customer as c using (customer_id)
where i1.film_id =
(select i2.film_id
from rental as r2
inner join inventory as i2 using (inventory_id)
group by i2.film_id
order by count(*) asc, i2.film_id asc
limit 1);
7.4 Write a query to return the countries in our database that have more than 15 cities
Table(s) to use: country, city
Expected Output:country | ------------------| Brazil | China | India | Japan | Mexico | Philippines | Russian Federation| United States | 8 rows
Show/Hide Solution
select country.country from country where (select count(*) from city where city.country_id = country.country_id) > 15;
7.5 Write a query to return for each customer the store they most commonly rent from
Table(s) to use: customer, rental, inventory
Expected Output:
customer_id|first_name |last_name |Favourite Store|
-----------|-----------|------------|---------------|
1|MARY |SMITH | 1|
2|PATRICIA |JOHNSON | 1|
3|LINDA |WILLIAMS | 1|
4|BARBARA |JONES | 2|
5|ELIZABETH |BROWN | 2|
6|JENNIFER |DAVIS | 1|
7|MARIA |MILLER | 1|
8|SUSAN |WILSON | 2|
...
599 rows
Show/Hide Solution
select
c.customer_id,
c.first_name,
c.last_name,
(select i.store_id
from rental as r
inner join inventory as i using (inventory_id)
where r.customer_id = c.customer_id
group by i.store_id
order by count(*) desc
limit 1) as "Favourite Store"
from customer as c;
7.6 In the customer table, each customer has a store ID which is the store they originally registered at. Write a query to list for each customer whether they have ever rented from a different store than that one they registered at. Return 'Y' if they have, and 'N' if they haven't.
Table(s) to use: rental, inventory, customer
Expected Output:first_name |last_name |HasRentedOtherStore| -----------|------------|-------------------| MARY |SMITH |Y | PATRICIA |JOHNSON |Y | LINDA |WILLIAMS |Y | BARBARA |JONES |Y | ELIZABETH |BROWN |Y | JENNIFER |DAVIS |Y | MARIA |MILLER |Y | SUSAN |WILSON |Y | ... 599 rows
Show/Hide Solution
select c.first_name, c.last_name,
case
when exists
(select *
from rental as r
inner join inventory as i using (inventory_id)
where r.customer_id = c.customer_id
and i.store_id != c.store_id) then 'Y'
else 'N'
end as "HasRentedOtherStore"
from customer as c;
7.7 Write a query to return each customer 4 times
Table(s) to use: customer
Expected Output:first_name |last_name | -----------|------------| MARY |SMITH | MARY |SMITH | MARY |SMITH | MARY |SMITH | PATRICIA |JOHNSON | PATRICIA |JOHNSON | PATRICIA |JOHNSON | PATRICIA |JOHNSON | LINDA |WILLIAMS | LINDA |WILLIAMS | ... 2396 rows
Show/Hide Solution
select c.first_name, c.last_name from customer as c cross join (values (1), (2), (3), (4)) as v(n) order by c.customer_id;
7.8 Write a query to return how many rentals the business gets on average on each day of the week. Order the results to show the days of the week with the highest average number of rentals first (use the round function to round the average so it's a nice whole number). Have a look at the to_char function to obtain the day name given a timestamp. For simplicity, don't worry about days in which there were no rentals.
Table(s) to use: rental
Expected Output:day_name |average| ---------|-------| Sunday | 464| Saturday | 462| Friday | 454| Monday | 449| Wednesday| 446| Thursday | 440| Tuesday | 224| 7 rows
Show/Hide Solution
select
to_char(rent_day, 'Day') as day_name,
round(avg(num_rentals)) as average
from
(select date_trunc('day', rental_date) as rent_day, count(*) as num_rentals
from rental
group by rent_day) as T
group by day_name
order by average desc;
7.9 Write a query to return for each customer the first 'PG' film that they rented (include customers who have never rented a 'PG' film as well)
Table(s) to use: customer, rental, inventory, film
Expected Output:first_name |last_name |title |rental_date | -----------|------------|----------------------|-------------------| MARY |SMITH |TALENTED HOMICIDE |2005-05-28 10:35:23| PATRICIA |JOHNSON |BLACKOUT PRIVATE |2005-06-17 20:54:58| LINDA |WILLIAMS |HORN WORKING |2005-08-18 14:49:55| BARBARA |JONES |BEDAZZLED MARRIED |2005-06-16 08:08:40| ELIZABETH |BROWN |TOOTSIE PILOT |2005-05-29 07:25:16| JENNIFER |DAVIS |MONSTER SPARTACUS |2005-07-11 12:39:01| MARIA |MILLER |LEGEND JEDI |2005-06-16 21:06:00| SUSAN |WILSON |NORTHWEST POLISH |2005-05-30 03:43:54| ... 599 rows
Show/Hide Solution
select c.first_name, c.last_name, d.title, d.rental_date
from customer as c
left join lateral
(select r.customer_id, f.title, r.rental_date
from rental as r
inner join inventory as i using (inventory_id)
inner join film as f using (film_id)
where r.customer_id = c.customer_id
and f.rating = 'PG'
order by r.rental_date
limit 1) as d
on c.customer_id = d.customer_id;
7.10 Write a query to list the customers who rented out the film with title "BRIDE INTRIGUE" and then at some later date rented out the film with title "STAR OPERATION". Use a CTE to simplify your code if possible.
Table(s) to use: rental, inventory, film
Expected Output:
customer_id|
-----------|
459|
25|
2 rows
Show/Hide Solution
with rental_detail as
(
select r.customer_id, r.rental_date, f.title
from rental as r
inner join inventory as i using (inventory_id)
inner join film as f using (film_id)
)
select r1.customer_id
from rental_detail as r1
inner join rental_detail as r2
on r1.customer_id = r2.customer_id
and r2.rental_date > r1.rental_date
and r1.title = 'BRIDE INTRIGUE' and r2.title = 'STAR OPERATION';
7.11 Write a query to calculate the amount of income received each month and compare that against the previous month's income, showing the change.
Table(s) to use: payment
Expected Output:month |income |prev month income|change | -------------------|--------|-----------------|---------| 2007-01-01 00:00:00| 4824.43| | | 2007-02-01 00:00:00| 9631.88| 4824.43| 4807.45| 2007-03-01 00:00:00|23886.56| 9631.88| 14254.68| 2007-04-01 00:00:00|28559.46| 23886.56| 4672.90| 2007-05-01 00:00:00| 514.18| 28559.46|-28045.28| 5 rows
Show/Hide Solution
with monthly_amounts as
(
select
date_trunc('month', payment_date) as month,
sum(amount) as total
from payment
group by month
)
select
curr.month,
curr.total as "income",
prev.total as "prev month income",
curr.total - prev.total as "change"
from monthly_amounts as curr
left join monthly_amounts as prev
on curr.month = prev.month + interval '1 month'
7.12 Write a query to return the customers who rented a film in 2005 but none in 2006
Table(s) to use: rental
Expected Output:
customer_id|
-----------|
1|
2|
3|
4|
6|
7|
8|
10|
...
441 rows
Show/Hide Solution
select distinct customer_id
from rental
where date_part('year', rental_date) = 2005
and customer_id not in
(select customer_id
from rental
where date_part('year', rental_date) = 2006);
7.13 What are the top 3 countries the customers are from. Show both the number of customers from each country and percentage (round the percentage to the nearest whole number)
Table(s) to use: customer, address, city, country
Expected Output:country |num_customers|percent| -------------|-------------|-------| India | 60| 10| China | 53| 9| United States| 36| 6| 3 rows
Show/Hide Solution
select country, count(*) as num_customers, round(100.0 * count(*) / (select count(*) from customer)) as percent from customer as c inner join address using (address_id) inner join city using (city_id) inner join country using (country_id) group by country order by count(*) desc limit 3;
7.14 Write a query to perform a running total of payments received, grouping by month (ie. for each month return the amount of money received that month and also the total amount of money received up to (and including) that month - this is a useful view to have if you wanted to produce a cumulative chart). Hint - Re-use the monthly_amounts CTE from exercise 7.11
Table(s) to use: payment
Expected Output:month |amount |cumamount| -------------------|--------|---------| 2007-01-01 00:00:00| 4824.43| 4824.43| 2007-02-01 00:00:00| 9631.88| 14456.31| 2007-03-01 00:00:00|23886.56| 38342.87| 2007-04-01 00:00:00|28559.46| 66902.33| 2007-05-01 00:00:00| 514.18| 67416.51| 5 rows
Show/Hide Solution
with monthly_amounts as
(
select
date_trunc('month', payment_date) as month,
sum(amount) as amount
from payment
group by month
)
select ma1.month, ma1.amount,
(select sum(ma2.amount)
from monthly_amounts as ma2
where ma2.month <= ma1.month) as cumamount
from monthly_amounts as ma1
order by ma1.month;
7.15 The rental table has 16,044 rows but the maximum rental ID is 16,049. This suggests that some rental IDs have been skipped over. Write a query to find the missing rental IDs. The generate_series function may come in handy
Table(s) to use: rental
Expected Output:id | -----| 321| 2247| 6579| 9426| 15592| 5 rows
Show/Hide Solution
select s.id from generate_series( (select min(rental_id) from rental), (select max(rental_id) from rental)) as s(id) where not exists (select * from rental as r where r.rental_id = s.id);
7.16 In an earlier exercise I asked you to see if you could find a way to return the last 3 payments made in Jan, 2007 but ordered ascending. You've got the tools now to accomplish this - see if you can figure it out!
Table(s) to use: payment
Expected Output:
payment_id|amount|payment_date |
----------|------|-------------------|
17173| 2.99|2007-01-31 20:45:37|
16890| 0.99|2007-01-31 21:06:00|
16813| 0.99|2007-01-31 21:16:11|
3 rows
Show/Hide Solution
select payment_id, amount, payment_date
from
(select payment_id, amount, payment_date
from payment
where payment_date >= '2007-01-01'
and payment_date < '2007-02-01'
order by payment_date desc
limit 3) as p
order by payment_date asc;
8. Window functions
8.1 Write a query to return the 3 most recent rentals for each customer. Earlier you did this with a lateral join - this time do it with window functions
Table(s) to use: rental
Expected Output:
rental_id|customer_id|rental_date |
---------|-----------|-------------------|
15315| 1|2005-08-22 20:03:46|
15298| 1|2005-08-22 19:41:37|
14825| 1|2005-08-22 01:27:57|
15907| 2|2005-08-23 17:39:35|
15145| 2|2005-08-22 13:53:04|
14743| 2|2005-08-21 22:41:56|
15619| 3|2005-08-23 07:10:14|
15038| 3|2005-08-22 09:37:27|
14699| 3|2005-08-21 20:50:48|
15635| 4|2005-08-23 07:43:00|
15147| 4|2005-08-22 13:58:23|
14225| 4|2005-08-21 04:53:37|
...
1797 rows
Show/Hide Solution
select rental_id, customer_id, rental_date
from
(select
rental_id,
customer_id,
rental_date,
row_number() over (partition by customer_id order by rental_date desc) as rn
from rental) as t
where rn <= 3;
8.2 We want to re-do exercise 7.3, where we wrote a query to return the customers who rented out the least popular film (that is, the film least rented out). This time though we want to be able to handle if there is more than one film that is least popular. So if several films are each equally unpopular, return the customers who rented out any of those films.
Table(s) to use: rental, inventory
Expected Output:
customer_id|
-----------|
179|
304|
258|
89|
14|
594|
133|
142|
257|
507|
564|
137|
12 rows
Show/Hide Solution
with rent_counts as
(
select
film_id,
count(*),
rank() over (order by count(*))
from rental
inner join inventory using (inventory_id)
group by film_id
)
select distinct customer_id
from rental as r
inner join inventory as i using (inventory_id)
where i.film_id in
(select film_id
from rent_counts
where rank = 1);
8.3 Write a query to return all the distinct film ratings without using the DISTINCT keyword
Table(s) to use: film
Expected Output:rating| ------| G | PG | PG-13 | R | NC-17 | 5 rows
Show/Hide Solution
select rating
from
(select
rating,
row_number() over (partition by rating) as rn
from film) as t
where rn = 1
and rating is not null;
8.4 Write a query to show for each rental both the rental duration and also the average rental duration from the same customer
Table(s) to use: rental
Expected Output:
customer_id|rental_id|rent_duration |avg |
-----------|---------|---------------|----------------------|
1| 10437|9 days 03:21:00| 4 days 11:18:07.5|
1| 13068| 1 day 04:49:00| 4 days 11:18:07.5|
1| 15315|7 days 05:48:00| 4 days 11:18:07.5|
1| 2308|3 days 18:55:00| 4 days 11:18:07.5|
1| 1185|8 days 01:48:00| 4 days 11:18:07.5|
1| 7273|3 days 19:19:00| 4 days 11:18:07.5|
1| 76|9 days 00:30:00| 4 days 11:18:07.5|
1| 7841|2 days 03:33:00| 4 days 11:18:07.5|
...
16044 rows
Show/Hide Solution
select customer_id, rental_id, return_date - rental_date as rent_duration, avg(return_date - rental_date) over (partition by customer_id) from rental;
8.5 Write a query to calculate a running total of payments received, grouped by month (ie. for each month show the total amount of money received that month and also the total amount of money received up to and including that month)
Table(s) to use: payment
Expected Output:month |amount |running_total| -------------------|--------|-------------| 2007-01-01 00:00:00| 4824.43| 4824.43| 2007-02-01 00:00:00| 9631.88| 14456.31| 2007-03-01 00:00:00|23886.56| 38342.87| 2007-04-01 00:00:00|28559.46| 66902.33| 2007-05-01 00:00:00| 514.18| 67416.51| 5 rows
Show/Hide Solution
with monthly_amounts as
(
select
date_trunc('month', payment_date) as month,
sum(amount) as amount
from payment
group by month
)
select
month,
amount,
sum(amount) over (order by month) as running_total
from monthly_amounts;
8.6 Write a query to return the top 3 earning films in each rating category. Include ties. To calculate the earnings for a film, multiply the rental rate for the film by the number of times it was rented out
Table(s) to use: rental, inventory, film
Expected Output:title |rating|income| ------------------|------|------| CAT CONEHEADS |G |149.70| DOGMA FAMILY |G |149.70| SATURDAY LAMBS |G |139.72| BUCKET BROTHERHOOD|PG |169.66| GOODFELLAS SALUTE |PG |154.69| TITANS JERK |PG |144.71| HARRY IDAHO |PG-13 |149.70| TRIP NEWTON |PG-13 |139.72| ROSES TREASURE |PG-13 |139.72| MASSACRE USUAL |R |149.70| BOOGIE AMELIE |R |144.71| CLOSER BANG |R |139.72| SCALAWAG DUCK |NC-17 |159.68| ZORRO ARK |NC-17 |154.69| APACHE DIVINE |NC-17 |154.69| WIFE TURN |NC-17 |154.69| 16 rows
Show/Hide Solution
with film_incomes as
(
select
f.film_id,
f.title,
f.rating,
f.rental_rate * count(*) as income
from rental as r
inner join inventory as i using (inventory_id)
inner join film as f using (film_id)
group by f.film_id
),
film_rankings as
(
select
film_id,
title,
rating,
income,
rank() over(partition by rating order by income desc)
from film_incomes
where rating is not null
)
select title, rating, income
from film_rankings
where rank <= 3
order by rating, rank;
8.7 The rental table has 16,044 rows but the maximum rental ID is 16,049. This suggests that some rental IDs have been skipped over. Write a query to find the missing rental IDs (you previously did this using the generate_series function. Now do it using only window functions). Note you don't have to have your output formatted the same.
Table(s) to use: rental
Expected Output:
missing_from|missing_to|
------------|----------|
321| 321|
2247| 2247|
6579| 6579|
9426| 9426|
15592| 15592|
5 rows
Show/Hide Solution
with t as
(
select
rental_id as current,
lead(rental_id) over (order by rental_id) as next
from rental
)
select
current + 1 as missing_from,
next - 1 as missing_to
from t
where next - current > 1;
8.8 Calculate for each customer the longest amount of time they've gone between renting a film
Table(s) to use: rental
Expected Output:
customer_id|longest break |
-----------|-----------------|
1| 17 days 14:18:49|
2| 22 days 09:36:26|
3| 18 days 16:07:31|
4| 38 days 16:31:09|
5|175 days 21:39:01|
6| 18 days 11:10:58|
7| 16 days 15:45:13|
8| 17 days 10:58:56|
9|177 days 01:22:04|
10| 17 days 08:40:33|
...
599 rows
Show/Hide Solution
with days_between as
(
select customer_id, rental_date,
lead(rental_date) over (partition by customer_id order by rental_date) - rental_date as diff
from rental
)
select customer_id, max(diff) as "longest break"
from days_between
group by customer_id
order by customer_id;
9. Working with sets
9.1 Write a query to list out all the distinct dates there was some sort of customer interaction (a rental or a payment) and order by output date
Table(s) to use: rental, payment
Expected Output:
interaction_date|
----------------|
2005-05-24|
2005-05-25|
2005-05-26|
2005-05-27|
2005-05-28|
2005-05-29|
2005-05-30|
2005-05-31|
...
81 rows
Show/Hide Solution
( select cast(rental_date as date) as interaction_date from rental ) union ( select cast(payment_date as date) as interaction_date from payment ) order by interaction_date;
9.2 Write a query to find the actors that are also customers (assuming same name = same person)
Table(s) to use: customer, actor
Expected Output:first_name|last_name| ----------|---------| JENNIFER |DAVIS | 1 rows
Show/Hide Solution
( select first_name, last_name from customer ) intersect ( select first_name, last_name from actor );
9.3 Have the actors with IDs 49 (Anne Cronyn), 152 (Ben Harris), and 180 (Jeff Silverstone) ever appeared in any films together? Which ones?
Table(s) to use: film_actor
Expected Output:
film_id|
-------|
729|
1 rows
Show/Hide Solution
( select film_id from film_actor where actor_id = 49 ) intersect ( select film_id from film_actor where actor_id = 152 ) intersect ( select film_id from film_actor where actor_id = 180 );
9.4 The missing rental IDs problem that we've encountered several times now is the perfect place to use EXCEPT. Write a query using the generate_series function and EXCEPT to find missing rental IDs (The rental table has 16,044 rows but the maximum rental ID is 16,049 - some IDs are missing)
Table(s) to use: rental
Expected Output:id | -----| 2247| 9426| 15592| 6579| 321| 5 rows
Show/Hide Solution
(
select t.id
from generate_series(
(select min(rental_id) from rental),
(select max(rental_id) from rental)) as t(id)
)
except
(
select rental_id
from rental
);
9.5 Write a query to list all the customers who have rented out a film on a Saturday but never on a Sunday. Order the customers by first name.
Table(s) to use: rental, customer
Expected Output:first_name|last_name| ----------|---------| CHRISTIAN |JUNG | CLIFTON |MALCOLM | EVA |RAMOS | FELIX |GAFFNEY | LEON |BOSTIC | MARION |OCAMPO | ROBIN |HAYES | TIFFANY |JORDAN | 8 rows
Show/Hide Solution
(
select first_name, last_name
from rental
inner join customer using (customer_id)
where date_part('isodow', rental_date) = 6
)
except
(
select first_name, last_name
from rental
inner join customer using (customer_id)
where date_part('isodow', rental_date) = 7
)
order by first_name;
9.6 Write a query to list out all the distinct dates there was some sort of customer interaction (a rental or a payment) and order by output date. Include only one row in the output for each type of interaction
Table(s) to use: rental, payment
Expected Output:
interaction_date|type |
----------------|-------|
2005-05-24|rental |
2005-05-25|rental |
2005-05-26|rental |
2005-05-27|rental |
2005-05-28|rental |
2005-05-29|rental |
2005-05-30|rental |
2005-05-31|rental |
2005-06-14|rental |
...
2007-01-24|payment|
2007-01-25|payment|
2007-01-26|payment|
2007-01-27|payment|
2007-01-28|payment|
2007-01-29|payment|
2007-01-30|payment|
...
81 rows
Show/Hide Solution
( select cast(rental_date as date) as interaction_date, 'rental' as type from rental ) union ( select cast(payment_date as date) as interaction_date, 'payment' as type from payment ) order by interaction_date;
9.7 Write a query to return the countries in which there are both customers and staff. Use a CTE to help simplify your code.
Table(s) to use: staff, customer, address, city, country
Expected Output:country| -------| Canada | 1 rows
Show/Hide Solution
with address_country as
(
select address_id, country
from address
inner join city using (city_id)
inner join country using (country_id)
)
(
select country
from staff
inner join address_country using (address_id)
)
intersect
(
select country
from customer
inner join address_country using (address_id)
);
9.8 Imagine you had two queries - let's call them A and B. Can you figure out how you would use set operators to return the rows in either A or B, but not both.
Show/Hide Solution
10. Creating tables and constraints
10.1 In this and the following exercises in this chapter, we're going to be doing some lightweight database modelling work for a fictional beach equipment rental business. Your answers may deviate a little from mine as we go, and that's fine - database design is a quite subjective topic. To kick things off, we'll keep working with our existing database but we want to create all our tables within a schema called 'beach'. Write a SQL statement to create the 'beach' schema.
Show/Hide Solution
create schema beach;
10.2 Create a table to store customers. For each customer we want to capture their first name, last name, email address, phone number, and the date the account was created. Don't worry about primary keys and constraints for now - just focus on the create statement and choosing what you think are appropriate data types for the listed attributes.
Table(s) to use: beach.customers
Show/Hide Solution
create table beach.customers ( email text, first_name text, last_name text, phone text, create_date date );
10.3 Create a table to store details about the equipment to be rented out. For each item, we want to store the type ('surf board', 'kayak', etc.), a general ad-hoc description of the item (color, brand, condition, etc), and replacement cost so we know what to charge customers if they lose the item.
Table(s) to use: beach.equipment
Show/Hide Solution
create table beach.equipment ( item_type text, description text, replacement_cost numeric(7, 2) );
10.4 After running the business for a while, we notice that customers sometimes lose equipment. Write a SQL statement to alter the equipment table (assume it already has data in it we don't want to lose) to add a column to track whether the item is missing.
Table(s) to use: beach.equipment
Show/Hide Solution
alter table beach.equipment add missing boolean;
10.5 Add a surrogate primary key for the customers table using the GENERATED AS IDENTITY syntax (we assume not all customers will provide an email address or phone number ruling them out as potential natural keys). Note you may drop the schema/table and re-create it from scratch.
Table(s) to use: beach.customers
Show/Hide Solution
create table beach.customers ( customer_id bigint generated always as identity primary key, email text, first_name text, last_name text, phone text, create_date date );
10.6 Add a surrogate primary key for the equipment table using one of the serial types. Also add in to the table definition the 'missing' column from exercise 10.4. Note you may drop the schema/table and re-create it from scratch.
Table(s) to use: beach.equipment
Show/Hide Solution
create table beach.equipment ( equipment_id bigserial primary key, item_type text, description text, replacement_cost numeric(7, 2), missing boolean );
10.7 Create a new table to store information about each rental with an appropriate primary key. For each rental, store the customer, the item that was rented, the rental date, and the return date.
Table(s) to use: beach.rentals
Show/Hide Solution
create table beach.rentals ( customer_id bigint, equipment_id bigint, rental_date date, return_date date, primary key (customer_id, equipment_id, rental_date) );
10.8 Add appropriate foreign keys to the rentals table. Setup the foreign keys such that if the referenced customer or equipment is deleted, the related entries in the rentals table will also be deleted. Note you may drop the schema/table and re-create it from scratch.
Table(s) to use: beach.rentals
Show/Hide Solution
create table beach.rentals ( customer_id bigint references beach.customers (customer_id) on delete cascade, equipment_id bigint references beach.equipment (equipment_id) on delete cascade, rental_date date, return_date date, primary key (customer_id, equipment_id, rental_date) );
10.9 Add appropriate check, unique, not null, and default constraints to the customers table to capture the following requirements: a) A customer must provide both a first name and last name b) A customer must provide at least one contact detail - a phone number or email address c) The create date should be the date the new customer record is inserted in the table d) No two customers should have the same email address or phone number
Table(s) to use: beach.customers
Show/Hide Solution
create table beach.customers ( customer_id bigint generated always as identity primary key, email text unique, first_name text not null, last_name text not null, phone text unique, create_date date not null default current_date, check (email is not null or phone is not null) );
10.10 Add appropriate check, unique, not null, and default constraints to the equipment table to capture the following requirements: a) A newly added item should not be missing! b) Each item must have a type c) The replacement cost can be NULL. But if provided, it must be a positive number
Table(s) to use: beach.equipment
Show/Hide Solution
create table beach.equipment ( equipment_id bigserial primary key, item_type text not null, description text, replacement_cost numeric(7, 2) check (replacement_cost >= 0), missing boolean not null default false );
10.11 Search the official PostgreSQL documentation for information about GENERATED { ALWAYS | BY DEFAULT } and explain in your own words the difference between the two options.
Show/Hide Solution
10.12 Imagine you wanted to restrict the equipment types to just a couple of options like 'Surfboard', 'Single Kayak', 'Double Kayak', etc. To do this, you actually have more options at your disposal than you might think. You could add a check constraint to the column. You could create an ENUM. You could create a whole new data type called a DOMAIN. Or you could use a new table in your database to store the available types. Have a read of this article and justify which approach you would take in this case.
Show/Hide Solution
create table equipment_types ( type text primary key );
10.13 A customer comes in and asks whether there are any Single Kayak's available for rent. Write a query to return a count of how many Single Kayak's are available for rent.
Table(s) to use: beach.equipment, beach.rentals
Show/Hide Solution
select count(*)
from beach.equipment
where item_type = 'Single Kayak'
and missing = false
and equipment_id not in
(select equipment_id
from beach.rentals
where return_date is null);
10.14 Right now a rental consists of a customer renting a single item. If it were more common though for a customer to rent multiple items at a time and you wanted to capture these all under a single 'rental', how would you modify the existing tables (or create new tables) to model this.
Show/Hide Solution
create table beach.rentals ( rental_id bigserial primary key, customer_id bigint references beach.customers (customer_id), rental_date date, return_date date ); create table beach.rental_details ( rental_id bigint references beach.rentals (rental_id), equipment_id bigint references beach.equipment (equipment_id), primary key (rental_id, equipment_id) );
11. Inserting, updating, and deleting data
11.1 In the last video, we imported the data for the users and notes tables from CSV files. Now also import the note tags from CSV.
Table(s) to use: playground.note_tags
Show/Hide Solution
copy playground.note_tags from '...note_tags.csv' with (format csv, header true);
11.2 Export in CSV format (with headings) the number of notes created by each user. Order the users in descending order based on how many notes they've created.
Table(s) to use: playground.users, playground.notes
Expected Output:name,note_count David Hilbert,4 Joseph Fourier,3 Bernhard Riemann,2 Leonhard Euler,1 Blaise Pascal,1
Show/Hide Solution
copy
(
select
u.first_name || ' ' || u.last_name as name,
count(*) as note_count
from playground.notes as n
inner join playground.users as u
on n.user_email = u.email
group by u.email, u.first_name, u.last_name
order by note_count desc
)
to '...note_counts.csv'
with (format csv, header true);
11.3 Output values (any values of your choosing) of the following types to a plain text file: a) text b) int c) numeric d) real e) boolean f) date g) timestamptz h) interval
Expected Output:test 33 33.3 33.2999992 t 2019-05-12 2019-05-12 15:25:11.644876+10 1 day
Show/Hide Solution
copy
(
select
'test'::text,
33::int,
33.3::numeric,
33.3::real,
true::boolean,
current_date::date,
current_timestamp::timestamptz,
'1 day'::interval
)
to '...output_types.txt';
11.4 Write an insert statement to insert a new customer in to the customer table with any details of your choosing. Use the returning clause to return the inserted row. For this and all upcoming exercises, remember to do this inside a transaction block and rollback the change afterwards!
Table(s) to use: customer
Expected Output:
customer_id|store_id|first_name|last_name|email|address_id|activebool|create_date|last_update |active|
-----------|--------|----------|---------|-----|----------|----------|-----------|-------------------|------|
600| 1|John |Henry | | 1|true | 2019-05-14|2019-05-14 15:21:57| 1|
Show/Hide Solution
insert into customer(store_id, first_name, last_name, address_id, active) values (1, 'John', 'Henry', 1, 1) returning *;
11.5 Create a new table called rental_stats with two columns, date and num_rentals, to hold the data for the number of rentals made each day. And write an insert statement to populate the table with correct data as calculated from the rental table
Table(s) to use: rental, rental_stats
Show/Hide Solution
create table rental_stats (
date date,
num_rentals int
);
insert into rental_stats
select
rental_date::date as rental_day,
count(*)
from rental
group by rental_day
order by rental_day;
create table rental_stats as
select
rental_date::date as rental_day,
count(*)
from rental
group by rental_day
order by rental_day;
11.6 All customers should have an email address of the form [first_name].[last_name]@sakilacustomer.org (all in lower case). Write an update statement so that all customers have an email address in this format.
Table(s) to use: customer
Show/Hide Solution
update customer set email = lower(first_name || '.' || last_name || '@sakilacustomer.org');
11.7 Write an update statement to update the rental rate of the 20 most rented films by 10%
Table(s) to use: rental, inventory, film
Show/Hide Solution
update film
set rental_rate = rental_rate * 1.1
where film_id in
(
select
i.film_id
from rental as r
inner join inventory as i using (inventory_id)
group by i.film_id
order by count(*) desc
limit 20
);
11.8 Write a script to add a new column to the films table to hold the length of each film in hours (have a look at some of the examples for the ALTER TABLE command) and then populate this new column with the correct values
Table(s) to use: film
Show/Hide Solution
alter table film add column length_hrs numeric(2, 1); update film set length_hrs = length / 60.0 returning *;
11.9 Delete all the payments where the payment amount was zero, returning the deleted rows
Table(s) to use: payment
Expected Output:
payment_id|customer_id|staff_id|rental_id|amount|payment_date |
----------|-----------|--------|---------|------|-------------------|
31918| 267| 2| 13713| 0.00|2007-05-14 13:44:29|
31920| 269| 2| 12610| 0.00|2007-05-14 13:44:29|
31925| 284| 2| 12959| 0.00|2007-05-14 13:44:29|
31942| 354| 1| 11782| 0.00|2007-05-14 13:44:29|
31946| 361| 1| 14769| 0.00|2007-05-14 13:44:29|
31966| 448| 1| 13577| 0.00|2007-05-14 13:44:29|
31970| 457| 2| 14516| 0.00|2007-05-14 13:44:29|
31983| 516| 1| 12915| 0.00|2007-05-14 13:44:29|
...
24 rows
Show/Hide Solution
delete from payment where amount = 0 returning *;
11.10 Delete all the unused languages from the language table
Table(s) to use: language
Show/Hide Solution
delete from language where language_id not in (select distinct language_id from film);
11.11 Write a single update statement to update the activebool column for customers to be true if they made a rental in 2006 or later, and false otherwise.
Table(s) to use: customer, rental
Show/Hide Solution
update customer
set activebool =
case
when exists
(select *
from rental
where rental.customer_id = customer.customer_id
and rental_date >= '2006-01-01')
then true
else false
end;
11.12 Create a new table, with appropriate primary keys and foreign keys, to hold the amount of inventory of each film in each store (store_id, film_id, stock_count). In this table we want to store the stock level for every film in every store - including films that aren't in stock. Write an "upsert" statement to populate the table with the correct values. By "upsert", I mean insert a SQL statement that will either insert a new row in the table (ie. a new film, store, stock count) or update the stock count if the film/store attempting to be inserted already exists in the table). Research PostgreSQL's INSERT ON CONFLICT and look at the examples for some guidance on how to do this.
Table(s) to use: film, store, inventory, inventory_stats (new)
Show/Hide Solution
create table inventory_stats
(
store_id smallint references store (store_id),
film_id smallint references film (film_id),
stock_count int not null,
primary key (store_id, film_id)
);
insert into inventory_stats(store_id, film_id, stock_count)
select s.store_id, f.film_id, count(i.inventory_id)
from film as f
cross join store as s
left join inventory as i
on f.film_id = i.film_id
and s.store_id = i.store_id
group by f.film_id, s.store_id
on conflict (store_id, film_id)
do update set
stock_count = excluded.stock_count;
11.13 Write a single statement to delete the first rental made by each customer and to avoid any foreign key violations you'll also have to delete any associated payments in that same statement. You might need to do some research online to figure this one out. As a hint, you can use Common Table Expressions (CTEs) with delete statements and delete statements themselves can return results with the RETURNING clause!
Table(s) to use: rental, payment
Show/Hide Solution
with deleted_rentals as
(
delete from rental
where rental_id in
(select distinct on (customer_id) rental_id
from rental
order by customer_id, rental_date)
returning rental_id
)
delete from payment
where rental_id in
(select rental_id
from deleted_rentals);
11.14 In the films table the rating column is of type mpaa_rating, which is an ENUM. You've read online about the downsides of ENUMs and now want to convert your table design to instead store the different mpaa rating types in a reference table with the type as the primary key. Write a script to create the new table, populate it with data, convert the film table, and then drop the mpaa_rating type so it won't be used ever again. You're going to need to Google a few ideas and look up some documentation to get through this one - good luck!
Table(s) to use: film, mpaa_ratings (new)
Show/Hide Solution
create table mpaa_ratings ( rating text primary key ); insert into mpaa_ratings select unnest(enum_range(null::mpaa_rating)); alter table film alter column rating drop default, alter column rating type text, alter column rating set default 'G', add foreign key (rating) references mpaa_ratings(rating); drop type mpaa_rating;
12. Views and functions
12.1 We often need to get basic film information for a rental and so regularly find ourselves writing queries to join from the rental table on to the inventory and film tables. Write a view called vw_rental_film to make this more convenient, returning for each rental ID the film's title, length, and rating. Then write a query to return all the rows from this view to check it's working as expected.
Table(s) to use: rental, inventory, film
Expected Output:
rental_id|title |length|rating|
---------|---------------------------|------|------|
1|BLANKET BEVERLY | 148|G |
2|FREAKY POCUS | 126|R |
3|GRADUATE LORD | 156|G |
4|LOVE SUICIDES | 181|R |
5|IDOLS SNATCHERS | 84|NC-17 |
6|MYSTIC TRUMAN | 92|NC-17 |
7|SWARM GOLD | 123|PG-13 |
8|LAWLESS VISION | 181|G |
...
16044 rows
Show/Hide Solution
create view vw_rental_film as
select
r.rental_id,
f.title,
f.length,
f.rating
from rental as r
inner join inventory as i using (inventory_id)
inner join film as f using (film_id)
order by r.rental_id;
12.2 Use the vw_rental_film view you just created to return, for each customer, the number of 'R' films they've rented out. Include customers who haven't rented any R films also. Note - this is trickier than it first appears...be very careful and double-check your results!
Table(s) to use: customer, rental
Expected Output:You should have two customers in your output that did not rent any 'R' films - The customers with IDs 238 and 437.
customer_id|count|
-----------|-----|
1| 5|
2| 4|
3| 12|
4| 4|
5| 9|
6| 6|
7| 7|
8| 3|
...
238| 0|
...
437| 0|
...
599 rows
Show/Hide Solution
select
c.customer_id,
count(r.rental_id)
from customer as c
left join (rental as r
inner join vw_rental_film as rf
on r.rental_id = rf.rental_id
and rf.rating = 'R')
using (customer_id)
group by c.customer_id
order by c.customer_id;
12.3 Create a view called vw_monthly_totals that returns the amount of income received each month (you've done this a couple of times in this course now - time to finally save this in a view to avoid the repetition!)
Table(s) to use: payment
Expected Output:month |total | -------------------|--------| 2007-01-01 00:00:00| 4824.43| 2007-02-01 00:00:00| 9631.88| 2007-03-01 00:00:00|23886.56| 2007-04-01 00:00:00|28559.46| 2007-05-01 00:00:00| 514.18| 5 rows
Show/Hide Solution
create view vw_monthly_totals as
select
date_trunc('month', payment_date) as month,
sum(amount) as total
from payment
group by month
order by month;
12.4 Using the new vw_monthly_totals view and window functions (remember how those work? If not, here's your chance to refresh!), write a query that returns the amount of income received each month and compares it against the previous month's income, showing the change.
Table(s) to use: payment
Expected Output:month |income |prev month income|change | -------------------|--------|-----------------|---------| 2007-01-01 00:00:00| 4824.43| | | 2007-02-01 00:00:00| 9631.88| 4824.43| 4807.45| 2007-03-01 00:00:00|23886.56| 9631.88| 14254.68| 2007-04-01 00:00:00|28559.46| 23886.56| 4672.90| 2007-05-01 00:00:00| 514.18| 28559.46|-28045.28| 5 rows
Show/Hide Solution
select month, total as income, lag(total) over (order by month) as "prev month income", total - lag(total) over (order by month) as "change" from vw_monthly_totals;
12.5 Continuing on from Exercise 1, you now have a view called vw_rental_film. You create a new materialized view called mvw_rental_film defined as below. Imagine a period of time has now passed and the materialized view's cache is out of date. Write a query which will output the difference between the original view and the materialized view (essentially this boils down to writing a query to show the difference between two sets of results). Within a test transaction block you can roll back, make some insertions and deletions to test your query works as expected.
create materialized view mvw_rental_film as select * from vw_rental_film;
Show/Hide Solution
( select * from vw_rental_film except select * from mvw_rental_film ) union all ( select * from mvw_rental_film except select * from vw_rental_film );
12.6 Write a function using SQL that takes as an argument a customer_id and returns the number of unreturned rentals for that customer
Table(s) to use: rental
Show/Hide Solution
create or replace function unreturned_rentals
(
p_customer_id int
)
returns int
language sql
as $$
select cast(count(*) as int)
from rental
where customer_id = p_customer_id
and return_date is null;
$$;
12.7 Using the function you just created, write a query that outputs the number of unreturned rentals for each customer
Table(s) to use: customer
Expected Output:
customer_id|unreturned_rentals|
-----------|------------------|
1| 0|
2| 0|
3| 0|
4| 0|
5| 1|
6| 0|
7| 0|
8| 0|
9| 1|
10| 0|
...
599 rows
Show/Hide Solution
select customer_id, unreturned_rentals(customer_id) from customer;
12.8 Write a function using SQL to generate a random int between HIGH and LOW inclusive (two input int arguments)
Show/Hide Solution
create or replace function rand ( p_low int, p_high int ) returns int language sql as $$ select floor(random() * (p_high - p_low + 1))::int + p_low; $$;
12.9 Write a function using PL/pgSQL that decides whether a customer should get a discount when renting a film. Discounts are provided if a customer has no unreturned rentals (feel free to use the function you wrote in Exercise 12.6) and has also never rented the film before. The function should take a customer_id and film_id as arguments and return true if the customer should receive a discount, or false otherwise.
Show/Hide Solution
create or replace function apply_discount
(
p_customer_id int,
p_film_id int
)
returns boolean
language plpgsql
as $$
declare
num_outstanding int;
num_rented int;
begin
num_outstanding := unreturned_rentals(p_customer_id);
select count(*) into num_rented
from rental as r
inner join inventory as i using (inventory_id)
inner join film as f using (film_id)
where customer_id = p_customer_id
and film_id = p_film_id;
return (num_outstanding = 0 and num_rented = 0);
end
$$;
12.10 For this exercise we're going to be writing a function using PL/pgSQL to implement the classic FizzBuzz problem. Write a function that returns in a table each number from 1 to n (where n is an argument you pass in to the function), and in the second column if the number is a multiple of 3, print "Fizz". If the number is a multiple of 5, print "Buzz". But if the number is a multiple of both 3 and 5, print "FizzBuzz". Otherwise, just print the number. You may want to use RETURN NEXT to build up your table one row at a time - read more about this in the official documentation.
Expected Output:Calling the function with n = 100 should returnn |fb | ---|--------| 1|1 | 2|2 | 3|Fizz | 4|4 | 5|Buzz | 6|Fizz | 7|7 | 8|8 | 9|Fizz | 10|Buzz | 11|11 | 12|Fizz | 13|13 | 14|14 | 15|FizzBuzz| 16|16 | ... 100 rows
Show/Hide Solution
create or replace function fizzbuzz (p_n int)
returns table (n int, fb text)
language plpgsql
as $$
begin
for cntr in 1..p_n loop
n := cntr;
if (cntr % 3 = 0 and cntr % 5 = 0) then
fb := 'FizzBuzz';
elsif cntr % 3 = 0 then
fb := 'Fizz';
elsif cntr % 5 = 0 then
fb := 'Buzz';
else
fb := cntr;
end if;
return next;
end loop;
end
$$;