3. Query fundamentals

3.1 Write a query to list all the film titles

Table(s) to use: film

Expected Output:
title                      |
---------------------------|
ACADEMY DINOSAUR           |
ACE GOLDFINGER             |
ADAPTATION HOLES           |
AFFAIR PREJUDICE           |
AFRICAN EGG                |
AGENT TRUMAN               |
AIRPLANE SIERRA            |
AIRPORT POLLOCK            |
...

1000 rows

Show/Hide Solution

A SELECT statement will work here.
select title
from film;

3.2 Write a query to return the actor's first names and last names only (with the column headings "First Name" and "Last Name")

Table(s) to use: actor

Expected Output:
First Name |Last Name   |
-----------|------------|
PENELOPE   |GUINESS     |
NICK       |WAHLBERG    |
ED         |CHASE       |
JENNIFER   |DAVIS       |
JOHNNY     |LOLLOBRIGIDA|
BETTE      |NICHOLSON   |
GRACE      |MOSTEL      |
MATTHEW    |JOHANSSON   |
JOE        |SWANK       |
CHRISTIAN  |GABLE       |
ZERO       |CAGE        |
...

200 rows

Show/Hide Solution

Use column name aliasing to rename the columns. You can alias more than one column in the SELECT clause.
select first_name as "First Name", last_name as "Last Name"
from actor;

3.3 How many rows are in the inventory table?

Table(s) to use: inventory

Show/Hide Solution

4,581. After doing a select, you can find out how many rows were returned in the output window. In a later video, you'll learn how to write a query to return this count to you directly using the COUNT(*) function (if you're feeling adventurous, see if you can figure it out now! Try using SELECT COUNT(*) instead of SELECT *)
select *
from inventory;

3.4 Write a query that returns all the columns from the actor table without using the * wildcard in the SELECT clause

Table(s) to use: actor

Expected Output:
actor_id|first_name |last_name   |last_update        |
--------|-----------|------------|-------------------|
       1|PENELOPE   |GUINESS     |2006-02-15 09:34:33|
       2|NICK       |WAHLBERG    |2006-02-15 09:34:33|
       3|ED         |CHASE       |2006-02-15 09:34:33|
       4|JENNIFER   |DAVIS       |2006-02-15 09:34:33|
       5|JOHNNY     |LOLLOBRIGIDA|2006-02-15 09:34:33|
       6|BETTE      |NICHOLSON   |2006-02-15 09:34:33|
       7|GRACE      |MOSTEL      |2006-02-15 09:34:33|
       8|MATTHEW    |JOHANSSON   |2006-02-15 09:34:33|
       9|JOE        |SWANK       |2006-02-15 09:34:33|
      10|CHRISTIAN  |GABLE       |2006-02-15 09:34:33|
...

200 rows

Show/Hide Solution

Looking at the ERD (Entity Relationship Diagram) we can find out all the columns in the actor table and list them explicitly in the SELECT statement.
select actor_id, first_name, last_name, last_update
from actor;

3.5 Write a query to obtain the length of each customer's first name (*remember to look for string functions in the documentation that can help*)

Table(s) to use: customer

Expected Output:
first_name |length|
-----------|------|
MARY       |     4|
PATRICIA   |     8|
LINDA      |     5|
BARBARA    |     7|
ELIZABETH  |     9|
JENNIFER   |     8|
MARIA      |     5|
SUSAN      |     5|
...

599 rows

Show/Hide Solution

You can use the length(string) function to obtain the length of text.
select first_name, length(first_name)
from customer;

3.6 Write a query to return the initials for each customer

Table(s) to use: customer

Expected Output:
first_name |last_name   |initial|
-----------|------------|-------|
MARY       |SMITH       |MS     |
PATRICIA   |JOHNSON     |PJ     |
LINDA      |WILLIAMS    |LW     |
BARBARA    |JONES       |BJ     |
ELIZABETH  |BROWN       |EB     |
JENNIFER   |DAVIS       |JD     |
MARIA      |MILLER      |MM     |
SUSAN      |WILSON      |SW     |
...

599 rows

Show/Hide Solution

You can use the left(text, n) function to obtain the first character from the first name and last name of each customer, and then join them with ||.
select
  first_name,
  last_name,
  left(first_name, 1) || left(last_name, 1) as "initial"
from customer;

3.7 Each film has a rental_rate, which is how much money it costs for a customer to rent out the film. Each film also has a replacement_cost, which is how much money the film costs to replace. Write a query to figure out how many times each film must be rented out to cover its replacement cost.

Table(s) to use: film

Expected Output:
title                      |rental_rate|replacement_cost|# rentals to break-even|
---------------------------|-----------|----------------|-----------------------|
ACADEMY DINOSAUR           |       0.99|           20.99|                     22|
ACE GOLDFINGER             |       4.99|           12.99|                      3|
ADAPTATION HOLES           |       2.99|           18.99|                      7|
AFFAIR PREJUDICE           |       2.99|           26.99|                     10|
AFRICAN EGG                |       2.99|           22.99|                      8|
AGENT TRUMAN               |       2.99|           17.99|                      7|
AIRPLANE SIERRA            |       4.99|           28.99|                      6|
AIRPORT POLLOCK            |       4.99|           15.99|                      4|
...

1000 rows

Show/Hide Solution

Divide replacement_cost by rental_rate to obtain the number of times each film must be rented out. This will however often be a fractional number and a film can't be rented out a fractional number of times. To round this up to the nearest integer you can use the ceil(numeric) function. Click here for a full list of math functions.
select
  title,
  rental_rate,
  replacement_cost,
  ceil(replacement_cost / rental_rate) as "# rentals to break-even"
from film;

3.8 Write a query to list all the films with a 'G' rating

Table(s) to use: film

Expected Output:
title                    |rating|
-------------------------|------|
ACE GOLDFINGER           |G     |
AFFAIR PREJUDICE         |G     |
AFRICAN EGG              |G     |
ALAMO VIDEOTAPE          |G     |
AMISTAD MIDSUMMER        |G     |
ANGELS LIFE              |G     |
ANNIE IDENTITY           |G     |
ARMAGEDDON LOST          |G     |
...

178 rows

Show/Hide Solution

select title, rating
from film
where rating = 'G';

3.9 List all the films longer than 2 hours (note each film has a length in minutes)

Table(s) to use: film

Expected Output:
title                      |length|
---------------------------|------|
AFRICAN EGG                |   130|
AGENT TRUMAN               |   169|
ALAMO VIDEOTAPE            |   126|
ALASKA PHANTOM             |   136|
ALI FOREVER                |   150|
ALLEY EVOLUTION            |   180|
AMERICAN CIRCUS            |   129|
ANALYZE HOOSIERS           |   181|
...

456 rows

Show/Hide Solution

Use a WHERE clause with length > 120 (120 minutes is equivalent to 2 hrs). Alternatively, you could have used where (length/60.0) > 2 but you would have to be careful with integer division (if you're not familiar with integer division, don't worry. We'll cover it in an upcoming video).
select title, length
from film
where length > 120;

3.10 Write a query to list all the rentals made before June, 2005

Table(s) to use: rental

Expected Output:
rental_id|rental_date        |
---------|-------------------|
        2|2005-05-24 22:54:33|
        3|2005-05-24 23:03:39|
        4|2005-05-24 23:04:41|
        5|2005-05-24 23:05:21|
        6|2005-05-24 23:08:07|
        7|2005-05-24 23:11:53|
        8|2005-05-24 23:31:46|
        9|2005-05-25 00:00:40|
...

1156 rows

Show/Hide Solution

You can use the first day of June 2005 ('2005-06-01') as the date to compare against.
select rental_id, rental_date
from rental
where rental_date < '2005-06-01';

3.11 In Exercise 3.7, you wrote a query to figure out how many times each film must be rented out to cover its replacement cost. Now write a query to return only those films that must be rented out more than 30 times to cover their replacement cost.

Table(s) to use: film

Expected Output:
title                 |rental_rate|replacement_cost|# rentals to break-even|
----------------------|-----------|----------------|-----------------------|
ARABIA DOGMA          |       0.99|           29.99|                     31|
BALLROOM MOCKINGBIRD  |       0.99|           29.99|                     31|
BONNIE HOLOCAUST      |       0.99|           29.99|                     31|
CLOCKWORK PARADISE    |       0.99|           29.99|                     31|
CLYDE THEORY          |       0.99|           29.99|                     31|
CRUELTY UNFORGIVEN    |       0.99|           29.99|                     31|
EARTH VISION          |       0.99|           29.99|                     31|
EVERYONE CRAFT        |       0.99|           29.99|                     31|
...

22 rows

Show/Hide Solution

We have to repeat the expression to calculate the number of times the film must be rented out in both the select and the where clause. In a later video, we'll learn ways that we can avoid this repetition.
select
  title,
  rental_rate,
  replacement_cost,
  ceil(replacement_cost / rental_rate) as "# rentals to break-even"
from film
where ceil(replacement_cost / rental_rate) > 30;

3.12 Write a query to show all rentals made by the customer with ID 388 in 2005

Table(s) to use: rental

Expected Output:
rental_id|rental_date        |
---------|-------------------|
       21|2005-05-25 01:59:46|
      411|2005-05-27 14:14:14|
     1276|2005-06-15 08:00:13|
     2145|2005-06-17 22:10:36|
     2537|2005-06-19 01:52:21|
     2692|2005-06-19 13:08:19|
     3159|2005-06-20 21:11:50|
     4947|2005-07-08 22:49:37|
...

29 rows

Show/Hide Solution

select rental_id, rental_date
from rental
where rental_date >= '2005-01-01'
  and rental_date < '2006-01-01'
  and customer_id = 388;

3.13 We’re trying to list all films with a length of an hour or less. Show two different ways to fix our query below that isn't working (one using the NOT keyword, and one without)

select title, rental_duration, length
from film
where length not > 60;

Table(s) to use: film

Expected Output:
title                |rental_duration|length|
---------------------|---------------|------|
ACE GOLDFINGER       |              3|    48|
ADAPTATION HOLES     |              7|    50|
AIRPORT POLLOCK      |              6|    54|
ALIEN CENTER         |              5|    46|
ALTER VICTORY        |              6|    57|
APOCALYPSE FLAMINGOS |              6|     0|
BENEATH RUSH         |              6|    53|
BLUES INSTINCT       |              5|    50|
...

105 rows

Show/Hide Solution

A solution using the NOT keyword (the original query had the NOT in the wrong place)
select title, rental_duration, length
from film
where not length > 60;
A solution not using the NOT keyword.
select title, rental_duration, length
from film
where length <= 60;

3.14 Explain what each of the two queries below are doing and why they generate different results. Which one is probably a mistake and why?

select title, rating
from film
where rating != 'G'
  and rating != 'PG';
select title, rating
from film
where rating != 'G'
  or rating != 'PG';

Table(s) to use: film

Show/Hide Solution

The first query will return all films where the rating is neither 'G' nor 'PG'. Stated another way, if a film is rated 'G' it won't be in the output. If a film is rated 'PG' it also won't be in the output. All other films will be in the output. The second query almost does nothing. It returns almost all the films - including those with a 'G' rating and also those with a 'PG' rating. (I say almost for reasons you'll learn about in the next video). Think carefully about how both AND and OR work to understand this behaviour.

3.15 Write a single query to show all rentals where the return date is greater than the rental date, or the return date is equal to the rental date, or the return date is less than the rental date. How many rows are returned? Why doesn't this match the number of rows in the table overall?

Table(s) to use: rental

Expected Output:
rental_id|rental_date        |return_date        |
---------|-------------------|-------------------|
        2|2005-05-24 22:54:33|2005-05-28 19:40:33|
        3|2005-05-24 23:03:39|2005-06-01 22:12:39|
        4|2005-05-24 23:04:41|2005-06-03 01:43:41|
        5|2005-05-24 23:05:21|2005-06-02 04:33:21|
        6|2005-05-24 23:08:07|2005-05-27 01:32:07|
        7|2005-05-24 23:11:53|2005-05-29 20:34:53|
        8|2005-05-24 23:31:46|2005-05-27 23:33:46|
        9|2005-05-25 00:00:40|2005-05-28 00:22:40|
...

15861 rows

Show/Hide Solution

In the next video, we'll explore why this only returns 15,861 rows and not 16,044 - the total number of rows in the table.
select rental_id, rental_date, return_date
from rental
where return_date > rental_date
  or return_date = rental_date
  or return_date < rental_date;

3.16 Write a query to list the rentals that haven't been returned

Table(s) to use: rental

Expected Output:
rental_id|return_date|
---------|-----------|
    11496|           |
    11541|           |
    12101|           |
    11563|           |
    11577|           |
    11593|           |
    11611|           |
    11646|           |
...

183 rows

Show/Hide Solution

A rental that hasn't been returned has a NULL return_date.
select rental_id, return_date
from rental
where return_date is null;

3.17 Write a query to list the films that have a rating that is not 'G' or 'PG'

Table(s) to use: film

Expected Output:
title                      |rating|
---------------------------|------|
ADAPTATION HOLES           |NC-17 |
AIRPLANE SIERRA            |PG-13 |
AIRPORT POLLOCK            |R     |
ALABAMA DEVIL              |PG-13 |
ALADDIN CALENDAR           |NC-17 |
DATE SPEED                 |R     |
ALICE FANTASIA             |NC-17 |
ALIEN CENTER               |NC-17 |
...

629 rows

Show/Hide Solution

For a film to appear in the output, we want the rating to not be 'PG'. AND we want the rating not to be 'G'. But we will want to include it if the rating is NULL. Be sure to study the use of AND and OR here carefully.
select title, rating
from film
where rating != 'PG'
  and rating != 'G'
  or rating is null;

3.18 Write a query to return the films with a rating of 'PG', 'G', or 'PG-13'

Table(s) to use: film

Expected Output:
title                      |rating|
---------------------------|------|
ACADEMY DINOSAUR           |PG    |
ACE GOLDFINGER             |G     |
AFFAIR PREJUDICE           |G     |
AFRICAN EGG                |G     |
AGENT TRUMAN               |PG    |
AIRPLANE SIERRA            |PG-13 |
ALABAMA DEVIL              |PG-13 |
ALAMO VIDEOTAPE            |G     |
...

593 rows

Show/Hide Solution

We can use the IN keyword to simplify the query.
select title, rating
from film
where rating in ('PG', 'G', 'PG-13');

3.19 Write a query equivalent to the one below using BETWEEN.

select title, length
from film
where length >= 90
  and length <= 120;

Table(s) to use: film

Expected Output:
title                 |length|
----------------------|------|
AFFAIR PREJUDICE      |   117|
ALABAMA DEVIL         |   114|
DATE SPEED            |   104|
ALICE FANTASIA        |    94|
AMADEUS HOLY          |   113|
ANACONDA CONFESSIONS  |    92|
ANTHEM LUKE           |    91|
APACHE DIVINE         |    92|
...

222 rows

Show/Hide Solution

select title, length
from film
where length between 90 and 120;

3.20 Write a query to return all film titles that end with the word "GRAFFITI"

Table(s) to use: film

Expected Output:
title             |
------------------|
CLUB GRAFFITI     |
HUMAN GRAFFITI    |
STRANGERS GRAFFITI|

3 rows

Show/Hide Solution

The key here is to use the % wildcard at the start of the pattern string, as any text can come before the word 'GRAFFITI'. But not at the end of the pattern string.
select title
from film
where title like '%GRAFFITI';

3.21 In exercise 3.17 you wrote a query to list the films that have a rating that is not 'G' or 'PG'. Re-write this query using NOT IN. Do your results include films with a NULL rating?

Table(s) to use: film

Show/Hide Solution

You can re-write the query as below. Your results will not include films with a NULL rating, so as in exercise 3.17 you should be careful when working with a column that could contain NULL values and put in explicit checks if you want them included in the output.
select title, rating
from film
where rating not in ('G', 'PG');

3.22 Write a query to list all the customers with an email address. Order the customers by last name descending

Table(s) to use: customer

Expected Output:
first_name |last_name   |
-----------|------------|
CYNTHIA    |YOUNG       |
MARVIN     |YEE         |
LUIS       |YANEZ       |
BRIAN      |WYMAN       |
BRENDA     |WRIGHT      |
TYLER      |WREN        |
FLORENCE   |WOODS       |
LORI       |WOOD        |
...

597 rows

Show/Hide Solution

We first want to only include customers with an email address (email is not null). Then we order by last_name descending.
select first_name, last_name
from customer
where email is not null
order by last_name desc;

3.23 Write a query to list the country id's and cities from the city table, first ordered by country id ascending, then by city alphabetically.

Table(s) to use: city

Expected Output:
country_id|city                      |
----------|--------------------------|
         1|Kabul                     |
         2|Batna                     |
         2|Bchar                     |
         2|Skikda                    |
         3|Tafuna                    |
         4|Benguela                  |
         4|Namibe                    |
         5|South Hill                |
         6|Almirante Brown           |
         6|Avellaneda                |
         6|Baha Blanca               |
         6|Crdoba                    |
         6|Escobar                   |
         6|Ezeiza                    |
...

600 rows

Show/Hide Solution

Because we are sorting both columns in ascending order, which is the default sort order, we can leave off adding ASC.
select country_id, city
from city
order by country_id, city;

3.24 Write a query to list actors ordered by the length of their full name ("[first_name] [last_name]") descending.

Table(s) to use: actor

Expected Output:
full_name           |len|
--------------------|---|
MICHELLE MCCONAUGHEY| 20|
JOHNNY LOLLOBRIGIDA | 19|
ANGELA WITHERSPOON  | 18|
MATTHEW JOHANSSON   | 17|
CAMERON ZELLWEGER   | 17|
JAYNE SILVERSTONE   | 17|
CHRISTOPHER BERRY   | 17|
FRANCES DAY-LEWIS   | 17|
...

200 rows

Show/Hide Solution

You can create a computed column in your SELECT statement for the length of each actor's full name and then refer to that alias in the ORDER BY clause. You may be wondering why you couldn't just write length(full_name) for the second column in your select clause - ie. why did you have to repeat "first_name || ' ' || last_name"? This is because the second derived column is not guaranteed to be evaluated after the first. There is no guaranteed order of evalution of expressions in the SELECT clause! (and other clauses in general too)
select
  first_name || ' ' || last_name as full_name,
  length(first_name || ' ' || last_name) as len
from actor
order by len desc;

3.25 Describe the difference between ORDER BY x, y DESC and ORDER BY x DESC, y DESC (where x and y are columns in some imaginary table you're querying)

Table(s) to use: imaginary!

Show/Hide Solution

ORDER BY x, y DESC is equivalent to ORDER BY x ASC, y DESC - order first by x ascending, then by y descending. This is different from ORDER BY x DESC, y DESC - order by x descending, then by y descending. When asc/desc is omitted, ascending is the default.

3.26 Fix the query below, which we wanted to use to list all the rentals that happened after 10pm at night.

P.S. Don't be intimated by the date_part function which you may not have seen so far. It's just another function which you read about in the official documentation
select rental_id, date_part('hour', rental_date) as "rental hour"
from rental
where "rental hour" >= 22;

Table(s) to use: rental

Expected Output:
rental_id|rental hour|
---------|-----------|
        2|         22|
        3|         23|
        4|         23|
        5|         23|
        6|         23|
        7|         23|
        8|         23|
      136|         22|
...

1252 rows

Show/Hide Solution

You can't refer to the "rental hour" alias in the where clause, so instead just repeat the expression!
select rental_id, date_part('hour', rental_date) as "rental hour"
from rental
where date_part('hour', rental_date) >= 22;

3.27 Write a query to return the 3 most recent payments received

Table(s) to use: payment

Expected Output:
payment_id|payment_date       |
----------|-------------------|
     31918|2007-05-14 13:44:29|
     31917|2007-05-14 13:44:29|
     31919|2007-05-14 13:44:29|

3 rows

Show/Hide Solution

You'll first need to order the payments by payment date descending, which will ensure that the first rows are the most recent. Then use the limit keyword to return only the first 3.
select payment_id, payment_date
from payment
order by payment_date desc
limit 3;

3.28 Return the 4 films with the shortest length that are not R rated. For films with the same length, order them alphabetically

Table(s) to use: film

Expected Output:
title           |length|rating|
----------------|------|------|
ALIEN CENTER    |    46|NC-17 |
IRON MOON       |    46|PG    |
KWAI HOMEWARD   |    46|PG-13 |
LABYRINTH LEAGUE|    46|PG-13 |

4 rows

Show/Hide Solution

This was a tricky one. When building the WHERE clause we need to make sure to include films that have a NULL rating (technically they are not 'R' rated!). Also, we need to order first by length then by title to obtain alphabetical order for films of the same length. Remember, LIMIT is the very last clause to be processed.
select title, length, rating
from film
where rating != 'R'
  or rating is null
order by length, title
limit 4;

3.29 Write a query to return the last 3 payments made in January, 2007

Table(s) to use: payment

Expected Output:
payment_id|amount|payment_date       |
----------|------|-------------------|
     16813|  0.99|2007-01-31 21:16:11|
     16890|  0.99|2007-01-31 21:06:00|
     17173|  2.99|2007-01-31 20:45:37|

3 rows

Show/Hide Solution

You might have initially forgotten the second part of the WHERE, to make sure the payment date must be in January. You need to do this because there might be less than 3 payments made in January, and if there were, your query would actually return payments outside of January.
select payment_id, amount, payment_date
from payment
where payment_date < '2007-02-01'
  and payment_date >= '2007-01-01'
order by payment_date desc
limit 3;

3.30 Can you think of a way you could, as in the previous exercise, return the last 3 payments made in January, 2007 but have those same 3 output rows ordered by date ascending? (Don't spend too long on this...)

Table(s) to use: payment

Expected Output:
payment_id|amount|payment_date       |
----------|------|-------------------|
     17173|  2.99|2007-01-31 20:45:37|
     16890|  0.99|2007-01-31 21:06:00|
     16813|  0.99|2007-01-31 21:16:11|

3 rows

Show/Hide Solution

You don't yet have the tools to do this and it's a bit of a problem with using ORDER BY/LIMIT while wanting separate output row ordering. I'll show you how to handle situations like this in a future video.

3.31 Write a query to return all the unique ratings films can have, ordered alphabetically (not including NULL)

Table(s) to use: film

Expected Output:
rating|
------|
G     |
PG    |
PG-13 |
R     |
NC-17 |

5 rows

Show/Hide Solution

select distinct rating
from film
where rating is not null
order by rating;

3.32 Write a query to help us quickly see if there is any hour of the day that we have never rented a film out on (maybe the staff always head out for lunch?)

Table(s) to use: rental

Show/Hide Solution

You can use the date_part('hour', rental_date) function to obtain the hour part for each rental date/time. Combined with DISTINCT and ORDER BY you obtain a simple list you can scroll through and see that no, there's nothing to be alarmed about here...all hours of the day are accounted for.
select distinct date_part('hour', rental_date) as hr
from rental
order by hr;

3.33 Write a query to help quickly check whether the same rental rate is used for each rental duration (for example - is the rental rate always 4.99 when the rental duration is 3?)

Table(s) to use: film

Expected Output:
rental_duration|rental_rate|
---------------|-----------|
              3|       4.99|
              3|       2.99|
              3|       0.99|
              4|       2.99|
              4|       4.99|
              4|       0.99|
              5|       2.99|
              5|       4.99|
...

15 rows

Show/Hide Solution

This query will return all unique combinations of rental_duration and rental_rate and we can see clearly now that the answer is no - many different rental rates can be used for the same rental duration.
select distinct rental_duration, rental_rate
from film
order by rental_duration;

3.34 Can you explain why the first query below works, but the second one, which simply adds the DISTINCT keyword, doesn't? (this is quite challenging)

select first_name
from actor
order by last_name;
select distinct first_name
from actor
order by last_name;

Table(s) to use: actor

Show/Hide Solution

In the second query, multiple rows of actors are combined in to a single row due to the use of DISTINCT. For example, there are two actors with the first name ADAM (ADAM HOPPER and ADAM GRANT), however after the SELECT DISTINCT clause has been processed, there is only one row with first name ADAM. Ordering then by last name is undefined - eg. In the case of ADAM, Postgres has no way to know which last name should be used (HOPPER or GRANT?). In general, avoid ordering by columns you haven't selected and you can sidestep complex situations like this.

3.35 Write a query to return an ordered list of distinct ratings for films in our films table along with their descriptions (you will have to type in the descriptions yourself)

Table(s) to use: film

Expected Output:
rating|rating description           |
------|-----------------------------|
G     |General                      |
PG    |Parental Guidance Recommended|
PG-13 |Parents Strongly Cautioned   |
R     |Restricted                   |
NC-17 |Adults Only                  |

5 rows

Show/Hide Solution

This exercise combines quite a few things we've learned so far. We use DISTINCT, IS NOT NULL, and ORDER BY to obtain a unique ordered list of all the different film ratings. Then we use the simple form of the case expression to convert each rating to a description.
select distinct rating,
  case rating
    when 'G' then 'General'
    when 'PG' then 'Parental Guidance Recommended'
    when 'PG-13' then 'Parents Strongly Cautioned'
    when 'R' then 'Restricted'
    when 'NC-17' then 'Adults Only'
  end as "rating description"
from film
where rating is not null
order by rating;

3.36 Write a query to output 'Returned' for returned rentals and 'Not Returned' for rentals that haven't been returned. Order the output to show those not returned first.

Table(s) to use: rental

Expected Output:
rental_id|rental_date        |return_date        |return_status|
---------|-------------------|-------------------|-------------|
    14503|2006-02-14 15:16:03|                   |Not Returned |
    14172|2006-02-14 15:16:03|                   |Not Returned |
    12988|2006-02-14 15:16:03|                   |Not Returned |
    14488|2006-02-14 15:16:03|                   |Not Returned |
    15655|2006-02-14 15:16:03|                   |Not Returned |
    14204|2006-02-14 15:16:03|                   |Not Returned |
    12130|2006-02-14 15:16:03|                   |Not Returned |
    12970|2006-02-14 15:16:03|                   |Not Returned |
...

16044 rows

Show/Hide Solution

Here you can use the searched form of the case expression to check when the return date is NULL and output the appropriate status.
select rental_id, rental_date, return_date,
  case
    when return_date is null then 'Not Returned'
    else 'Returned'
  end as return_status
from rental
order by return_status;

3.37 Imagine you were asked to write a query to populate a 'country picker' for some internal company dashboard. Write a query to return the countries in alphabetical order, but also with the twist that the first 3 countries in the list must be 1) Australia 2) United Kingdom 3) United States and then normal alphabetical order after that (maybe you want them first because, for example, most of your customers come from these countries)

Table(s) to use: country

Expected Output:
country                              |
-------------------------------------|
Australia                            |
United Kingdom                       |
United States                        |
Afghanistan                          |
Algeria                              |
American Samoa                       |
Angola                               |
Anguilla                             |
...

109 rows

Show/Hide Solution

This was a tricky one so study the solution carefully and make sure you understand it. You can use a case statement in the ORDER BY clause to return a number for particular countries, which is then used for ordering. So Australia becomes 0, United Kingdom becomes 1, United States becomes 2 and every other country gets the result 3. ORDER BY first orders by these numbers, in ascending order. There is then a second order by term (the ", country" bit after the case statement ends) which then ensures that for all those countries where the result was 3, they too are ordered alphabetically.
select country
from country
order by
  case country
    when 'Australia' then 0
    when 'United Kingdom' then 1
    when 'United States' then 2
    else 3
  end, country;

3.38 We want to give a prize to 5 random customers each month. Write a query that will return 5 random customers each time it is run (you may find a particular math function helpful - make sure to search the documentation!)

Table(s) to use: customer

Expected Output:Your output should list a different set of customers each time you run the query!
first_name|last_name|email                              |
----------|---------|-----------------------------------|
DUANE     |TUBBS    |DUANE.TUBBS@sakilacustomer.org     |
OLGA      |JIMENEZ  |OLGA.JIMENEZ@sakilacustomer.org    |
ROSEMARY  |SCHMIDT  |ROSEMARY.SCHMIDT@sakilacustomer.org|
PAMELA    |BAKER    |PAMELA.BAKER@sakilacustomer.org    |
BILL      |GAVIN    |BILL.GAVIN@sakilacustomer.org      |

5 rows

Show/Hide Solution

The random() function, which generates a random value in the range 0.0 <= x < 1.0, is useful for this behaviour and can be used in the ORDER BY clause to achieve a random ordering of customers.
select first_name, last_name, email
from customer
order by random()
limit 5;

3.39 Give 3 different solutions to list the rentals made in June, 2005. In one solution, use the date_part function. In another, use the BETWEEN keyword. In the third, don't use either date_part or BETWEEN.

Table(s) to use: rental

Expected Output:
rental_id|rental_date        |
---------|-------------------|
     1158|2005-06-14 22:53:33|
     1159|2005-06-14 22:55:13|
     1160|2005-06-14 23:00:34|
     1161|2005-06-14 23:07:08|
     1162|2005-06-14 23:09:38|
     1163|2005-06-14 23:12:46|
     1164|2005-06-14 23:16:26|
     1165|2005-06-14 23:16:27|
...

2311 rows

Show/Hide Solution

Using the date_part function.
select rental_id, rental_date
from rental
where date_part('year', rental_date) = 2005
  and date_part('month', rental_date) = 6;
Using the BETWEEN keyword.
select rental_id, rental_date
from rental
where rental_date between
  '2005-06-01 00:00:00' and '2005-06-30 24:00:00';
Instead of BETWEEN you can also use the standard greater than and less than operators.
select rental_id, rental_date
from rental
where rental_date >= '2005-06-01 00:00:00'
  and rental_date <= '2005-06-30 24:00:00';

3.40 Return the top 5 films for $ per minute (rental_rate / length) of entertainment

Table(s) to use: film

Expected Output:
title              |rental_rate|length|per_minute            |
-------------------|-----------|------|----------------------|
IRON MOON          |       4.99|    46|0.10847826086956521739|
HANOVER GALAXY     |       4.99|    47|0.10617021276595744681|
MIDSUMMER GROUNDHOG|       4.99|    48|0.10395833333333333333|
ACE GOLDFINGER     |       4.99|    48|0.10395833333333333333|
PELICAN COMFORTS   |       4.99|    48|0.10395833333333333333|

5 rows

Show/Hide Solution

Make sure to first filter out films where the length is NULL or 0 (to avoid division by zero errors).
select title, rental_rate, length,
  rental_rate / length as per_minute
from film
where length is not null
  and length != 0
order by per_minute desc
limit 5;

3.41 Write a query to list all customers who have a first name containing the letter 'A' twice or more

Table(s) to use: customer

Expected Output:
first_name|
----------|
PATRICIA  |
BARBARA   |
MARIA     |
MARGARET  |
SANDRA    |
LAURA     |
SARAH     |
ANGELA    |
...

56 rows

Show/Hide Solution

You can use the LIKE or ILIKE keyword - whichever you prefer (though LIKE is standard SQL while ILIKE is specific to PostgreSQL). And we make use of the % wildcard here to represent a string of any length (including the empty string)
select first_name
from customer
where first_name ilike '%a%a%';

3.42 PostgreSQL supports an interesting variation of DISTINCT called DISTINCT ON. Visit the official documentation page and read about DISTINCT ON. See if you can figure out how you would use it in a query to return the most recent rental for each customer

Table(s) to use: rental

Expected Output:
customer_id|rental_date        |
-----------|-------------------|
          1|2005-08-22 20:03:46|
          2|2005-08-23 17:39:35|
          3|2005-08-23 07:10:14|
          4|2005-08-23 07:43:00|
          5|2006-02-14 15:16:03|
          6|2005-08-23 06:41:32|
          7|2005-08-21 04:49:48|
          8|2005-08-23 14:31:19|
          9|2006-02-14 15:16:03|
         10|2005-08-22 21:59:29|
...

599 rows

Show/Hide Solution

SELECT DISTINCT ON keeps only the first row of each set of rows where the given expressions evaluate to equal. For this query, where the output is ordered first by customer ID ascending and then by rental date descending, the DISTINCT ON clause results in us keeping the first encountered row for each unique customer ID - which is the most recent rental date. This is extremely useful!
select distinct on (customer_id) customer_id, rental_date
from rental
order by customer_id asc, rental_date desc;

3.43 Write a query to list all the customers with an email address but not in the format [first_name].[last_name]@sakilacustomer.org

Table(s) to use: customer

Expected Output:
first_name|last_name|email                           |
----------|---------|--------------------------------|
JEAN      |BELL     |JEAN.BELL@sakilacustomer.or     |
BONNIE    |HUGHES   |BONNIE.HUGHES@sabilacustomer.org|

2 rows

Show/Hide Solution

You may be wondering why I used parentheses around the full email string. In this case, it's purely for readability and clarity. The query will work exactly the same without the parentheses but I add them so it's clear that I'm checking the email address against the full concatenated string and not just part of it.
select first_name, last_name, email
from customer
where email != (first_name || '.' || last_name || '@sakilacustomer.org')

4. Aggregate functions and grouping

4.1 Write a query to return the total count of customers in the customer table and the count of how many customers provided an email address

Table(s) to use: customer

Expected Output:
# customers|# customers with email|
-----------|----------------------|
        599|                   597|

1 row

Show/Hide Solution

SELECT COUNT(*) will return the total number of rows, while COUNT(expression) will return the number of rows where expression is not NULL.
select
  count(*) as "# customers",
  count(email) as "# customers with email"
from customer;

4.2 Building on the previous exercise, now return an additional result showing the percentage of customers with an email address (as a helpful hint, if you're getting 0 try multiplying the fraction by 100.0 - we'll examine why this is necessary in an upcoming chapter on data types)

Table(s) to use: customer

Expected Output:
# customers|# customers with email|% with email       |
-----------|----------------------|-------------------|
        599|                   597|99.6661101836393990|

1 row

Show/Hide Solution

You can obtain the percentage by dividing the two terms and multiplying by 100.0
select
  count(*) as "# customers",
  count(email) as "# customers with email",
  100.0 * count(email) / count(*) as "% with email"
from customer;

4.3 Write a query to return the number of distinct customers who have made payments

Table(s) to use: payment

Expected Output:
count|
-----|
  599|

1 row

Show/Hide Solution

You can use DISTINCT inside the COUNT function to count the number of distinct customer IDs.
select count(distinct customer_id)
from payment;

4.4 What is the average length of time films are rented out for

Table(s) to use: rental

Expected Output:
avg rental duration   |
----------------------|
4 days 24:36:28.541706|

1 row

Show/Hide Solution

select avg(return_date - rental_date) as "avg rental duration"
from rental;

4.5 Write a query to return the sum total of all payment amounts received

Table(s) to use: payment

Expected Output:
total   |
--------|
67416.51|

1 row

Show/Hide Solution

select sum(amount) as total
from payment;

4.6 List the number of films each actor has appeared in and order the results from most popular to least

Table(s) to use: film_actor

Expected Output:
actor_id|num_films|
--------|---------|
     107|       42|
     102|       41|
     198|       40|
     181|       39|
      23|       37|
      81|       36|
     158|       35|
      60|       35|
...

200 rows

Show/Hide Solution

Each row in the film_actor table identifies a single film an actor is in. So simply counting the number of rows grouped by actor gives us the count of films the actor has been in (we don't have to worry about duplicates due to the table structure).
select actor_id, count(*) as num_films
from film_actor
group by actor_id
order by num_films desc;

4.7 List the customers who have made over 40 rentals

Table(s) to use: rental

Expected Output:
customer_id|
-----------|
         75|
        144|
        526|
        236|
        148|

5 rows

Show/Hide Solution

You can use HAVING to filter the groups to only return those customers with more than 40 rentals.
select customer_id
from rental
group by customer_id
having count(*) > 40;

4.8 We want to compare how the staff are performing against each other on a month to month basis. So for each month and year, show for each staff member how many payments they handled, the total amount of money they accepted, and the average payment amount

Table(s) to use: payment

Expected Output:
year|month|staff_id|num_payments|payment_total|avg_payment_amount|
----|-----|--------|------------|-------------|------------------|
2007|    1|       1|         617|      2621.83|4.2493192868719611|
2007|    1|       2|         540|      2202.60|4.0788888888888889|
2007|    2|       1|        1164|      4776.36|4.1034020618556701|
2007|    2|       2|        1148|      4855.52|4.2295470383275261|
2007|    3|       1|        2817|     11776.83|4.1806283280085197|
2007|    3|       2|        2827|     12109.73|4.2835974531305271|
2007|    4|       1|        3364|     14080.36|4.1856004756242568|
2007|    4|       2|        3390|     14479.10|4.2711209439528024|
2007|    5|       1|          95|       234.09|2.4641052631578947|
2007|    5|       2|          87|       280.09|3.2194252873563218|

10 rows

Show/Hide Solution

This was a difficult one and probably the biggest query you've written so far. Give yourself a big pat on the back if you managed to work it out. Here we use the date_part function to allow us to extract the month and year as separate components from the payment date. And our group by clause includes 3 elements - the year, month, and staff id.
select
  date_part('year', payment_date) as year,
  date_part('month', payment_date) as month,
  staff_id,
  count(*) as num_payments,
  sum(amount) as payment_total,
  avg(amount) as avg_payment_amount
from payment
group by
  date_part('year', payment_date),
  date_part('month', payment_date),
  staff_id
order by year, month, staff_id;

4.9 Write a query to show the number of rentals that were returned within 3 days, the number returned in 3 or more days, and the number never returned (for the logical comparison check you can use the following code snippet to compare against an interval: where return_date - rental_date < interval '3 days')

Table(s) to use: rental

Expected Output:
lt 3 days|gt 3 days|never returned|
---------|---------|--------------|
     4388|    11473|           183|

1 row

Show/Hide Solution

For this solution we use the PostgreSQL specific filter keyword vs the standard case version, but both are equally valid approaches.
select
  count(*) filter
    (where return_date - rental_date < interval '3 days') as "lt 3 days",
  count(*) filter
    (where return_date - rental_date >= interval '3 days') as "gt 3 days",
  count(*) filter
    (where return_date is null) as "never returned"
from rental;

4.10 Write a query to give counts of the films by their length in groups of 0 - 1hrs, 1 - 2hrs, 2 - 3hrs, and 3hrs+ (note: you might get slightly different numbers if doing inclusive or exclusive grouping - but don't sweat it!)

Table(s) to use: film

Expected Output:
len   |count|
------|-----|
0-1hrs|   97|
1-2hrs|  436|
2-3hrs|  419|
3hrs+ |   48|

4 rows

Show/Hide Solution

Here we use the CASE expression in our GROUP BY clause and also take advantage of the convenient numerical positioning reference to avoid repeating the full CASE expression in both the GROUP BY and SELECT clause (notice we can even use that numerical positioning approach in our ORDER BY clause too!)
select
  case
    when length between 0 and 59 then '0-1hrs'
    when length between 60 and 119 then '1-2hrs'
    when length between 120 and 179 then '2-3hrs'
    else '3hrs+'
  end as len,
  count(*)
from film
group by 1
order by 1;

4.11 Explain why in the following query we obtain two different results for the average film length. Which one is correct?

select
  1.0 * sum(length) / count(*) as avg1,
  1.0 * avg(length) as avg2
from film;

Table(s) to use: film

Expected Output:
avg1                |avg2                 |
--------------------|---------------------|
114.8850000000000000|115.11523046092184370|

Show/Hide Solution

When using the avg aggregate function, NULL lengths are ignored in the calculation. However, the count(*) term in the first selected column will count all the rows, even those with a NULL length. Technically either approach could be correct, depending on how you want to treat NULL length films. Most people would probably argue though that here, the second approach is correct.

4.12 Write a query to return the average rental duration for each customer in descending order

Table(s) to use: rental

Expected Output:
customer_id|avg_rent_duration     |
-----------|----------------------|
        315|     6 days 14:13:22.5|
        187|5 days 34:58:38.571428|
        321|5 days 32:56:32.727273|
        539|5 days 31:39:57.272727|
        436|       5 days 31:09:46|
        532|5 days 30:59:34.838709|
        427|       5 days 29:27:05|
        555|5 days 26:48:35.294118|
...

599 rows

Show/Hide Solution

select
  customer_id,
  avg(return_date - rental_date) as avg_rent_duration
from rental
group by customer_id
order by avg_rent_duration desc;

4.13 Return a list of customer where all payments they’ve made have been over $2 (lookup the bool_and aggregate function which will be useful here)

Table(s) to use: payment

Expected Output:
customer_id|
-----------|
        363|
         59|

2 rows

Show/Hide Solution

This was a tricky one. The key here was to realize bool_and is just like the other aggregate functions you've already seen in that it combines multiple rows in to a single output, but unlike all the other aggregate functions you've seen bool_and does this by performing a logical AND between each input expression.
select customer_id
from payment
group by customer_id
having bool_and(amount > 2);

4.14 As a final fun finish to this chapter, run the following query to see a cool way you can generate ascii histogram charts. Look up the repeat function (you'll find it under 'String Functions and Operators') to see how it works and change the output character...and don't worry, I'll explain the ::int bit in the next chapter!

select rating, repeat('*', (count(*) / 10)::int) as "count/10"
from film
where rating is not null
group by rating;

Table(s) to use: film

Expected Output:
rating|count/10              |
------|----------------------|
PG-13 |**********************|
R     |*******************   |
G     |*****************     |
PG    |*******************   |
NC-17 |********************  |

5 rows

Show/Hide Solution

The repeat function is used to repeat the '*' character count/10 times for each group (I divided by 10 just to keep the output concise!)

5. Understanding data types

5.1 Write a query to print a description of each film's length as shown in the output below. When a film does not have a length, print: [title] is unknown length

Table(s) to use: film

Expected Output:
length_desc                               |
------------------------------------------|
ACADEMY DINOSAUR is 86 minutes            |
ACE GOLDFINGER is 48 minutes              |
ADAPTATION HOLES is 50 minutes            |
AFFAIR PREJUDICE is 117 minutes           |
AFRICAN EGG is 130 minutes                |
AGENT TRUMAN is 169 minutes               |
AIRPLANE SIERRA is 62 minutes             |
AIRPORT POLLOCK is 54 minutes             |
...

1000 rows

Show/Hide Solution

There are a couple of different ways you could accomplish this. The coalesce function is a popular way to substitute in a default value when some expression is NULL (in this case, if the length is NULL, the entire expression length || ' minutes' will evaluate to NULL).
select
  title || ' is ' || coalesce(length || ' minutes', 'unknown length') as length_desc 
from film;

5.2 You want to play a movie title guessing game with some friends. Write a query to print only the first 3 letters of each film title and then '*' for the rest (The repeat function may come in handy here...)

Table(s) to use: film

Expected Output:
Guess!                     |
---------------------------|
ACA*************           |
ACE***********             |
ADA*************           |
AFF*************           |
AFR********                |
AGE*********               |
AIR************            |
AIR************            |
...

1000 rows

Show/Hide Solution

You can first use the left function to obtain the first 3 characters of the film title, and then concatenate that with '*' repeated (length - 3) times to fill out the remaining characters in the title.
select left(title, 3) || repeat('*', length(title) - 3) as "Guess!"
from film;

5.3 Write a query to list the percentage of films that are rated NC-17, G, PG, PG-13, NC-17, and R, rounded to the nearest integer.

Table(s) to use: film

Expected Output:
% NC-17|% PG|% G|% R|% PG-13|
-------|----|---|---|-------|
     21|  19| 18| 20|     22|

1 row

Show/Hide Solution

To calculate the percentage, you can use either the technique of using a CASE expression inside a SUM/COUNT aggregation or alternatively the COUNT(*) FILTER variation available in Postgres. The ratio is multiplied by 100.0 to ensure the calculation is done as a numeric and we avoid integer division. Finally, the round function is used to round each percentage to the nearest integer.
select
  round(100.0 * count(*) filter(where rating = 'NC-17') / count(*)) as "% NC-17",
  round(100.0 * count(*) filter(where rating = 'PG') / count(*)) as "% PG",
  round(100.0 * count(*) filter(where rating = 'G') / count(*)) as "% G",
  round(100.0 * count(*) filter(where rating = 'R') / count(*)) as "% R",
  round(100.0 * count(*) filter(where rating = 'PG-13') / count(*)) as "% PG-13"
from film; 

5.4 Try a few of the different explicit casting operations listed below to get familiar with how casting behaves. Was the behaviour what you expected?

select int '33';
select int '33.3';
select cast(33.3 as int);
select cast(33.8 as int);
select 33::text;
select 'hello'::varchar(2);
select cast(35000 as smallint);
select 12.1::numeric(1,1);

Show/Hide Solution

There's a few interesting things to point out here. First, note that the second query fails - when you're using the decorated literal form, the literal must exactly match the target type. For the third and fourth queries when 33.3 and 33.8 are cast to int, both succeed and round up or down to the nearest integer. Casting a number to text works just fine as does casting some literal text to a varchar, but the result is truncated to as many characters as the varchar can fit. The final two queries fail - 35,000 does not fit within a smallint and likewise 12.1 does not fit within a numeric(1,1).

5.5 Show 3 different ways to input the timestamptz representing 4th March, 2019 at 3:30pm in New York, USA

Run the following queries to see the full list of timezones available in Postgres.
select *
from pg_timezone_names;
select *
from pg_timezone_abbrevs;

Show/Hide Solution

You can specify the timezone in several different ways as shown below. Additionally, you can specify the time part in either 24-hour time or in 12-hour time with AM/PM.
select
  timestamptz '2019-03-04 15:30 EST',
  timestamptz '2019-03-04 03:30PM -5',
  timestamptz '2019-03-04 03:30PM America/New_York';

5.6 The rental duration in the film table is currently stored as an integer, representing the number of days. Write a query to return this as an interval instead and then add one day to the duration

Table(s) to use: film

Expected Output:
title                      |duration|duration + 1|
---------------------------|--------|------------|
ACADEMY DINOSAUR           |  6 days|      7 days|
ACE GOLDFINGER             |  3 days|      4 days|
ADAPTATION HOLES           |  7 days|      8 days|
AFFAIR PREJUDICE           |  5 days|      6 days|
AFRICAN EGG                |  6 days|      7 days|
AGENT TRUMAN               |  3 days|      4 days|
AIRPLANE SIERRA            |  6 days|      7 days|
AIRPORT POLLOCK            |  6 days|      7 days|
...

1000 rows

Show/Hide Solution

We need to construct the interval from an integer and we can do this by concatenating the number of days with ' days' and then casting that text expression as an interval.
select
  title,
  cast(rental_duration || ' days' as interval)  as duration,
  cast(rental_duration || ' days' as interval) + interval '1 day'  as "duration + 1"
from film;

5.7 You have a theory that certain hours of the day might be busiest for rentals. To investigate this write a query to list out for all time the the number of rentals made during each hour of the day

Table(s) to use: rental

Expected Output:
hr|count|
--|-----|
 0|  694|
 1|  649|
 2|  630|
 3|  684|
 4|  681|
 5|  648|
 6|  647|
 7|  667|
...

24 rows

Show/Hide Solution

You can use the date_part function to obtain the hour component for each rental date and then group by and order by that expression.
select 
  date_part('hour', rental_date) as hr,
  count(*)
from rental
group by hr
order by hr;

5.8 If you wanted to aggregate payments received by year and month you could write a query as below using the date_part function. Try and simplify this query by instead using the date_trunc function to achieve effectively the same result (ignoring the slight difference in date presentation)

select
  date_part('year', payment_date) as "year",
  date_part('month', payment_date) as "month",
  sum(amount) as total
from payment
group by "year", "month"
order by "year", "month";

Table(s) to use: payment

Expected Output:
month              |total   |
-------------------|--------|
2007-01-01 00:00:00| 4824.43|
2007-02-01 00:00:00| 9631.88|
2007-03-01 00:00:00|23886.56|
2007-04-01 00:00:00|28559.46|
2007-05-01 00:00:00|  514.18|

5 rows

Show/Hide Solution

You can use the date_trunc function to truncate each payment date to the month component and then group by and order by that expression. Your output date format will be slightly different vs the initial query but in many cases this date format which represents a full date and time is even preferable for being plugged in to external data visualization tools (like Excel, etc.).
select
  date_trunc('month', payment_date) as "month",
  sum(amount) as total
from payment
group by "month"
order by "month";

5.9 Write a query to return a count of the number of films that were rented out on the last day of a month

Table(s) to use: rental

Expected Output:
total # EOM rentals|
-------------------|
                842|

Show/Hide Solution

There are a couple of ways you could go about solving this one. You could use the date_part function to extract the year and month and check for known pairs (31 days in January, etc) but you would have to be careful to account for leap years too which is tricky to handle! The solution below by constrast is quite elegant - we can truncate the rental date to the month and then make use of interval math to obtain the last day of the month and then compare that to the truncated rental date (to remove the time component).
select count(*) as "total # EOM rentals"
from rental
where date_trunc('month', rental_date) + interval '1 month' - interval '1 day'
        = date_trunc('day', rental_date);

5.10 Write a query to list the film titles that have spaces at the beginning or end (we want to flag them so we know to clean them up!)

Table(s) to use: film

Expected Output:

0 rows

Show/Hide Solution

There aren't any! But it's always good to check just in case. One way to approach this is to use the trim function, which by default will remove both leading and trailing whitespace and then compare the length of that to the original title length. If some whitespace was removed, the length difference will be greater than 0.
select title
from film
where length(title) - length(trim(title)) > 0;

5.11 Write a query to sum up, for each customer, the total number of hours they have had films rented out for. Return only the top 3 customers with the most hours.

Table(s) to use: rental

Expected Output:
customer_id|hrs_rented|
-----------|----------|
        526|      6340|
        148|      5834|
        144|      5641|

3 rows

Show/Hide Solution

select
  customer_id,
  round(sum(date_part('epoch', return_date - rental_date)) / 3600) as hrs_rented
from rental
group by customer_id
order by hrs_rented desc
limit 3;

5.12 Postgres has a really useful function called generate_series which will come in handy in a few of the coming chapters. Have a look at the examples how to use it here and then write a query to generate a list of timestamps which represent the first day of every month in 2019, at 5pm UTC

Expected Output:
generate_series    |
-------------------|
2019-01-02 03:00:00|
2019-02-02 03:00:00|
2019-03-02 03:00:00|
2019-04-02 03:00:00|
2019-05-02 03:00:00|
2019-06-02 03:00:00|
2019-07-02 03:00:00|
2019-08-02 03:00:00|
2019-09-02 03:00:00|
2019-10-02 03:00:00|
2019-11-02 03:00:00|
2019-12-02 03:00:00|

12 rows

Show/Hide Solution

generate_series takes 3 arguments - the start, stop, and step. Our start timestamp is set to the 1st Jan 2019 at 5pm UTC, our end timestamp is set to 1st Dec 2019 at 5pm UTC, and the step is set to 1 month. Generated tables like this are very useful to join on to for analytics and reporting type queries. Note that your output will look different to mine depending on your local timezone - remember that any internal timestamps with time zone information are converted to your timezone for display purposes.
select *
from generate_series(
  timestamptz '2019-01-01 17:00 UTC',
  timestamptz '2019-12-01 17:00 UTC',
  interval '1 month');

5.13 Return a count of the number of occurrences of the letter 'A' in each customer's first name (this is a common interview question for SQL related jobs!). Order the output by the count descending.

Table(s) to use: customer

Expected Output:
first_name |count|
-----------|-----|
CASSANDRA  |    3|
BARBARA    |    3|
SAMANTHA   |    3|
TAMARA     |    3|
AMANDA     |    3|
MARSHALL   |    2|
MARSHA     |    2|
MARIAN     |    2|
...

599 rows

Show/Hide Solution

Similar to the exercise involving removing whitespace, in this case you can use the replace function to remove any instances of the letter 'A' (by replacing 'A' with a blank space - nothing). Subtracting the length of this expression from the length of the unedited first name then yields you the number of occurences of 'A'.
select
  first_name,
  length(first_name) - length(replace(first_name, 'A', '')) as count
from customer
order by count desc;

5.14 Write a query to tally up the total amount of money made on weekends (Saturday and Sunday)

Table(s) to use: payment

Expected Output:
total $ |
--------|
19036.04|

1 row

Show/Hide Solution

You can use either the 'dow' or 'isodow' options with the date_part function to obtain a numerical representation of the day of the week. When using 'isodow', 6 represents Saturday and 7 represents Sunday. Handy!
select sum(amount) as "total $"
from payment
where date_part('isodow', payment_date) in (6, 7);

6. Joining data from multiple tables

6.1 Write a query to return a list of all the films rented by PETER MENARD showing the most recent first

Table(s) to use: rental, customer, inventory, film

Expected Output:
rental_date        |title                |
-------------------|---------------------|
2005-08-23 18:43:31|OCTOBER SUBMARINE    |
2005-08-22 09:10:21|PRIDE ALAMO          |
2005-08-21 08:40:56|FUGITIVE MAGUIRE     |
2005-08-20 12:53:46|BRIGHT ENCOUNTERS    |
2005-08-19 10:06:53|REDEMPTION COMFORTS  |
2005-08-19 09:42:01|SPOILERS HELLFIGHTERS|
2005-08-17 07:56:22|GO PURPLE            |
2005-08-02 15:44:55|GOODFELLAS SALUTE    |
...

23 rows

Show/Hide Solution

This exercises involves quite a few joins but once that's setup it's just a matter of filtering for the right customer and ordering by the rental date.
select r.rental_date, f.title
from rental as r
  inner join customer as c
    on r.customer_id = c.customer_id
  inner join inventory as i
    on r.inventory_id = i.inventory_id
  inner join film as f
    on i.film_id = f.film_id
where c.first_name = 'PETER'
  and c.last_name = 'MENARD'
order by r.rental_date desc;

6.2 Write a query to list the full names and contact details for the manager of each store

Table(s) to use: store, staff

Expected Output:
store_id|Manager     |email                       |
--------|------------|----------------------------|
       1|Mike Hillyer|Mike.Hillyer@sakilastaff.com|
       2|Jon Stephens|Jon.Stephens@sakilastaff.com|

2 rows

Show/Hide Solution

One thing to watch out for here is the key columns are named differently in the respective tables. In the store table, the manager's staff ID is stored in the column manager_staff_id which references the staff_id column in the staff table.
select
  store.store_id,
  staff.first_name || ' ' || staff.last_name as "Manager",
  staff.email
from store
  inner join staff
    on store.manager_staff_id = staff.staff_id;

6.3 Write a query to return the top 3 most rented out films and how many times they've been rented out

Table(s) to use: rental, inventory, film

Expected Output:
film_id|title             |count|
-------|------------------|-----|
    103|BUCKET BROTHERHOOD|   34|
    738|ROCKETEER MOTHER  |   33|
    331|FORWARD TEMPLE    |   32|

3 rows

Show/Hide Solution

After setting up the joins necessary to resolve the inventory IDs in the rental table in to films you then need to setup a group by aggregation. You could just group by the film title but there's a risk with this approach where if two films have the same title you would end up counting them as one. For safety, group by the film ID and title (and since the film ID is always unique, this essentially just creates a single group for each film but also allows you to use the title in the SELECT clause).
select f.film_id, f.title, count(*)
from rental as r
  inner join inventory as i
    on r.inventory_id = i.inventory_id
  inner join film as f
    on i.film_id = f.film_id
group by f.film_id, f.title
order by count(*) desc
limit 3;

6.4 Write a query to show for each customer how many different (unique) films they've rented and how many different (unique) actors they've seen in films they've rented

Table(s) to use: rental, inventory, film, film_actor

Expected Output:
customer_id|num_films|num_actors|
-----------|---------|----------|
          1|       30|       108|
          2|       27|       118|
          3|       26|       103|
          4|       22|        89|
          5|       38|       119|
          6|       26|       110|
          7|       32|       117|
          8|       23|       109|
...

599 rows

Show/Hide Solution

After setting up all the required table joins, the key with this query is to realize you must perform a count of the distinct film IDs and actor IDs vs counting all the rows.
select
  r.customer_id,
  count(distinct f.film_id) as num_films,
  count(distinct fa.actor_id) as num_actors
from rental as r
  inner join inventory as i using (inventory_id)
  inner join film as f using (film_id)
  inner join film_actor as fa using (film_id)
group by r.customer_id
order by r.customer_id;

6.5 Re-write the query below written in the older style of inner joins (which you still encounter surprisingly often online) using the more modern style. Re-write it once using ON to establish the join condition and the second time with USING.

select film.title, language.name as "language"
from film, language
where film.language_id = language.language_id;

Table(s) to use: film, language

Expected Output:
title                      |language            |
---------------------------|--------------------|
ACADEMY DINOSAUR           |English             |
ACE GOLDFINGER             |English             |
ADAPTATION HOLES           |English             |
AFFAIR PREJUDICE           |English             |
AFRICAN EGG                |English             |
AGENT TRUMAN               |English             |
AIRPLANE SIERRA            |English             |
AIRPORT POLLOCK            |English             |
...

1000 rows

Show/Hide Solution

The USING syntax is quite nice and convenient when you know the columns you're joining on for equality are named the same in different tables.
select film.title, language.name as "language"
from film
  inner join language
    on film.language_id = language.language_id;
select film.title, language.name as "language"
from film
  inner join language using (language_id);

6.6 Write a query to list the films that are not in stock in any of the stores

Table(s) to use: film, inventory

Expected Output:
title                 |
----------------------|
SKY MIRACLE           |
KILL BROTHERHOOD      |
SISTER FREDDY         |
GLADIATOR WESTWARD    |
FLOATS GARDEN         |
APOLLO TEEN           |
CRYSTAL BREAKING      |
HOCUS FRIDA           |
...

42 rows

Show/Hide Solution

After performing a left outer join with the film and inventory tables, the only rows in the output with a NULL store ID will be those that didn't have any matching records in both tables - that is, those films not in our inventory. So we can simply filter the output for where the store ID is NULL.
select f.title
from film as f
  left outer join inventory as i
    on f.film_id = i.film_id
where i.store_id is null;

6.7 Write a query to return a count of how many of each film we have in our inventory (include all films). Order the output showing the lowest in-stock first so we know to buy more!

Table(s) to use: film, inventory

Expected Output:
title                      |count|
---------------------------|-----|
CHOCOLATE DUCK             |    0|
FRANKENSTEIN STRANGER      |    0|
APOLLO TEEN                |    0|
HOCUS FRIDA                |    0|
WAKE JAWS                  |    0|
SUICIDES SILENCE           |    0|
GUMP DATE                  |    0|
BUTCH PANTHER              |    0|
...

1000 rows

Show/Hide Solution

After joining the film and inventory tables we can then group by the film ID and perform a count of the number of rows, making sure that we're only counting, for each group, the rows where the inventory ID is not NULL (to ensure missing films added by the outer join still end up with a count of 0, not 1)
select f.title, count(i.inventory_id)
from film as f
  left outer join inventory as i
    on f.film_id = i.film_id
group by f.film_id, f.title
order by count(i.inventory_id) asc;

6.8 Write a query to return a count of the number of films rented by every customer on the 24th May, 2005. Order the results by number of films rented descending.

Table(s) to use: customer, rental

Expected Output:
customer_id|num_rented|
-----------|----------|
        130|         1|
        222|         1|
        239|         1|
        269|         1|
        333|         1|
        408|         1|
        459|         1|
        549|         1|
          1|         0|
          2|         0|
          3|         0|
          4|         0|
          5|         0|
...

599 rows

Show/Hide Solution

A key issue you may have run across when solving this exercise is if you tried to perform the date check in the WHERE clause instead of making it one of the join conditions. The problem with making it part of the WHERE clause is it would act to filter the output of the left join, removing any rows that the left join added back in. This is because rows added back in by the left join - customers with no rentals - will have a NULL rental date. If you put the date check in the WHERE clause, you would effectively be making your left join an inner join!
select
  c.customer_id,
  count(r.rental_id) as num_rented
from customer as c
  left join rental as r
    on c.customer_id = r.customer_id
    and date_trunc('day', r.rental_date) = '20050524'
group by c.customer_id
order by num_rented desc, c.customer_id;

6.9 Write a query to return how many copies of each film are available in each store, including zero counts if there are none. Order by count so we can easily see first which films need to be restocked in each store

Table(s) to use: film, store, inventory

Expected Output:
film_id|store_id|stock|
-------|--------|-----|
      2|       1|    0|
      3|       1|    0|
      5|       1|    0|
      8|       1|    0|
     13|       1|    0|
     14|       1|    0|
     14|       2|    0|
     20|       2|    0|
...

2000 rows

Show/Hide Solution

To obtain a complete list of every film and store combination, you first need to perform a cross join between the film and store tables. Only then can you left join the inventory table on to that - making sure you perform a composite join to match both the film ID and store ID.
select f.film_id, s.store_id, count(i.inventory_id) as stock
from film as f
  cross join store as s
  left join inventory as i
    on f.film_id = i.film_id
    and s.store_id = i.store_id
group by f.film_id, s.store_id
order by stock, f.film_id, s.store_id;

6.10 Have a look at the documentation for the extremely useful Postgres function generate_series and then using generate_series write a query to return a count of the number of rentals for each and every month in 2005 (don't worry too much about the output date formatting).

Table(s) to use: rental

Expected Output:
t                  |count|
-------------------|-----|
2005-01-01 00:00:00|    0|
2005-02-01 00:00:00|    0|
2005-03-01 00:00:00|    0|
2005-04-01 00:00:00|    0|
2005-05-01 00:00:00| 1156|
2005-06-01 00:00:00| 2311|
2005-07-01 00:00:00| 6709|
2005-08-01 00:00:00| 5686|
2005-09-01 00:00:00|    0|
2005-10-01 00:00:00|    0|
2005-11-01 00:00:00|    0|
2005-12-01 00:00:00|    0|

12 rows

Show/Hide Solution

Here we use generate_series to return a table consisting of 12 rows, with each row containing a timestamp representing the first day of each month in 2005. The table has the alias m, and a single column t (you're free to name your table and column whatever you want). On to this we left join the rental table, truncating the rental date to the month which allows us to perform an equality join (a timestamp truncated to the month will have it's day part set to 01, and time part set to 00:00:00 - consistent with the output from generate_series). The syntax here might have thrown you off, but the underlying idea is the same as what you've done now in quite a few past exercises!
select
  m.t,
  count(r.rental_id)
from generate_series('2005-01-01'::timestamp, '2005-12-01'::timestamp, '1 month') as m(t)
  left join rental as r
    on date_trunc('month', r.rental_date) = m.t
group by m.t;

6.11 Write a query to list the customers who rented out the film with ID 97 and then at some later date rented out the film with ID 841

Table(s) to use: rental, inventory

Expected Output:
customer_id|
-----------|
        459|
         25|

2 rows

Show/Hide Solution

This was a hard one - give yourself a big pat on the back if figured it out! This query involved a self-join on the rental table with the additional complexity that for each rental self-join, the inventory table was necessary as well to resolve the actual films being rented in each case!
select r.customer_id
from rental as r
  inner join inventory as i
    on r.inventory_id = i.inventory_id
  inner join rental as r2
    on r.customer_id = r2.customer_id
    and r2.rental_date > r.rental_date
  inner join inventory as i2
    on r2.inventory_id = i2.inventory_id
where i.film_id = 97 and i2.film_id = 841;

7. Subqueries

7.1 Write a query to return all the customers who made a rental on the first day of rentals (without hardcoding the date for the first day of rentals in your query)

Table(s) to use: rental, customer

Expected Output:
first_name|last_name  |
----------|-----------|
ANDREW    |PURDY      |
CASSANDRA |WALTERS    |
CHARLOTTE |HUNTER     |
DELORES   |HANSEN     |
MANUEL    |MURRELL    |
MINNIE    |ROMERO     |
NELSON    |CHRISTENSON|
TOMMY     |COLLAZO    |

8 rows

Show/Hide Solution

For this exercise we use a subquery to obtain the date when rentals were first made. We then cast this timestamp to a date to allow ease of comparison - an alternative approach which will give you the same result would be to use the date_trunc function to truncate the rental date to the 'day'. Distinct is used to eliminate duplicates just in case any one customer rented more than once on the opening day.
select distinct c.first_name, c.last_name
from rental as r
  inner join customer as c using (customer_id)
where r.rental_date::date = 
  (select min(rental_date)::date
   from rental);

7.2 Using a subquery, return the films that don't have any actors. Now write the same query using a left join. Which solution do you think is better? Easier to read?

Table(s) to use: film, film_actor

Expected Output:
film_id|title           |
-------|----------------|
    257|DRUMLINE CYCLONE|
    323|FLIGHT LIES     |
    803|SLACKER LIAISONS|

3 rows

Show/Hide Solution

The two solutions are listed below - I would personally prefer the subquery solution because I find that easier to think about but it's up to you!
select film_id, title
from film
where film_id not in
  (select film_id
   from film_actor);
select film_id, title
from film as f
  left join film_actor as fa using (film_id)
where fa.film_id is null;

7.3 You intend to write a humorous email to congratulate some customers on their poor taste in films. To that end, write a query to return the customers who rented out the least popular film (that is, the film least rented out - if there is more than one, pick the one with the lowest film ID)

Table(s) to use: rental, inventory, customer

Expected Output:
customer_id|first_name|last_name|
-----------|----------|---------|
        257|MARSHA    |DOUGLAS  |
        142|APRIL     |BURNS    |
        564|BOB       |PFEIFFER |
         89|JULIA     |FLORES   |

4 rows

Show/Hide Solution

The subquery here is used to obtain the ID for the least popular film. This is achieved by grouping the rentals by film ID, ordering by count ascending, and picking the first. In this instance there is more than one "least popular" film so the film ID is used for tiebreaking in the order by clause.
select c.customer_id, c.first_name, c.last_name
from rental as r1
  inner join inventory as i1 using (inventory_id)
  inner join customer as c using (customer_id)
where i1.film_id =
  (select i2.film_id
   from rental as r2
     inner join inventory as i2 using (inventory_id)
   group by i2.film_id
   order by count(*) asc, i2.film_id asc
   limit 1);

7.4 Write a query to return the countries in our database that have more than 15 cities

Table(s) to use: country, city

Expected Output:
country           |
------------------|
Brazil            |
China             |
India             |
Japan             |
Mexico            |
Philippines       |
Russian Federation|
United States     |

8 rows

Show/Hide Solution

For this exercise we use a subquery to obtain, for each country, a count of the number of cities listed for that country. This can then be used in the WHERE clause to filter the countries in our final output.
select country.country
from country
where
  (select count(*)
   from city
   where city.country_id = country.country_id) > 15;

7.5 Write a query to return for each customer the store they most commonly rent from

Table(s) to use: customer, rental, inventory

Expected Output:
customer_id|first_name |last_name   |Favourite Store|
-----------|-----------|------------|---------------|
          1|MARY       |SMITH       |              1|
          2|PATRICIA   |JOHNSON     |              1|
          3|LINDA      |WILLIAMS    |              1|
          4|BARBARA    |JONES       |              2|
          5|ELIZABETH  |BROWN       |              2|
          6|JENNIFER   |DAVIS       |              1|
          7|MARIA      |MILLER      |              1|
          8|SUSAN      |WILSON      |              2|
...

599 rows

Show/Hide Solution

The correlated subquery in this case is used to obtain, for each customer, the store that the customer visits the most. How do we work that out? By tallying up all the rentals from that customer by store ID, ordering by the count (descending! If you forget to add 'desc' you'll return the least popular store for each customer!) and then taking the first result with LIMIT.
select
  c.customer_id,
  c.first_name,
  c.last_name,
  (select i.store_id
   from rental as r
     inner join inventory as i using (inventory_id)
   where r.customer_id = c.customer_id
   group by i.store_id
   order by count(*) desc
   limit 1) as "Favourite Store"
from customer as c;

7.6 In the customer table, each customer has a store ID which is the store they originally registered at. Write a query to list for each customer whether they have ever rented from a different store than that one they registered at. Return 'Y' if they have, and 'N' if they haven't.

Table(s) to use: rental, inventory, customer

Expected Output:
first_name |last_name   |HasRentedOtherStore|
-----------|------------|-------------------|
MARY       |SMITH       |Y                  |
PATRICIA   |JOHNSON     |Y                  |
LINDA      |WILLIAMS    |Y                  |
BARBARA    |JONES       |Y                  |
ELIZABETH  |BROWN       |Y                  |
JENNIFER   |DAVIS       |Y                  |
MARIA      |MILLER      |Y                  |
SUSAN      |WILSON      |Y                  |
...

599 rows

Show/Hide Solution

This was a tricky one due to the combination of using a case expression and EXISTS. If you got it right, well done! The subquery here is used to return for each customer any rentals they made from a store different from the store they registered at. If any exist, then the case expression is used to output 'Y', otherwise 'N'.
select c.first_name, c.last_name,
  case
    when exists 
      (select *
       from rental as r
         inner join inventory as i using (inventory_id)
       where r.customer_id = c.customer_id
         and i.store_id != c.store_id) then 'Y'
    else 'N'
  end as "HasRentedOtherStore"
from customer as c;

7.7 Write a query to return each customer 4 times

Table(s) to use: customer

Expected Output:
first_name |last_name   |
-----------|------------|
MARY       |SMITH       |
MARY       |SMITH       |
MARY       |SMITH       |
MARY       |SMITH       |
PATRICIA   |JOHNSON     |
PATRICIA   |JOHNSON     |
PATRICIA   |JOHNSON     |
PATRICIA   |JOHNSON     |
LINDA      |WILLIAMS    |
LINDA      |WILLIAMS    |
...

2396 rows

Show/Hide Solution

Cross joining the customer table on to any table containing only four rows will do the job - to guarantee that we have exactly four rows we specify a values list and create a new virtual table to join on to.
select c.first_name, c.last_name
from customer as c
  cross join (values (1), (2), (3), (4)) as v(n)
order by c.customer_id;

7.8 Write a query to return how many rentals the business gets on average on each day of the week. Order the results to show the days of the week with the highest average number of rentals first (use the round function to round the average so it's a nice whole number). Have a look at the to_char function to obtain the day name given a timestamp. For simplicity, don't worry about days in which there were no rentals.

Table(s) to use: rental

Expected Output:
day_name |average|
---------|-------|
Sunday   |    464|
Saturday |    462|
Friday   |    454|
Monday   |    449|
Wednesday|    446|
Thursday |    440|
Tuesday  |    224|

7 rows

Show/Hide Solution

This was a tricky one and there's a couple of different ways you could have solved it. In the solution below, the table subquery aggregates the ratings for each date from the rental table. The outer query then further aggregates by the actual day of the week for each date and performs an average for each day of the week (the to_char function is used to obtain the day name).
select
  to_char(rent_day, 'Day') as day_name,
  round(avg(num_rentals)) as average
from
  (select date_trunc('day', rental_date) as rent_day, count(*) as num_rentals
   from rental
   group by rent_day) as T
group by day_name
order by average desc;

7.9 Write a query to return for each customer the first 'PG' film that they rented (include customers who have never rented a 'PG' film as well)

Table(s) to use: customer, rental, inventory, film

Expected Output:
first_name |last_name   |title                 |rental_date        |
-----------|------------|----------------------|-------------------|
MARY       |SMITH       |TALENTED HOMICIDE     |2005-05-28 10:35:23|
PATRICIA   |JOHNSON     |BLACKOUT PRIVATE      |2005-06-17 20:54:58|
LINDA      |WILLIAMS    |HORN WORKING          |2005-08-18 14:49:55|
BARBARA    |JONES       |BEDAZZLED MARRIED     |2005-06-16 08:08:40|
ELIZABETH  |BROWN       |TOOTSIE PILOT         |2005-05-29 07:25:16|
JENNIFER   |DAVIS       |MONSTER SPARTACUS     |2005-07-11 12:39:01|
MARIA      |MILLER      |LEGEND JEDI           |2005-06-16 21:06:00|
SUSAN      |WILSON      |NORTHWEST POLISH      |2005-05-30 03:43:54|
...

599 rows

Show/Hide Solution

We make use of a left join lateral here to join on to a table subquery that will return for each customer the first film they rented that was PG. By using a left join, any customers who have not rented any PG films will be in the output with a NULL title and rental_date. There are 6 such customers.
select c.first_name, c.last_name, d.title, d.rental_date
from customer as c
  left join lateral
    (select r.customer_id, f.title, r.rental_date
     from rental as r
       inner join inventory as i using (inventory_id)
       inner join film as f using (film_id)
     where r.customer_id = c.customer_id
       and f.rating = 'PG'
     order by r.rental_date
     limit 1) as d
    on c.customer_id = d.customer_id;

7.10 Write a query to list the customers who rented out the film with title "BRIDE INTRIGUE" and then at some later date rented out the film with title "STAR OPERATION". Use a CTE to simplify your code if possible.

Table(s) to use: rental, inventory, film

Expected Output:
customer_id|
-----------|
        459|
         25|

2 rows

Show/Hide Solution

In case this exercise feels familiar, it is! You earlier did a version of this exercise when learning about joins but were instead referencing the same films by their IDs (97 and 841 respectively). The solution involves self-joins and here the use of a CTE simplifies things substantially by allowing us to define a table called rental_detail that performs all the joins and then joining that table on to itself as part of the actual query. The join condition specifies that we only keep rows where the customer IDs match, the film titles match what we're after, and the rental date of the first film is before the second film.
with rental_detail as
(
  select r.customer_id, r.rental_date, f.title
  from rental as r
    inner join inventory as i using (inventory_id)
    inner join film as f using (film_id)
)
select r1.customer_id
from rental_detail as r1
  inner join rental_detail as r2
    on r1.customer_id = r2.customer_id
    and r2.rental_date > r1.rental_date
    and r1.title = 'BRIDE INTRIGUE' and r2.title = 'STAR OPERATION';

7.11 Write a query to calculate the amount of income received each month and compare that against the previous month's income, showing the change.

Table(s) to use: payment

Expected Output:
month              |income  |prev month income|change   |
-------------------|--------|-----------------|---------|
2007-01-01 00:00:00| 4824.43|                 |         |
2007-02-01 00:00:00| 9631.88|          4824.43|  4807.45|
2007-03-01 00:00:00|23886.56|          9631.88| 14254.68|
2007-04-01 00:00:00|28559.46|         23886.56|  4672.90|
2007-05-01 00:00:00|  514.18|         28559.46|-28045.28|

5 rows

Show/Hide Solution

A tough one - give yourself a pat on the back if you got this right. The CTE in this case is used to establish a baseline table holding the amount of money received in each month. By then performing a left self-join and establishing just the right join conditions (probably the trickiest part) you're able to join up every month with the previous month to calculate the change in income.
with monthly_amounts as
(
  select
    date_trunc('month', payment_date) as month,
    sum(amount) as total
  from payment
  group by month
)
select
  curr.month,
  curr.total as "income",
  prev.total as "prev month income",
  curr.total - prev.total as "change"
from monthly_amounts as curr
  left join monthly_amounts as prev
    on curr.month = prev.month + interval '1 month'

7.12 Write a query to return the customers who rented a film in 2005 but none in 2006

Table(s) to use: rental

Expected Output:
customer_id|
-----------|
          1|
          2|
          3|
          4|
          6|
          7|
          8|
         10|
...

441 rows

Show/Hide Solution

The solution below queries the rental table for all customers who did rent a film in 2005 and then further filters the list by removing customers who then rented a film in 2006.
select distinct customer_id
from rental
where date_part('year', rental_date) = 2005
  and customer_id not in
    (select customer_id
     from rental
     where date_part('year', rental_date) = 2006);

7.13 What are the top 3 countries the customers are from. Show both the number of customers from each country and percentage (round the percentage to the nearest whole number)

Table(s) to use: customer, address, city, country

Expected Output:
country      |num_customers|percent|
-------------|-------------|-------|
India        |           60|     10|
China        |           53|      9|
United States|           36|      6|

3 rows

Show/Hide Solution

This is a fairly standard exercise involving grouping and joins which at this point you're very familiar with. The one addition though is the requirement to calculate a percentage for each country - to work out what percentage of customers come from each country. To do this, you'll need to use a subquery to obtain the count of the total number of customers to use in the calculation. In the solution I have also rounded the percentage so it's a nice whole number.
select
  country,
  count(*) as num_customers,
  round(100.0 * count(*) / (select count(*) from customer)) as percent
from customer as c
  inner join address using (address_id)
  inner join city using (city_id)
  inner join country using (country_id)
group by country
order by count(*) desc
limit 3;

7.14 Write a query to perform a running total of payments received, grouping by month (ie. for each month return the amount of money received that month and also the total amount of money received up to (and including) that month - this is a useful view to have if you wanted to produce a cumulative chart). Hint - Re-use the monthly_amounts CTE from exercise 7.11

Table(s) to use: payment

Expected Output:
month              |amount  |cumamount|
-------------------|--------|---------|
2007-01-01 00:00:00| 4824.43|  4824.43|
2007-02-01 00:00:00| 9631.88| 14456.31|
2007-03-01 00:00:00|23886.56| 38342.87|
2007-04-01 00:00:00|28559.46| 66902.33|
2007-05-01 00:00:00|  514.18| 67416.51|

5 rows

Show/Hide Solution

This exercise becomes a lot easier once we leverage the monthly_amounts CTE we worked on earlier. To obtain the cumulative amount for each month then, you can use a correlated subquery to calculate the sum of all the amounts from the same table up to and including the current month)
with monthly_amounts as
(
  select
    date_trunc('month', payment_date) as month,
    sum(amount) as amount
  from payment
  group by month
)
select ma1.month, ma1.amount,
  (select sum(ma2.amount)
   from monthly_amounts as ma2
   where ma2.month <= ma1.month) as cumamount
from monthly_amounts as ma1
order by ma1.month;

7.15 The rental table has 16,044 rows but the maximum rental ID is 16,049. This suggests that some rental IDs have been skipped over. Write a query to find the missing rental IDs. The generate_series function may come in handy

Table(s) to use: rental

Expected Output:
id   |
-----|
  321|
 2247|
 6579|
 9426|
15592|

5 rows

Show/Hide Solution

This was a particularly tricky one! generate_series is used to obtain the full list of IDs from the min to the max and then via a correlated subquery, we only keep those that don't exist in the rental table.
select s.id
from generate_series(
  (select min(rental_id) from rental),
  (select max(rental_id) from rental)) as s(id)
where not exists
  (select *
   from rental as r
   where r.rental_id = s.id);

7.16 In an earlier exercise I asked you to see if you could find a way to return the last 3 payments made in Jan, 2007 but ordered ascending. You've got the tools now to accomplish this - see if you can figure it out!

Table(s) to use: payment

Expected Output:
payment_id|amount|payment_date       |
----------|------|-------------------|
     17173|  2.99|2007-01-31 20:45:37|
     16890|  0.99|2007-01-31 21:06:00|
     16813|  0.99|2007-01-31 21:16:11|

3 rows

Show/Hide Solution

The key insight to accomplish this is to realize you can find and select the 3 rows you're after using a table subquery. Within that subquery you're able to order the payments in descending order to be able to pick the last 3, but then outside of that subquery you're once again free to specify a new ordering for display purposes!
select payment_id, amount, payment_date
from 
  (select payment_id, amount, payment_date
   from payment
   where payment_date >= '2007-01-01'
     and payment_date < '2007-02-01'
   order by payment_date desc
   limit 3) as p
order by payment_date asc;

8. Window functions

8.1 Write a query to return the 3 most recent rentals for each customer. Earlier you did this with a lateral join - this time do it with window functions

Table(s) to use: rental

Expected Output:
rental_id|customer_id|rental_date        |
---------|-----------|-------------------|
    15315|          1|2005-08-22 20:03:46|
    15298|          1|2005-08-22 19:41:37|
    14825|          1|2005-08-22 01:27:57|
    15907|          2|2005-08-23 17:39:35|
    15145|          2|2005-08-22 13:53:04|
    14743|          2|2005-08-21 22:41:56|
    15619|          3|2005-08-23 07:10:14|
    15038|          3|2005-08-22 09:37:27|
    14699|          3|2005-08-21 20:50:48|
    15635|          4|2005-08-23 07:43:00|
    15147|          4|2005-08-22 13:58:23|
    14225|          4|2005-08-21 04:53:37|
...

1797 rows

Show/Hide Solution

For each rental we can calculate a row number based on a window of rentals partitioned by customer ID and ordered by rental date descending. By then picking those rentals with a row number less than or equal to 3 we obtain the 3 most recent rentals for each customer.
select rental_id, customer_id, rental_date
from
 (select
    rental_id,
    customer_id,
    rental_date,
    row_number() over (partition by customer_id order by rental_date desc) as rn
  from rental) as t
where rn <= 3;

8.2 We want to re-do exercise 7.3, where we wrote a query to return the customers who rented out the least popular film (that is, the film least rented out). This time though we want to be able to handle if there is more than one film that is least popular. So if several films are each equally unpopular, return the customers who rented out any of those films.

Table(s) to use: rental, inventory

Expected Output:
customer_id|
-----------|
        179|
        304|
        258|
         89|
         14|
        594|
        133|
        142|
        257|
        507|
        564|
        137|

12 rows

Show/Hide Solution

The rent_counts CTE returns for each film the number of times it has been rented out and a corresponding ranking. If you run this on its own, you'll find that there are 3 equally unpopular films, each only rented out 4 times. We then return those distinct customers from the rental table who rented any film that has a ranking of 1.
with rent_counts as
(
  select
     film_id,
     count(*),
     rank() over (order by count(*))
   from rental
     inner join inventory using (inventory_id)
   group by film_id
)
select distinct customer_id
from rental as r
  inner join inventory as i using (inventory_id)
where i.film_id in
  (select film_id
   from rent_counts
   where rank = 1);

8.3 Write a query to return all the distinct film ratings without using the DISTINCT keyword

Table(s) to use: film

Expected Output:
rating|
------|
G     |
PG    |
PG-13 |
R     |
NC-17 |

5 rows

Show/Hide Solution

The key insight for this exercise is to recognize that window functions only operate over one window at a time. So by partitioning the film table table by rating and using the row_number() function, each new rating encountered will belong to a new window. And the first such film for each rating will therefore get a row number of 1. Filtering for only those films and removing NULL yields us a distinct list of ratings.
select rating
from
  (select
     rating,
     row_number() over (partition by rating) as rn
   from film) as t
where rn = 1
  and rating is not null;

8.4 Write a query to show for each rental both the rental duration and also the average rental duration from the same customer

Table(s) to use: rental

Expected Output:
customer_id|rental_id|rent_duration  |avg                   |
-----------|---------|---------------|----------------------|
          1|    10437|9 days 03:21:00|     4 days 11:18:07.5|
          1|    13068| 1 day 04:49:00|     4 days 11:18:07.5|
          1|    15315|7 days 05:48:00|     4 days 11:18:07.5|
          1|     2308|3 days 18:55:00|     4 days 11:18:07.5|
          1|     1185|8 days 01:48:00|     4 days 11:18:07.5|
          1|     7273|3 days 19:19:00|     4 days 11:18:07.5|
          1|       76|9 days 00:30:00|     4 days 11:18:07.5|
          1|     7841|2 days 03:33:00|     4 days 11:18:07.5|
...

16044 rows

Show/Hide Solution

select
  customer_id,
  rental_id,
  return_date - rental_date as rent_duration,
  avg(return_date - rental_date) over (partition by customer_id)
from rental;

8.5 Write a query to calculate a running total of payments received, grouped by month (ie. for each month show the total amount of money received that month and also the total amount of money received up to and including that month)

Table(s) to use: payment

Expected Output:
month              |amount  |running_total|
-------------------|--------|-------------|
2007-01-01 00:00:00| 4824.43|      4824.43|
2007-02-01 00:00:00| 9631.88|     14456.31|
2007-03-01 00:00:00|23886.56|     38342.87|
2007-04-01 00:00:00|28559.46|     66902.33|
2007-05-01 00:00:00|  514.18|     67416.51|

5 rows

Show/Hide Solution

We can re-use the monthly_amounts CTE we've encountered several times now to give us a baseline table to work with. Then by using the sum function over a window ordered by month ascending, we're able to take advantage of the default window frame and obtain a running total of payments received.
with monthly_amounts as
(
  select
    date_trunc('month', payment_date) as month,
    sum(amount) as amount
  from payment
  group by month
)
select
  month,
  amount,
  sum(amount) over (order by month) as running_total
from monthly_amounts;

8.6 Write a query to return the top 3 earning films in each rating category. Include ties. To calculate the earnings for a film, multiply the rental rate for the film by the number of times it was rented out

Table(s) to use: rental, inventory, film

Expected Output:
title             |rating|income|
------------------|------|------|
CAT CONEHEADS     |G     |149.70|
DOGMA FAMILY      |G     |149.70|
SATURDAY LAMBS    |G     |139.72|
BUCKET BROTHERHOOD|PG    |169.66|
GOODFELLAS SALUTE |PG    |154.69|
TITANS JERK       |PG    |144.71|
HARRY IDAHO       |PG-13 |149.70|
TRIP NEWTON       |PG-13 |139.72|
ROSES TREASURE    |PG-13 |139.72|
MASSACRE USUAL    |R     |149.70|
BOOGIE AMELIE     |R     |144.71|
CLOSER BANG       |R     |139.72|
SCALAWAG DUCK     |NC-17 |159.68|
ZORRO ARK         |NC-17 |154.69|
APACHE DIVINE     |NC-17 |154.69|
WIFE TURN         |NC-17 |154.69|

16 rows

Show/Hide Solution

We can build this query up in several steps. The first CTE calculates the amount of income received from each film by multiplying the rental rate by the number of times the film was rented out. The second CTE assigns a rank for each film based on the rating of the film and income - it also filters out those films without a rating. The final query filters the output to only return the films with a ranking less than or equal to 3.
with film_incomes as
(
  select
    f.film_id,
    f.title,
    f.rating,
    f.rental_rate * count(*) as income
  from rental as r
    inner join inventory as i using (inventory_id)
    inner join film as f using (film_id)
  group by f.film_id
),
film_rankings as
(
  select
    film_id,
    title,
    rating,
    income,
    rank() over(partition by rating order by income desc)
  from film_incomes
  where rating is not null
)
select title, rating, income
from film_rankings
where rank <= 3
order by rating, rank;

8.7 The rental table has 16,044 rows but the maximum rental ID is 16,049. This suggests that some rental IDs have been skipped over. Write a query to find the missing rental IDs (you previously did this using the generate_series function. Now do it using only window functions). Note you don't have to have your output formatted the same.

Table(s) to use: rental

Expected Output:
missing_from|missing_to|
------------|----------|
         321|       321|
        2247|      2247|
        6579|      6579|
        9426|      9426|
       15592|     15592|

5 rows

Show/Hide Solution

This exercise falls under a general class of problems known as 'finding the gap' - a quite popular question in technical interviews! For each rental, you can use the lead() function to obtain the very next rental ID (when ordered by ID). Any missing IDs therefore are when this "gap" is greater than 1, with current + 1 identifying the first missing ID and next - 1 identifying the last missing ID in the "gap".
with t as
(
  select
    rental_id as current,
    lead(rental_id) over (order by rental_id) as next
  from rental
)
select
  current + 1 as missing_from,
  next - 1 as missing_to
from t
where next - current > 1;

8.8 Calculate for each customer the longest amount of time they've gone between renting a film

Table(s) to use: rental

Expected Output:
customer_id|longest break    |
-----------|-----------------|
          1| 17 days 14:18:49|
          2| 22 days 09:36:26|
          3| 18 days 16:07:31|
          4| 38 days 16:31:09|
          5|175 days 21:39:01|
          6| 18 days 11:10:58|
          7| 16 days 15:45:13|
          8| 17 days 10:58:56|
          9|177 days 01:22:04|
         10| 17 days 08:40:33|
...

599 rows

Show/Hide Solution

The days_between CTE is used here to calculate for each rental the time difference between it and the next rental from the same customer. To obtain the longest break for each customer, the final query groups the results by customer and picks the largest such difference.
with days_between as
(
  select customer_id, rental_date,
    lead(rental_date) over (partition by customer_id order by rental_date) - rental_date as diff
  from rental
)
select customer_id, max(diff) as "longest break"
from days_between
group by customer_id
order by customer_id;

9. Working with sets

9.1 Write a query to list out all the distinct dates there was some sort of customer interaction (a rental or a payment) and order by output date

Table(s) to use: rental, payment

Expected Output:
interaction_date|
----------------|
      2005-05-24|
      2005-05-25|
      2005-05-26|
      2005-05-27|
      2005-05-28|
      2005-05-29|
      2005-05-30|
      2005-05-31|
...

81 rows

Show/Hide Solution

We can obtain the result we're after here by unioning the rental date and payment dates (cast to dates to remove the time component), from the rental and payment tables respectively. The union operator will take care of removing any duplicate dates from the output.
(
  select cast(rental_date as date) as interaction_date
  from rental
)
union
(
  select cast(payment_date as date) as interaction_date
  from payment
)
order by interaction_date;

9.2 Write a query to find the actors that are also customers (assuming same name = same person)

Table(s) to use: customer, actor

Expected Output:
first_name|last_name|
----------|---------|
JENNIFER  |DAVIS    |

1 rows

Show/Hide Solution

By performing an intersection between the first name and last name columns from both the customer and actor tables we obtain the people who appear in both tables.
(
  select first_name, last_name
  from customer
)
intersect
(
  select first_name, last_name
  from actor
);

9.3 Have the actors with IDs 49 (Anne Cronyn), 152 (Ben Harris), and 180 (Jeff Silverstone) ever appeared in any films together? Which ones?

Table(s) to use: film_actor

Expected Output:
film_id|
-------|
    729|

1 rows

Show/Hide Solution

We can obtain the list of films each actor has appeared in directly from the film_actor table. By performing an intersection between all 3 lists of films, we are left with any films common to each actor.
(
  select film_id
  from film_actor
  where actor_id = 49
)
intersect
(
  select film_id
  from film_actor
  where actor_id = 152
)
intersect
(
  select film_id
  from film_actor
  where actor_id = 180
);

9.4 The missing rental IDs problem that we've encountered several times now is the perfect place to use EXCEPT. Write a query using the generate_series function and EXCEPT to find missing rental IDs (The rental table has 16,044 rows but the maximum rental ID is 16,049 - some IDs are missing)

Table(s) to use: rental

Expected Output:
id   |
-----|
 2247|
 9426|
15592|
 6579|
  321|

5 rows

Show/Hide Solution

We first generate a sequential list of numbers ranging from the minimum rental ID (1) in the rental table to the maximum rental ID (16,049). From this, using EXCEPT, we remove the rental IDs present in the rental table. This leaves us with the missing rental IDs. Of all the different ways you've now learned to solve this exercise (first subqueries and then window functions) I think this is the most intuitive.
(
  select t.id
  from generate_series(
    (select min(rental_id) from rental),
    (select max(rental_id) from rental)) as t(id)
)
except
(
  select rental_id
  from rental
);

9.5 Write a query to list all the customers who have rented out a film on a Saturday but never on a Sunday. Order the customers by first name.

Table(s) to use: rental, customer

Expected Output:
first_name|last_name|
----------|---------|
CHRISTIAN |JUNG     |
CLIFTON   |MALCOLM  |
EVA       |RAMOS    |
FELIX     |GAFFNEY  |
LEON      |BOSTIC   |
MARION    |OCAMPO   |
ROBIN     |HAYES    |
TIFFANY   |JORDAN   |

8 rows

Show/Hide Solution

We first obtain a list of all the customers who have rented out a film on Saturday using the date_part function with 'isodow' (ISO Day Of Week) which returns a number between 1 and 7 identifying the day of the week. From this, using EXCEPT, we remove the customers who have made rentals on Sunday.
(
  select first_name, last_name
  from rental
    inner join customer using (customer_id)
  where date_part('isodow', rental_date) = 6
)
except
(
  select first_name, last_name
  from rental
    inner join customer using (customer_id)
  where date_part('isodow', rental_date) = 7
)
order by first_name;

9.6 Write a query to list out all the distinct dates there was some sort of customer interaction (a rental or a payment) and order by output date. Include only one row in the output for each type of interaction

Table(s) to use: rental, payment

Expected Output:
interaction_date|type   |
----------------|-------|
      2005-05-24|rental |
      2005-05-25|rental |
      2005-05-26|rental |
      2005-05-27|rental |
      2005-05-28|rental |
      2005-05-29|rental |
      2005-05-30|rental |
      2005-05-31|rental |
      2005-06-14|rental |
      ...
      2007-01-24|payment|
      2007-01-25|payment|
      2007-01-26|payment|
      2007-01-27|payment|
      2007-01-28|payment|
      2007-01-29|payment|
      2007-01-30|payment|
      ...


81 rows

Show/Hide Solution

Similar to the first exercise, we union the rental dates with the payment dates. This time however each query contains a static column describing the type of interaction (this pattern of adding a column to each query describing the source table is quite common when encountering set operators in real-world situations)
(
  select cast(rental_date as date) as interaction_date, 'rental' as type
  from rental
)
union
(
  select cast(payment_date as date) as interaction_date, 'payment' as type
  from payment
)
order by interaction_date;

9.7 Write a query to return the countries in which there are both customers and staff. Use a CTE to help simplify your code.

Table(s) to use: staff, customer, address, city, country

Expected Output:
country|
-------|
Canada |

1 rows

Show/Hide Solution

Not such a tricky question in this case but it does involve the same joined table in more than one case so the use of a CTE can be quite handy here instead of having to repeat the same query elements to resolve an address ID in to a country. Yes, you can use CTEs with set operators without any problems!
with address_country as
(
  select address_id, country
  from address
    inner join city using (city_id)
    inner join country using (country_id)
)
(
  select country
  from staff
    inner join address_country using (address_id)
)
intersect
(
  select country
  from customer
    inner join address_country using (address_id)
);

9.8 Imagine you had two queries - let's call them A and B. Can you figure out how you would use set operators to return the rows in either A or B, but not both.

Show/Hide Solution

To obtain the rows in either A or B but not in both, you could first perform A union B. From this, you would then minus A intersect B. In full then: (A union B) except (A intersect B). Can you think of another way too?

10. Creating tables and constraints

10.1 In this and the following exercises in this chapter, we're going to be doing some lightweight database modelling work for a fictional beach equipment rental business. Your answers may deviate a little from mine as we go, and that's fine - database design is a quite subjective topic. To kick things off, we'll keep working with our existing database but we want to create all our tables within a schema called 'beach'. Write a SQL statement to create the 'beach' schema.

Show/Hide Solution

create schema beach;

10.2 Create a table to store customers. For each customer we want to capture their first name, last name, email address, phone number, and the date the account was created. Don't worry about primary keys and constraints for now - just focus on the create statement and choosing what you think are appropriate data types for the listed attributes.

Table(s) to use: beach.customers

Show/Hide Solution

The text data type is actually ideal for the email, first name, and last name. When it comes to PostgreSQL, you should typically prefer to use text instead of varchar(n) - there's no performance difference, and unless you have a good reason to want to set a maximum length just use text (you can also set a maximum length with text via a CHECK constraint - and more too!). For the phone number, you could get more fancy and try to denormalize it further in to a few columns (for international dialing prefix, number, extension, etc.) but it's awfully complicated and there are many exceptions and edge cases. text is the more flexible approach. The create date can be of type date - we don't need a time component.
create table beach.customers (
  email text,
  first_name text,
  last_name text,
  phone text,
  create_date date
);

10.3 Create a table to store details about the equipment to be rented out. For each item, we want to store the type ('surf board', 'kayak', etc.), a general ad-hoc description of the item (color, brand, condition, etc), and replacement cost so we know what to charge customers if they lose the item.

Table(s) to use: beach.equipment

Show/Hide Solution

Both the item type and description can be text. Because the business will be only stocking certain types of items, in a future exercise we'll investigate ways to restrict the item_type just to particular values. The replacement cost is an interesting one - PostgreSQL actually does actually have a money data type that should in theory be ideal. In practice though, it has a range of problems you can read about online and so numeric is the better choice. I've opted for a precision of 7 and scale of 2, which allows storing items of up to value $99,999.99 which seems more than adequate!
create table beach.equipment (
  item_type text,
  description text,
  replacement_cost numeric(7, 2)
);

10.4 After running the business for a while, we notice that customers sometimes lose equipment. Write a SQL statement to alter the equipment table (assume it already has data in it we don't want to lose) to add a column to track whether the item is missing.

Table(s) to use: beach.equipment

Show/Hide Solution

You can use the ALTER TABLE statement to add a column to an existing table. In this case, we add the column missing to be of type boolean.
alter table beach.equipment
  add missing boolean;

10.5 Add a surrogate primary key for the customers table using the GENERATED AS IDENTITY syntax (we assume not all customers will provide an email address or phone number ruling them out as potential natural keys). Note you may drop the schema/table and re-create it from scratch.

Table(s) to use: beach.customers

Show/Hide Solution

We introduce a new surrogate key column called customer_id of type bigint (though int would likely be fine too since this is just a small beach rental business). In a challenge exercise at the end of this chapter, we'll explore what makes GENERATED ALWAYS... different from GENERATED BY DEFAULT...
create table beach.customers (
  customer_id bigint generated always as identity primary key,
  email text,
  first_name text,
  last_name text,
  phone text,
  create_date date
);

10.6 Add a surrogate primary key for the equipment table using one of the serial types. Also add in to the table definition the 'missing' column from exercise 10.4. Note you may drop the schema/table and re-create it from scratch.

Table(s) to use: beach.equipment

Show/Hide Solution

create table beach.equipment (
  equipment_id bigserial primary key,
  item_type text,
  description text,
  replacement_cost numeric(7, 2),
  missing boolean
);

10.7 Create a new table to store information about each rental with an appropriate primary key. For each rental, store the customer, the item that was rented, the rental date, and the return date.

Table(s) to use: beach.rentals

Show/Hide Solution

We'll assume for now that a rental consists of just renting one item (and revisit this in the challenge exercises) so we can define our primary key to be a composite natural key comprising the customer ID, equipment ID, and rental date. The combination of all 3 should be unique - we include the rental date in the key to allow the very plausible scenario of the same customer renting the same item on some other day.
create table beach.rentals (
  customer_id bigint,
  equipment_id bigint,
  rental_date date,
  return_date date,
  primary key (customer_id, equipment_id, rental_date)
);

10.8 Add appropriate foreign keys to the rentals table. Setup the foreign keys such that if the referenced customer or equipment is deleted, the related entries in the rentals table will also be deleted. Note you may drop the schema/table and re-create it from scratch.

Table(s) to use: beach.rentals

Show/Hide Solution

Both the customer ID and equipment ID can be setup as foreign keys with ON DELETE CASCADE specified to ensure deletion of any referenced customers or equipment cascade delete records in the rentals table.
create table beach.rentals (
  customer_id bigint references beach.customers (customer_id) on delete cascade,
  equipment_id bigint references beach.equipment (equipment_id) on delete cascade,
  rental_date date,
  return_date date,
  primary key (customer_id, equipment_id, rental_date)
);

10.9 Add appropriate check, unique, not null, and default constraints to the customers table to capture the following requirements: a) A customer must provide both a first name and last name b) A customer must provide at least one contact detail - a phone number or email address c) The create date should be the date the new customer record is inserted in the table d) No two customers should have the same email address or phone number

Table(s) to use: beach.customers

Show/Hide Solution

You shouldn't have too much problem picking constraints for the above requirements. Two interesting ones worth focusing on are the default create date - in this case, we use the function current_date (oddly enough, a function that you can call without parentheses) to return the current date. Also the check constraint, to make sure that either the email or phone number is not null.
create table beach.customers (
  customer_id bigint generated always as identity primary key,
  email text unique,
  first_name text not null,
  last_name text not null,
  phone text unique,
  create_date date not null default current_date,
  check (email is not null or phone is not null)
);

10.10 Add appropriate check, unique, not null, and default constraints to the equipment table to capture the following requirements: a) A newly added item should not be missing! b) Each item must have a type c) The replacement cost can be NULL. But if provided, it must be a positive number

Table(s) to use: beach.equipment

Show/Hide Solution

You might be wondering about the check constraint for replacement_cost and why you didn't have to explicitly permit null. It's always important to keep in mind the '3 valued' logic employed by SQL where an expression can evaluate to true, false, or unknown. On inserting a replacement cost of NULL, the comparison "Is NULL >= 0" evaluates to unknown. Check constraints reject values that evaluate to false, but unknown is fair game. Which in this case, is exactly what we want.
create table beach.equipment (
  equipment_id bigserial primary key,
  item_type text not null,
  description text,
  replacement_cost numeric(7, 2) check (replacement_cost >= 0),
  missing boolean not null default false
);

10.11 Search the official PostgreSQL documentation for information about GENERATED { ALWAYS | BY DEFAULT } and explain in your own words the difference between the two options.

Show/Hide Solution

With BY DEFAULT, PostgreSQL will generate a unique value for the column on insert, but will also allow you to insert your own value if you want to manually take control. With ALWAYS, if you attempt to provide your own value for the column as part of the insert statement, you'll get an error.

10.12 Imagine you wanted to restrict the equipment types to just a couple of options like 'Surfboard', 'Single Kayak', 'Double Kayak', etc. To do this, you actually have more options at your disposal than you might think. You could add a check constraint to the column. You could create an ENUM. You could create a whole new data type called a DOMAIN. Or you could use a new table in your database to store the available types. Have a read of this article and justify which approach you would take in this case.

Show/Hide Solution

The approach I would take is to store the types in their own table with the type defined as the primary key. The main reason for preferring this approach is because it would be reasonable to expect new types of equipment to be added over time. When this happens, rather than having to alter constraints, drop the enum type and recreate, etc. and all the other complicated approaches, managing the types in their own table would be very easy with simple INSERT statements. We could also easily produce a list of all the different types available for rent, etc.
create table equipment_types (
  type text primary key
);

10.13 A customer comes in and asks whether there are any Single Kayak's available for rent. Write a query to return a count of how many Single Kayak's are available for rent.

Table(s) to use: beach.equipment, beach.rentals

Show/Hide Solution

In this case, the query is written to perform a count of items with the type 'Single Kayak', that are not missing, and also are not actively being rented (where an active rental is defined as a rental with a NULL return date).
select count(*)
from beach.equipment
where item_type = 'Single Kayak'
  and missing = false
  and equipment_id not in
    (select equipment_id
     from beach.rentals
     where return_date is null);

10.14 Right now a rental consists of a customer renting a single item. If it were more common though for a customer to rent multiple items at a time and you wanted to capture these all under a single 'rental', how would you modify the existing tables (or create new tables) to model this.

Show/Hide Solution

One approach could be to split out the equipment being rented in to its own table. So as shown below, the rental_details table would capture details about the items, and could contain multiple entries for each rental ID.
create table beach.rentals (
  rental_id bigserial primary key,
  customer_id bigint references beach.customers (customer_id),
  rental_date date,
  return_date date
);

create table beach.rental_details (
  rental_id bigint references beach.rentals (rental_id),
  equipment_id bigint references beach.equipment (equipment_id),
  primary key (rental_id, equipment_id)
);

11. Inserting, updating, and deleting data

11.1 In the last video, we imported the data for the users and notes tables from CSV files. Now also import the note tags from CSV.

Table(s) to use: playground.note_tags

Show/Hide Solution

copy playground.note_tags
from '...note_tags.csv'
with (format csv, header true);

11.2 Export in CSV format (with headings) the number of notes created by each user. Order the users in descending order based on how many notes they've created.

Table(s) to use: playground.users, playground.notes

Expected Output:
name,note_count
David Hilbert,4
Joseph Fourier,3
Bernhard Riemann,2
Leonhard Euler,1
Blaise Pascal,1

Show/Hide Solution

copy
(
  select
    u.first_name || ' ' || u.last_name as name,
    count(*) as note_count
  from playground.notes as n
    inner join playground.users as u
      on n.user_email = u.email
  group by u.email, u.first_name, u.last_name
  order by note_count desc
)
to '...note_counts.csv'
with (format csv, header true);

11.3 Output values (any values of your choosing) of the following types to a plain text file: a) text b) int c) numeric d) real e) boolean f) date g) timestamptz h) interval

Expected Output:
test	33	33.3	33.2999992	t	2019-05-12	2019-05-12 15:25:11.644876+10	1 day

Show/Hide Solution

copy
(
  select
    'test'::text,
    33::int,
    33.3::numeric,
    33.3::real,
    true::boolean,
    current_date::date,
    current_timestamp::timestamptz,
    '1 day'::interval
)
to '...output_types.txt';

11.4 Write an insert statement to insert a new customer in to the customer table with any details of your choosing. Use the returning clause to return the inserted row. For this and all upcoming exercises, remember to do this inside a transaction block and rollback the change afterwards!

Table(s) to use: customer

Expected Output:
customer_id|store_id|first_name|last_name|email|address_id|activebool|create_date|last_update        |active|
-----------|--------|----------|---------|-----|----------|----------|-----------|-------------------|------|
        600|       1|John      |Henry    |     |         1|true      | 2019-05-14|2019-05-14 15:21:57|     1|

Show/Hide Solution

insert into customer(store_id, first_name, last_name, address_id, active)
values
  (1, 'John', 'Henry', 1, 1)
returning *;

11.5 Create a new table called rental_stats with two columns, date and num_rentals, to hold the data for the number of rentals made each day. And write an insert statement to populate the table with correct data as calculated from the rental table

Table(s) to use: rental, rental_stats

Show/Hide Solution

We achieve this with two statements - one to create the rental_stats table (don't worry about PKs) and an insert statement that queries the rental table, calculating the number of rentals made on each day. As an alternative, it is also possible to create a table and populate it with results from a query in a single statement using CREATE TABLE AS - I've included below a solution using this approach as well. With CREATE TABLE AS, the created table column names and types come directly from the query.
create table rental_stats (
  date date,
  num_rentals int
);

insert into rental_stats
  select
    rental_date::date as rental_day,
    count(*)
  from rental
  group by rental_day
  order by rental_day;
create table rental_stats as
  select
    rental_date::date as rental_day,
    count(*)
  from rental
  group by rental_day
  order by rental_day;

11.6 All customers should have an email address of the form [first_name].[last_name]@sakilacustomer.org (all in lower case). Write an update statement so that all customers have an email address in this format.

Table(s) to use: customer

Show/Hide Solution

In this case since we can visually observe that none of the customer email addresses are in the correct format, writing a blanket statement to update all the rows is fine. In general though, you are better off limiting your update statements to only update rows that you have to for performance reasons.
update customer
set email = lower(first_name || '.' || last_name || '@sakilacustomer.org');

11.7 Write an update statement to update the rental rate of the 20 most rented films by 10%

Table(s) to use: rental, inventory, film

Show/Hide Solution

For this update statement, we update the rental_rate for those films that exist in the results of a subquery. The subquery returns the 20 most rented films.
update film
set rental_rate = rental_rate * 1.1
where film_id in
  (
    select
      i.film_id
    from rental as r
      inner join inventory as i using (inventory_id)
    group by i.film_id
    order by count(*) desc
    limit 20
  );

11.8 Write a script to add a new column to the films table to hold the length of each film in hours (have a look at some of the examples for the ALTER TABLE command) and then populate this new column with the correct values

Table(s) to use: film

Show/Hide Solution

The script first creates the new column length_hrs of type numeric(2, 1) - allowing the storing of lengths like 1.2 hrs, 3.0 hrs, etc. Next the update statement populates the column with correct values, taking care to avoid integer division by dividing by 60.0.
alter table film
  add column length_hrs numeric(2, 1);

update film
set length_hrs = length / 60.0
returning *;

11.9 Delete all the payments where the payment amount was zero, returning the deleted rows

Table(s) to use: payment

Expected Output:
payment_id|customer_id|staff_id|rental_id|amount|payment_date       |
----------|-----------|--------|---------|------|-------------------|
     31918|        267|       2|    13713|  0.00|2007-05-14 13:44:29|
     31920|        269|       2|    12610|  0.00|2007-05-14 13:44:29|
     31925|        284|       2|    12959|  0.00|2007-05-14 13:44:29|
     31942|        354|       1|    11782|  0.00|2007-05-14 13:44:29|
     31946|        361|       1|    14769|  0.00|2007-05-14 13:44:29|
     31966|        448|       1|    13577|  0.00|2007-05-14 13:44:29|
     31970|        457|       2|    14516|  0.00|2007-05-14 13:44:29|
     31983|        516|       1|    12915|  0.00|2007-05-14 13:44:29|
...

24 rows

Show/Hide Solution

delete from payment
where amount = 0
returning *;

11.10 Delete all the unused languages from the language table

Table(s) to use: language

Show/Hide Solution

This solution makes use of a plain uncorrelated subquery to return all the languages that are in use in the film table and deletes any languages from the language table not in this list.
delete from language
where language_id not in
  (select distinct language_id
   from film);

11.11 Write a single update statement to update the activebool column for customers to be true if they made a rental in 2006 or later, and false otherwise.

Table(s) to use: customer, rental

Show/Hide Solution

In this exercise, the update statement must update the value of the activebool column conditionally. If the customer has made rentals in 2006 or later, the value must be set to true. If the customer hasn't made any rentals in 2006 or later, the value must be set to false. Whenever you come across conditional situations like this, case expressions often come in handy. Here we use a case expression that depends on the result of a correlated subquery that evaluates whether any rentals were made for each customer in 2006 or later.
update customer
set activebool =
  case
    when exists
      (select *
       from rental
       where rental.customer_id = customer.customer_id
         and rental_date >= '2006-01-01')
       then true
    else false
  end;

11.12 Create a new table, with appropriate primary keys and foreign keys, to hold the amount of inventory of each film in each store (store_id, film_id, stock_count). In this table we want to store the stock level for every film in every store - including films that aren't in stock. Write an "upsert" statement to populate the table with the correct values. By "upsert", I mean insert a SQL statement that will either insert a new row in the table (ie. a new film, store, stock count) or update the stock count if the film/store attempting to be inserted already exists in the table). Research PostgreSQL's INSERT ON CONFLICT and look at the examples for some guidance on how to do this.

Table(s) to use: film, store, inventory, inventory_stats (new)

Show/Hide Solution

This was a quite difficult exercise so give yourself a big pat on the back if you were able to work it out. Starting with the table creation, we create a new table called inventory_stats with a composite primary key consisting of the store_id and film_id. For the INSERT statement the underlying query providing the data to be inserted (or updated) is a cross join between the film and store tables to produce all unique combinations of films and stores, and then left joined with the inventory table to tally up stock levels, including films that aren't in stock. Finally, the ON CONFLICT clause ensures that if there is a primary key violation on insert, the stock count is still updated. Upsert statements can come in handy in many situations!
create table inventory_stats
(
  store_id smallint references store (store_id),
  film_id smallint references film (film_id),
  stock_count int not null,
  primary key (store_id, film_id)
);

insert into inventory_stats(store_id, film_id, stock_count)
  select s.store_id, f.film_id, count(i.inventory_id)
  from film as f
    cross join store as s
    left join inventory as i
      on f.film_id = i.film_id
      and s.store_id = i.store_id
  group by f.film_id, s.store_id
on conflict (store_id, film_id)
do update set
  stock_count = excluded.stock_count;

11.13 Write a single statement to delete the first rental made by each customer and to avoid any foreign key violations you'll also have to delete any associated payments in that same statement. You might need to do some research online to figure this one out. As a hint, you can use Common Table Expressions (CTEs) with delete statements and delete statements themselves can return results with the RETURNING clause!

Table(s) to use: rental, payment

Show/Hide Solution

Another very tricky exercise. With this exercise, a DISTINCT ON subquery is used to obtain the first rental_id for each customer to be deleted, and the deletion from the rental table takes place within a CTE and returns the IDs that were deleted. Using the CTE results, associated payments are also deleted. Note that since both deletes occur within a single statement we avoid violating the foreign key (even though the rentals are deleted "first").
with deleted_rentals as
(
  delete from rental
  where rental_id in
    (select distinct on (customer_id) rental_id
     from rental
     order by customer_id, rental_date)
  returning rental_id
)
delete from payment
where rental_id in
  (select rental_id
   from deleted_rentals);

11.14 In the films table the rating column is of type mpaa_rating, which is an ENUM. You've read online about the downsides of ENUMs and now want to convert your table design to instead store the different mpaa rating types in a reference table with the type as the primary key. Write a script to create the new table, populate it with data, convert the film table, and then drop the mpaa_rating type so it won't be used ever again. You're going to need to Google a few ideas and look up some documentation to get through this one - good luck!

Table(s) to use: film, mpaa_ratings (new)

Show/Hide Solution

After creating the mpaa_ratings table, the different values the ENUM can take on are inserted in to the table. The query to obtain the different enum values for insertion in to the new table is quite strange, making use of several new functions you haven't seen before...it's just one of those things you end up Googling when infrequently needed. In the next statement we then make a number of changes to the film table - converting the rating column to type text, dropping and re-adding the default constraint (we have to do this because the column is specified to get a default value of type mpaa_rating), and then adding the foreign key constraint to reference the new table. Finally, we're able to drop the mpaa_rating type.
create table mpaa_ratings
(
  rating text primary key
);

insert into mpaa_ratings
  select unnest(enum_range(null::mpaa_rating));

alter table film
  alter column rating drop default,
  alter column rating type text,
  alter column rating set default 'G',
  add foreign key (rating) references mpaa_ratings(rating);

drop type mpaa_rating;

12. Views and functions

12.1 We often need to get basic film information for a rental and so regularly find ourselves writing queries to join from the rental table on to the inventory and film tables. Write a view called vw_rental_film to make this more convenient, returning for each rental ID the film's title, length, and rating. Then write a query to return all the rows from this view to check it's working as expected.

Table(s) to use: rental, inventory, film

Expected Output:
rental_id|title                      |length|rating|
---------|---------------------------|------|------|
        1|BLANKET BEVERLY            |   148|G     |
        2|FREAKY POCUS               |   126|R     |
        3|GRADUATE LORD              |   156|G     |
        4|LOVE SUICIDES              |   181|R     |
        5|IDOLS SNATCHERS            |    84|NC-17 |
        6|MYSTIC TRUMAN              |    92|NC-17 |
        7|SWARM GOLD                 |   123|PG-13 |
        8|LAWLESS VISION             |   181|G     |
...

16044 rows

Show/Hide Solution

For this exercise we write a simple query not so different from queries we've written many times in this course now, but embed it within the create view statement. You can now use this view to simplify any queries you write in the future!
create view vw_rental_film as
  select
    r.rental_id,
    f.title,
    f.length,
    f.rating
  from rental as r
    inner join inventory as i using (inventory_id)
    inner join film as f using (film_id)
  order by r.rental_id;

12.2 Use the vw_rental_film view you just created to return, for each customer, the number of 'R' films they've rented out. Include customers who haven't rented any R films also. Note - this is trickier than it first appears...be very careful and double-check your results!

Table(s) to use: customer, rental

Expected Output:You should have two customers in your output that did not rent any 'R' films - The customers with IDs 238 and 437.
customer_id|count|
-----------|-----|
          1|    5|
          2|    4|
          3|   12|
          4|    4|
          5|    9|
          6|    6|
          7|    7|
          8|    3|
...
        238|    0|
...
        437|    0|
...

599 rows

Show/Hide Solution

For this exercise, the requirement to return 0 counts for customers who haven't rented any 'R' films complicates matters quite a bit. Quite a bit of care must be taken to ensure all customers appear in the output. For my solution, I first inner join the rental table with the view and use parentheses to combine this in to a single unit. The left outer join then takes place between the customer table and this unit. You could also achieve the same effect by first inner joining the rental table with the view, and then performing a right join on to the customer table. Due to the left join, care must also be taken when the count() function. If you had a lot of trouble with this exercise, make sure to go back and watch the "Outer joins" and "Advanced join topics" videos.
select
  c.customer_id,
  count(r.rental_id)
from customer as c
  left join (rental as r
               inner join vw_rental_film as rf
                 on r.rental_id = rf.rental_id
                 and rf.rating = 'R')
    using (customer_id)
group by c.customer_id
order by c.customer_id;

12.3 Create a view called vw_monthly_totals that returns the amount of income received each month (you've done this a couple of times in this course now - time to finally save this in a view to avoid the repetition!)

Table(s) to use: payment

Expected Output:
month              |total   |
-------------------|--------|
2007-01-01 00:00:00| 4824.43|
2007-02-01 00:00:00| 9631.88|
2007-03-01 00:00:00|23886.56|
2007-04-01 00:00:00|28559.46|
2007-05-01 00:00:00|  514.18|

5 rows

Show/Hide Solution

We simply wrap the query we've written a few times now inside a view definition.
create view vw_monthly_totals as
  select
    date_trunc('month', payment_date) as month,
    sum(amount) as total
  from payment
  group by month
  order by month;

12.4 Using the new vw_monthly_totals view and window functions (remember how those work? If not, here's your chance to refresh!), write a query that returns the amount of income received each month and compares it against the previous month's income, showing the change.

Table(s) to use: payment

Expected Output:
month              |income  |prev month income|change   |
-------------------|--------|-----------------|---------|
2007-01-01 00:00:00| 4824.43|                 |         |
2007-02-01 00:00:00| 9631.88|          4824.43|  4807.45|
2007-03-01 00:00:00|23886.56|          9631.88| 14254.68|
2007-04-01 00:00:00|28559.46|         23886.56|  4672.90|
2007-05-01 00:00:00|  514.18|         28559.46|-28045.28|

5 rows

Show/Hide Solution

Using the LAG window function, you can access the previous month's total from the view and then use that to calculate the monthly income change.
select
  month,
  total as income,
  lag(total) over (order by month) as "prev month income",
  total - lag(total) over (order by month) as "change"
from vw_monthly_totals;

12.5 Continuing on from Exercise 1, you now have a view called vw_rental_film. You create a new materialized view called mvw_rental_film defined as below. Imagine a period of time has now passed and the materialized view's cache is out of date. Write a query which will output the difference between the original view and the materialized view (essentially this boils down to writing a query to show the difference between two sets of results). Within a test transaction block you can roll back, make some insertions and deletions to test your query works as expected.

create materialized view mvw_rental_film as
  select *
  from vw_rental_film;

Show/Hide Solution

To find out the differences between two sets of results, you need to identify two things. First, you need to find the elements that are in the first set but not in the second. And you also need to find the elements that are in the second set, but not in the first. Combining those two lists of differences, you then have the complete list. You can achieve all these operations using the set operators EXCEPT and UNION ALL as below.
(
  select * from vw_rental_film
  except
  select * from mvw_rental_film
)
union all
(
  select * from mvw_rental_film
  except
  select * from vw_rental_film
);

12.6 Write a function using SQL that takes as an argument a customer_id and returns the number of unreturned rentals for that customer

Table(s) to use: rental

Show/Hide Solution

An unreturned rental is identified by having a return date of NULL so you must filter for that with your query. One thing which may catch you out when writing this function is to make sure the right types are being used. The count function in particular actually returns a bigint - so you can make the return type of your function bigint or cast to an int as I have (my reasoning for going with int is because the count will never be larger than int can store since the primary key of the rental table is also of type int)
create or replace function unreturned_rentals
(
  p_customer_id int
)
returns int
language sql
as $$
  select cast(count(*) as int)
  from rental
  where customer_id = p_customer_id
    and return_date is null;
$$;

12.7 Using the function you just created, write a query that outputs the number of unreturned rentals for each customer

Table(s) to use: customer

Expected Output:
customer_id|unreturned_rentals|
-----------|------------------|
          1|                 0|
          2|                 0|
          3|                 0|
          4|                 0|
          5|                 1|
          6|                 0|
          7|                 0|
          8|                 0|
          9|                 1|
         10|                 0|
...

599 rows

Show/Hide Solution

Notice how from a behaviour perspective this is similar to a correlated subquery! We effectively have some SQL code that is run for each customer in the customer table and returns a single value.
select
  customer_id,
  unreturned_rentals(customer_id)
from customer;

12.8 Write a function using SQL to generate a random int between HIGH and LOW inclusive (two input int arguments)

Show/Hide Solution

There are a couple of different ways you can do this. Regardless of the approach you take, make sure to test your function by running it a number of times and observing the output. If you're uncomfortable with the math used in the solution grab a pen and paper and walk through a couple of simple examples on paper alone (eg. try rand(1, 4)). Note that the built-in random() function returns values between 0 (inclusive) and 1.0 (exclusive) which if you're not careful to account for, can cause your rand function to be off-by-one.
create or replace function rand
(
  p_low int,
  p_high int
)
returns int
language sql
as $$
  select floor(random() * (p_high - p_low + 1))::int + p_low;
$$;

12.9 Write a function using PL/pgSQL that decides whether a customer should get a discount when renting a film. Discounts are provided if a customer has no unreturned rentals (feel free to use the function you wrote in Exercise 12.6) and has also never rented the film before. The function should take a customer_id and film_id as arguments and return true if the customer should receive a discount, or false otherwise.

Show/Hide Solution

This function has a couple of key parts. First, the previously written unreturned_rentals function is used and the result stored in num_outstanding. Secondly, a inline SQL query is used to count the number of times the customer has rented the film before and the result stored in num_rented. The final return expression returns the value of the boolean expression, which is true if the customer has no unreturned rentals and also has never rented the film before (both variables are 0), and false otherwise.
create or replace function apply_discount
(
  p_customer_id int,
  p_film_id int
)
returns boolean
language plpgsql
as $$
  declare
    num_outstanding int;
    num_rented int;
  begin
    num_outstanding := unreturned_rentals(p_customer_id);

    select count(*) into num_rented
    from rental as r
      inner join inventory as i using (inventory_id)
      inner join film as f using (film_id)
    where customer_id = p_customer_id
      and film_id = p_film_id;

    return (num_outstanding = 0 and num_rented = 0);
  end
$$;

12.10 For this exercise we're going to be writing a function using PL/pgSQL to implement the classic FizzBuzz problem. Write a function that returns in a table each number from 1 to n (where n is an argument you pass in to the function), and in the second column if the number is a multiple of 3, print "Fizz". If the number is a multiple of 5, print "Buzz". But if the number is a multiple of both 3 and 5, print "FizzBuzz". Otherwise, just print the number. You may want to use RETURN NEXT to build up your table one row at a time - read more about this in the official documentation.

Expected Output:Calling the function with n = 100 should return
n  |fb      |
---|--------|
  1|1       |
  2|2       |
  3|Fizz    |
  4|4       |
  5|Buzz    |
  6|Fizz    |
  7|7       |
  8|8       |
  9|Fizz    |
 10|Buzz    |
 11|11      |
 12|Fizz    |
 13|13      |
 14|14      |
 15|FizzBuzz|
 16|16      |
...

100 rows

Show/Hide Solution

There's a variety of different ways you can go about solving this problem. For my solution, I loop from 1 to n using a for loop and build up the table of results one row at a time using RETURN NEXT (when used without any expression, the current values of the output variables - n and fb - are saved for return in a new row). The modulos operator is used to check for divisibility by 3 and 5.
create or replace function fizzbuzz (p_n int)
returns table (n int, fb text)
language plpgsql
as $$
  begin
    for cntr in 1..p_n loop
      n := cntr;

      if (cntr % 3 = 0 and cntr % 5 = 0) then
        fb := 'FizzBuzz';
      elsif cntr % 3 = 0 then
        fb := 'Fizz';
      elsif cntr % 5 = 0 then
        fb := 'Buzz';
      else
        fb := cntr;
      end if;

      return next;
    end loop;
  end
$$;