4  Cheats

This is the cheat sheet section of this book. It only shows example code for a copy and paste apporach for each chapter.

4.1 Introduction

4.1.1 Select

SELECT statement is used to fetch data from a database.

SELECT * FROM mtcars;
Displaying records 1 - 10
mpg cyl disp hp drat wt qsec vs am gear carb
21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4

4.1.2 Where

WHERE clause is used to extract only those records that fulfill a condition.

SELECT * FROM mtcars WHERE am = 0 LIMIT 5;
5 records
mpg cyl disp hp drat wt qsec vs am gear carb
21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2

4.1.3 Count

COUNT takes the name of a column as argument and counts the number of rows when the column is not NULL.

SELECT COUNT(*) FROM mtcars;
1 records
COUNT(*)
32

4.1.4 DISTINCT

Get unique values in specified columns.

SELECT DISTINCT Species FROM iris
3 records
Species
setosa
versicolor
virginica

4.1.5 Insert Values

Insert new rows in the table.

library(tidyverse)

df <- tribble(
  ~x, ~y,  ~z,
  "a", 2,  3.6,
  "b", 1,  8.5
)
df
# A tibble: 2 × 3
  x         y     z
  <chr> <dbl> <dbl>
1 a         2   3.6
2 b         1   8.5

Now, we can insert new values into df.

INSERT INTO df (x, y, z) VALUES('c', 3, 1);
SELECT * FROM df;

4.1.6 Updates

Update the rows in the table.

UPDATE df SET z = 77 WHERE x ='b';

4.1.7 Delete

Remove rows from the table which are specified in the WHERE condition.

DELETE from df WHERE x = 'b';

4.2 Data management

4.2.1 Create table

Each column in the table is specified with its name, data type and an optional keyword which could be PRIMARY KEY, NOT NULL, etc.,

CREATE TABLE PETSALE (
    ID INTEGER NOT NULL,
    PET CHAR(20),
    SALEPRICE DECIMAL(6,2),
    PROFIT DECIMAL(6,2),
    SALEDATE DATE
    );

INSERT INTO fills the table:

INSERT INTO PETSALE VALUES
    (1,'Cat',450.09,100.47,'2018-05-29'),
    (2,'Dog',666.66,150.76,'2018-06-01'),
    (3,'Parrot',50.00,8.9,'2018-06-04'),
    (4,'Hamster',60.60,12,'2018-06-11'),
    (5,'Goldfish',48.48,3.5,'2018-06-14');

4.2.2 Alter table

ADD COLUMN:

ALTER TABLE PETSALE
ADD COLUMN QUANTITY INTEGER;

Fill in values:

UPDATE PETSALE SET QUANTITY = 9 WHERE ID = 1;

Check whether it worked

SELECT * FROM PETSALE;
5 records
ID PET SALEPRICE PROFIT SALEDATE QUANTITY
1 Cat 450.09 100.47 2018-05-29 9
2 Dog 666.66 150.76 2018-06-01 NA
3 Parrot 50.00 8.90 2018-06-04 NA
4 Hamster 60.60 12.00 2018-06-11 NA
5 Goldfish 48.48 3.50 2018-06-14 NA

DROP COLUMN:

ALTER TABLE PETSALE
DROP COLUMN PROFIT;
SELECT * FROM PETSALE;
5 records
ID PET SALEPRICE SALEDATE QUANTITY
1 Cat 450.09 2018-05-29 9
2 Dog 666.66 2018-06-01 NA
3 Parrot 50.00 2018-06-04 NA
4 Hamster 60.60 2018-06-11 NA
5 Goldfish 48.48 2018-06-14 NA

ALTER COLUMN:

ALTER TABLE PETSALE
ALTER COLUMN PET SET DATA TYPE VARCHAR(20);
SELECT * FROM PETSALE;
5 records
ID PET SALEPRICE SALEDATE QUANTITY
1 Cat 450.09 2018-05-29 9
2 Dog 666.66 2018-06-01 NA
3 Parrot 50.00 2018-06-04 NA
4 Hamster 60.60 2018-06-11 NA
5 Goldfish 48.48 2018-06-14 NA

RENAME COLUMN:

ALTER TABLE PETSALE
RENAME COLUMN PET TO ANIMAL;
SELECT * FROM PETSALE;
5 records
ID ANIMAL SALEPRICE SALEDATE QUANTITY
1 Cat 450.09 2018-05-29 9
2 Dog 666.66 2018-06-01 NA
3 Parrot 50.00 2018-06-04 NA
4 Hamster 60.60 2018-06-11 NA
5 Goldfish 48.48 2018-06-14 NA

4.2.3 Truncate

The TRUNCATE statement will remove all(!) rows from an existing table, just like the one we created in the beginning, however, it does not delete the table itself.

TRUNCATE TABLE PET IMMEDIATE;

Caution: DROP TABLE tablename; drops the entire table!

DROP TABLE PETSALE;

4.3 Calculations

For this subsection I created a simple table that contains attributes about countries:

SELECT * FROM df ;
4 records
country y z id
Germany 2 3.6 1
Austria 1 8.5 2
Brazil 4 2.5 3
Brazil 3 3.5 3

4.3.1 Like

The LIKE operator is used in a WHERE clause to search for a specified pattern in a column.

There are two wildcards often used in conjunction with the LIKE operator which are percent sign(%) and underscore sign (_).

SELECT * FROM df WHERE country LIKE 'A%';
1 records
country y z id
Austria 1 8.5 2

4.3.2 Between

Select data within a range:

select * from mtcars
where (hp BETWEEN 100 and 200) and AM = 1 ;
5 records
mpg cyl disp hp drat wt qsec vs am gear carb
21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2

4.3.3 In

Use the IN operator to select observations that match the provided list of the IN operator:

SELECT * FROM df WHERE country IN ('Brazil');
2 records
country y z id
Brazil 4 2.5 3
Brazil 3 3.5 3

4.3.4 Order by

ORDER BY is used to sort the result-set in ascending or descending order. The default is ascending.

SELECT * FROM df ORDER BY country;
4 records
country y z id
Austria 1 8.5 2
Brazil 4 2.5 3
Brazil 3 3.5 3
Germany 2 3.6 1

4.3.5 Distinct

Exclude duplicates with DISTINCT:

SELECT distinct(country) FROM df ;
3 records
country
Germany
Austria
Brazil

4.3.6 Group by

GROUP BY is used in collaboration with SELECT to arrange identical data into groups.

SELECT country, count (country) AS Var_Name from df group by country;
3 records
country Var_Name
Austria 1
Brazil 2
Germany 1

4.3.7 Average

SELECT country, AVG(z) as Mean from df group by country;
3 records
country Mean
Austria 8.5
Brazil 3.0
Germany 3.6

4.3.8 Having

The WHEREclause is for entire result set; while HAVING works only for the GROUPED BY clause.

SELECT country, count (country) AS Count from df group by country having count (country) > 1;
1 records
country Count
Brazil 2

4.3.9 Further functions

Summarize groups:

select sum(mpg) as sum_mpg from mtcars where hp > 100
1 records
sum_mpg
401.4

Round to the nearest integer:

select round(drat, 1) as round_drat from mtcars
Displaying records 1 - 10
round_drat
3.9
3.9
3.9
3.1
3.1
2.8
3.2
3.7
3.9
3.9

The length of a string:

select length(country) from df
4 records
length(country)
7
7
6
6

Depending the SQL database you use, in db2 you can use the upper (UCASE) and lower case (LCASE) function for strings.

select upper(country) from df
4 records
upper(country)
GERMANY
AUSTRIA
BRAZIL
BRAZIL

In case of Oracle the functions are called lower and upper.

4.3.10 Date and Time Built-in Functions

Talking about SQL databases, there are three different possibilities to work with date and time DB2.
- Date: YYYYMMDD (Year/Month/Day) - Time: HHMMSS (Hours/Min/Sec) - Timestamp: YYYYMMDDHHMMSSZZZZZZ (Date/Time/Microseconds)

Depending on what you are up to do, there are functions to extract the day, month, day of month, day of week, day of year, week, hour, minute, and second. You can also extract the current_date and the current_time. Unfortunately, this does not work in Oracle the same way as in DB2, but to give you an example how to extract the day:

select day(date) from df where country = 'Germany'

4.3.11 Working with Multiple Tables

There are several ways to access multiple tables in the same query. Namely, using sub-queries, implicit join, and join operators, such as INNER JOIN and OUTER JOIN. For instance:

select * from df2;
2 records
country valid id
Germany 1 1
Austria 0 2

Let’s say we want only observations from df that are listed in df2. In such a situation we can use a sub-queries:

select * from df
  where country in
  (select country from df2)
2 records
country y z id
Germany 2 3.6 1
Austria 1 8.5 2

Of course, you could add also information of the second table and include only countries with a certain value:

select * from df
  where country in
  (select country from df2 where valid = 1)
1 records
country y z id
Germany 2 3.6 1

Implicit joins implies that we can access multiple tables by specifying them in the FROM clause of the query. This leads to a CROSS JOIN (also known as Cartesian Join).

select * from df, df2
8 records
country y z id country valid id
Germany 2 3.6 1 Germany 1 1
Germany 2 3.6 1 Austria 0 2
Austria 1 8.5 2 Germany 1 1
Austria 1 8.5 2 Austria 0 2
Brazil 4 2.5 3 Germany 1 1
Brazil 4 2.5 3 Austria 0 2
Brazil 3 3.5 3 Germany 1 1
Brazil 3 3.5 3 Austria 0 2

In DBL2 we can use the where clause to match data (see code); in Oracle there are other matching operaters

select * from df, df2 where df.id = df.id;

In case of long names, we can use shorter aliases for table names (or use column names with aliases in the SELECT clause):

select * from df A, df2 B where A.id = B.id;