4 Cheats

This is the cheat sheet section of this book. It only shows example code for a copy and paste apporach for each chapter.

4.1 Introduction

4.1.1 Select

SELECT statement is used to fetch data from a database.

SELECT * FROM mtcars;

Displaying records 1 - 10
mpg	cyl	disp	hp	drat	wt	qsec	vs	am	gear	carb
21.0	6	160.0	110	3.90	2.620	16.46	0	1	4	4
21.0	6	160.0	110	3.90	2.875	17.02	0	1	4	4
22.8	4	108.0	93	3.85	2.320	18.61	1	1	4	1
21.4	6	258.0	110	3.08	3.215	19.44	1	0	3	1
18.7	8	360.0	175	3.15	3.440	17.02	0	0	3	2
18.1	6	225.0	105	2.76	3.460	20.22	1	0	3	1
14.3	8	360.0	245	3.21	3.570	15.84	0	0	3	4
24.4	4	146.7	62	3.69	3.190	20.00	1	0	4	2
22.8	4	140.8	95	3.92	3.150	22.90	1	0	4	2
19.2	6	167.6	123	3.92	3.440	18.30	1	0	4	4

4.1.2 Where

WHERE clause is used to extract only those records that fulfill a condition.

SELECT * FROM mtcars WHERE am = 0 LIMIT 5;

5 records
mpg	cyl	disp	hp	drat	wt	qsec	vs	gear	carb
21.4	6	258.0	110	3.08	3.215	19.44	1	3	1
18.7	8	360.0	175	3.15	3.440	17.02	0	3	2
18.1	6	225.0	105	2.76	3.460	20.22	1	3	1
14.3	8	360.0	245	3.21	3.570	15.84	0	3	4
24.4	4	146.7	62	3.69	3.190	20.00	1	4	2

4.1.3 Count

COUNT takes the name of a column as argument and counts the number of rows when the column is not NULL.

SELECT COUNT(*) FROM mtcars;

1 records
COUNT(*)
32

4.1.4 DISTINCT

Get unique values in specified columns.

SELECT DISTINCT Species FROM iris

3 records
Species
setosa
versicolor
virginica

4.1.5 Insert Values

Insert new rows in the table.

library(tidyverse)

df <- tribble(
  ~x, ~y,  ~z,
  "a", 2,  3.6,
  "b", 1,  8.5
)
df

# A tibble: 2 × 3
  x         y     z
  <chr> <dbl> <dbl>
1 a         2   3.6
2 b         1   8.5

Now, we can insert new values into df.

INSERT INTO df (x, y, z) VALUES('c', 3, 1);
SELECT * FROM df;

4.1.6 Updates

Update the rows in the table.

UPDATE df SET z = 77 WHERE x ='b';

4.1.7 Delete

Remove rows from the table which are specified in the WHERE condition.

DELETE from df WHERE x = 'b';

4.2 Data management

4.2.1 Create table

Each column in the table is specified with its name, data type and an optional keyword which could be PRIMARY KEY, NOT NULL, etc.,

CREATE TABLE PETSALE (
    ID INTEGER NOT NULL,
    PET CHAR(20),
    SALEPRICE DECIMAL(6,2),
    PROFIT DECIMAL(6,2),
    SALEDATE DATE
    );

INSERT INTO fills the table:

INSERT INTO PETSALE VALUES
    (1,'Cat',450.09,100.47,'2018-05-29'),
    (2,'Dog',666.66,150.76,'2018-06-01'),
    (3,'Parrot',50.00,8.9,'2018-06-04'),
    (4,'Hamster',60.60,12,'2018-06-11'),
    (5,'Goldfish',48.48,3.5,'2018-06-14');

4.2.2 Alter table

ADD COLUMN:

ALTER TABLE PETSALE
ADD COLUMN QUANTITY INTEGER;

Fill in values:

UPDATE PETSALE SET QUANTITY = 9 WHERE ID = 1;

Check whether it worked

SELECT * FROM PETSALE;

5 records
ID	PET	SALEPRICE	PROFIT	SALEDATE	QUANTITY
1	Cat	450.09	100.47	2018-05-29	9
2	Dog	666.66	150.76	2018-06-01	NA
3	Parrot	50.00	8.90	2018-06-04	NA
4	Hamster	60.60	12.00	2018-06-11	NA
5	Goldfish	48.48	3.50	2018-06-14	NA

DROP COLUMN:

ALTER TABLE PETSALE
DROP COLUMN PROFIT;

SELECT * FROM PETSALE;

5 records
ID	PET	SALEPRICE	SALEDATE	QUANTITY
1	Cat	450.09	2018-05-29	9
2	Dog	666.66	2018-06-01	NA
3	Parrot	50.00	2018-06-04	NA
4	Hamster	60.60	2018-06-11	NA
5	Goldfish	48.48	2018-06-14	NA

ALTER COLUMN:

ALTER TABLE PETSALE
ALTER COLUMN PET SET DATA TYPE VARCHAR(20);

SELECT * FROM PETSALE;

5 records
ID	PET	SALEPRICE	SALEDATE	QUANTITY
1	Cat	450.09	2018-05-29	9
2	Dog	666.66	2018-06-01	NA
3	Parrot	50.00	2018-06-04	NA
4	Hamster	60.60	2018-06-11	NA
5	Goldfish	48.48	2018-06-14	NA

RENAME COLUMN:

ALTER TABLE PETSALE
RENAME COLUMN PET TO ANIMAL;

SELECT * FROM PETSALE;

5 records
ID	ANIMAL	SALEPRICE	SALEDATE	QUANTITY
1	Cat	450.09	2018-05-29	9
2	Dog	666.66	2018-06-01	NA
3	Parrot	50.00	2018-06-04	NA
4	Hamster	60.60	2018-06-11	NA
5	Goldfish	48.48	2018-06-14	NA

4.2.3 Truncate

The TRUNCATE statement will remove all(!) rows from an existing table, just like the one we created in the beginning, however, it does not delete the table itself.

TRUNCATE TABLE PET IMMEDIATE;

Caution: DROP TABLE tablename; drops the entire table!

DROP TABLE PETSALE;

4.3 Calculations

For this subsection I created a simple table that contains attributes about countries:

SELECT * FROM df ;

4 records
country	y	z	id
Germany	2	3.6	1
Austria	1	8.5	2
Brazil	4	2.5	3
Brazil	3	3.5	3

4.3.1 Like

The LIKE operator is used in a WHERE clause to search for a specified pattern in a column.

There are two wildcards often used in conjunction with the LIKE operator which are percent sign(%) and underscore sign (_).

SELECT * FROM df WHERE country LIKE 'A%';

1 records
country	y	z	id
Austria	1	8.5	2

4.3.2 Between

Select data within a range:

select * from mtcars
where (hp BETWEEN 100 and 200) and AM = 1 ;

5 records
mpg	cyl	disp	hp	drat	wt	qsec	vs	am	gear	carb
21.0	6	160.0	110	3.90	2.620	16.46	0	1	4	4
21.0	6	160.0	110	3.90	2.875	17.02	0	1	4	4
30.4	4	95.1	113	3.77	1.513	16.90	1	1	5	2
19.7	6	145.0	175	3.62	2.770	15.50	0	1	5	6
21.4	4	121.0	109	4.11	2.780	18.60	1	1	4	2

4.3.3 In

Use the IN operator to select observations that match the provided list of the IN operator:

SELECT * FROM df WHERE country IN ('Brazil');

2 records
country	y	z	id
Brazil	4	2.5	3
Brazil	3	3.5	3

4.3.4 Order by

ORDER BY is used to sort the result-set in ascending or descending order. The default is ascending.

SELECT * FROM df ORDER BY country;

4 records
country	y	z	id
Austria	1	8.5	2
Brazil	4	2.5	3
Brazil	3	3.5	3
Germany	2	3.6	1

4.3.5 Distinct

Exclude duplicates with DISTINCT:

SELECT distinct(country) FROM df ;

3 records
country
Germany
Austria
Brazil

4.3.6 Group by

GROUP BY is used in collaboration with SELECT to arrange identical data into groups.

SELECT country, count (country) AS Var_Name from df group by country;

3 records
country	Var_Name
Austria	1
Brazil	2
Germany	1

4.3.7 Average

SELECT country, AVG(z) as Mean from df group by country;

3 records
country	Mean
Austria	8.5
Brazil	3.0
Germany	3.6

4.3.8 Having

The WHEREclause is for entire result set; while HAVING works only for the GROUPED BY clause.

SELECT country, count (country) AS Count from df group by country having count (country) > 1;

1 records
country	Count
Brazil	2

4.3.9 Further functions

Summarize groups:

select sum(mpg) as sum_mpg from mtcars where hp > 100

1 records
sum_mpg
401.4

Round to the nearest integer:

select round(drat, 1) as round_drat from mtcars

Displaying records 1 - 10
round_drat
3.9
3.9
3.9
3.1
3.1
2.8
3.2
3.7
3.9
3.9

The length of a string:

select length(country) from df

4 records
length(country)
7
7
6
6

Depending the SQL database you use, in db2 you can use the upper (UCASE) and lower case (LCASE) function for strings.

select upper(country) from df

4 records
upper(country)
GERMANY
AUSTRIA
BRAZIL
BRAZIL

In case of Oracle the functions are called lower and upper.

4.3.10 Date and Time Built-in Functions

Talking about SQL databases, there are three different possibilities to work with date and time DB2.
- Date: YYYYMMDD (Year/Month/Day) - Time: HHMMSS (Hours/Min/Sec) - Timestamp: YYYYMMDDHHMMSSZZZZZZ (Date/Time/Microseconds)

Depending on what you are up to do, there are functions to extract the day, month, day of month, day of week, day of year, week, hour, minute, and second. You can also extract the current_date and the current_time. Unfortunately, this does not work in Oracle the same way as in DB2, but to give you an example how to extract the day:

select day(date) from df where country = 'Germany'

4.3.11 Working with Multiple Tables

There are several ways to access multiple tables in the same query. Namely, using sub-queries, implicit join, and join operators, such as INNER JOIN and OUTER JOIN. For instance:

select * from df2;

2 records
country	valid	id
Germany	1	1
Austria	0	2

Let’s say we want only observations from df that are listed in df2. In such a situation we can use a sub-queries:

select * from df
  where country in
  (select country from df2)

2 records
country	y	z	id
Germany	2	3.6	1
Austria	1	8.5	2

Of course, you could add also information of the second table and include only countries with a certain value:

select * from df
  where country in
  (select country from df2 where valid = 1)

1 records
country	y	z	id
Germany	2	3.6	1

Implicit joins implies that we can access multiple tables by specifying them in the FROM clause of the query. This leads to a CROSS JOIN (also known as Cartesian Join).

select * from df, df2

8 records
country	y	z	id	country	valid	id
Germany	2	3.6	1	Germany	1	1
Germany	2	3.6	1	Austria	0	2
Austria	1	8.5	2	Germany	1	1
Austria	1	8.5	2	Austria	0	2
Brazil	4	2.5	3	Germany	1	1
Brazil	4	2.5	3	Austria	0	2
Brazil	3	3.5	3	Germany	1	1
Brazil	3	3.5	3	Austria	0	2

In DBL2 we can use the where clause to match data (see code); in Oracle there are other matching operaters

select * from df, df2 where df.id = df.id;

In case of long names, we can use shorter aliases for table names (or use column names with aliases in the SELECT clause):

select * from df A, df2 B where A.id = B.id;