
Every now and then I search my own code for a specific graph that I want to reuse. This package bundles some of the graphs I have created, which may be a good starting point for my future self or for your own project. The plotgraph() function runs the installed source code and returns the graph. For example, do you know the datasaurus plot?

# plotgraph function runs the source code
library(edgar)
plotgraph("datasaurus.R")

Data saurus by Edgar Treischl

Without input, the plotgraph() function returns available graphs.

# list available graphs without input
plotgraph()
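The package's actual implementation is not shown here. As a rough sketch of the general idea only, a function that lists scripts when called without input and otherwise runs one could look like the following. The function `run_graph_script()` and the temporary folder are hypothetical stand-ins, not part of the package.

```r
# Hypothetical helper (NOT the package's plotgraph()):
# list the scripts in a folder, or run one of them.
run_graph_script <- function(dir, file = NULL) {
  available <- list.files(dir, pattern = "\\.R$")

  # Without input, return the names of the available scripts
  if (is.null(file)) {
    return(available)
  }
  if (!file %in% available) {
    stop("Unknown script: ", file)
  }

  # source() evaluates the script; $value is the last evaluated expression
  source(file.path(dir, file), local = new.env())$value
}

# A temporary folder stands in for the installed scripts (assumption)
demo_dir <- file.path(tempdir(), "graph_scripts")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("x <- 1:10\nsum(x)", file.path(demo_dir, "example.R"))

scripts <- run_graph_script(demo_dir)
result <- run_graph_script(demo_dir, "example.R")
```

If the script ends with a ggplot object instead of a number, the same pattern would return the plot.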

Anscombe quartet

Anscombe's quartet is a set of four datasets that have nearly identical simple descriptive statistics, yet appear very different when graphed. Each dataset consists of eleven (x, y) points. They were constructed in 1973 by the statistician Francis Anscombe to demonstrate both the importance of graphing data before analyzing it and the effect of outliers on statistical properties.

plotgraph("anscombe_quartet.R")

Anscombe quartet by Edgar Treischl
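The claim that the four datasets share nearly identical descriptive statistics can be checked with the `anscombe` data frame that ships with base R. This is a minimal sketch and independent of the package's own script.

```r
# anscombe is part of base R: columns x1..x4 and y1..y4
stats <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  c(mean_x = mean(x), mean_y = mean(y),
    sd_x = sd(x), sd_y = sd(y),
    cor_xy = cor(x, y))
})
colnames(stats) <- paste("set", 1:4)

# One column per dataset; the rows barely differ across columns
round(stats, 2)
```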

Boxplot Illustration

The Boxplot Illustration shows the main components of a boxplot. The boxplot is a standardized way of displaying the distribution of data based on a five-number summary: minimum, first quartile, median, third quartile, and maximum. It can also show outliers.

plotgraph("boxplot_illustration.R")

Boxplot Illustration by Edgar Treischl
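To make the five-number summary concrete, the sketch below computes it for a small made-up vector with base R. Note that `fivenum()` returns Tukey's hinges, which can differ slightly from the quartiles returned by `quantile()`.

```r
# Made-up example data with one extreme value
x <- c(2, 4, 4, 5, 7, 8, 9, 11, 12, 30)

# Minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(x)
names(five) <- c("min", "lower_hinge", "median", "upper_hinge", "max")
five

# Points beyond 1.5 times the box length are flagged as outliers
outliers <- boxplot.stats(x)$out
outliers
```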

Boxplot pitfalls

The Boxplot pitfalls plot shows the main problem with boxplots: a boxplot summarizes the distribution of a dataset but hides the underlying data points. A jitter plot is added to reveal them.

plotgraph("boxplot_pitfalls.R")

Boxplot Pitfalls by Edgar Treischl
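The pitfall can be reproduced with two made-up samples that share the same five-number summary but are distributed differently. The sketch uses base graphics rather than the package's script, and the data are invented for illustration.

```r
# Two made-up samples: 'clumped' piles up at 3 and 7, 'even' is spread uniformly
clumped <- c(1, 3, 3, 3, 5, 7, 7, 7, 9)
even    <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# Both samples have the same five-number summary
fivenum(clumped)
fivenum(even)

# The boxes look identical; the jittered points reveal the difference
samples <- list(clumped = clumped, even = even)
boxplot(samples)
stripchart(samples, vertical = TRUE, method = "jitter",
           pch = 16, add = TRUE)
```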

Data format

The data format plot shows the difference between the long and the wide data format. The long format is often preferred for data analysis and visualization. The wide format is often preferred for data storage and data entry.

plotgraph("long_wide.R")

Data format by Edgar Treischl
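As a small illustration of the two formats, the sketch below converts a made-up wide table to the long format and back with base R's `reshape()`. The column names are invented for the example; `tidyr::pivot_longer()` and `tidyr::pivot_wider()` are a common alternative.

```r
# Made-up wide data: one row per country, one column per year
wide <- data.frame(
  country = c("A", "B"),
  y2020 = c(1.2, 3.4),
  y2021 = c(1.5, 3.1)
)

# Wide to long: one row per country-year combination
long <- reshape(
  wide,
  direction = "long",
  varying = c("y2020", "y2021"),
  v.names = "value",
  timevar = "year",
  times = c(2020, 2021),
  idvar = "country"
)
rownames(long) <- NULL
long

# Long back to wide: one value column per year
wide_again <- reshape(
  long,
  direction = "wide",
  idvar = "country",
  timevar = "year",
  v.names = "value"
)
wide_again
```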

Data joins

The plot shows the different types of joins (inner, left, right, and full join), inspired by the Data Wrangling with dplyr and tidyr chapter from the R for Data Science book.

plotgraph("data_joins.R")

Data joins by Edgar Treischl
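The four join types can be tried out with base R's `merge()` on two made-up tables. This is a sketch for illustration; in dplyr the corresponding functions are `inner_join()`, `left_join()`, `right_join()`, and `full_join()`.

```r
# Two made-up tables that share the key column 'id'
left_tbl  <- data.frame(id = c(1, 2, 3), x = c("a", "b", "c"))
right_tbl <- data.frame(id = c(2, 3, 4), y = c("B", "C", "D"))

# Inner join: only ids present in both tables (2, 3)
inner <- merge(left_tbl, right_tbl, by = "id")

# Left join: all rows of left_tbl, NA where right_tbl has no match
left <- merge(left_tbl, right_tbl, by = "id", all.x = TRUE)

# Right join: all rows of right_tbl, NA where left_tbl has no match
right <- merge(left_tbl, right_tbl, by = "id", all.y = TRUE)

# Full join: all ids from both tables (1, 2, 3, 4)
full <- merge(left_tbl, right_tbl, by = "id", all = TRUE)

sapply(list(inner = inner, left = left, right = right, full = full), nrow)
```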

Data saurus

The datasaurus plot shows why it is important to visualize data before analyzing it: 12 datasets share the same summary statistics, yet look completely different when plotted. The plot is inspired by the Datasaurus Dozen paper.

plotgraph("datasaurus.R")

Data saurus by Edgar Treischl

Gapminder

The Gapminder bubble chart shows the life expectancy and GDP per capita for countries over time. The Gapminder bubble chart is inspired by the Gapminder project.

plotgraph("gapminder.R")

Gapminder by Edgar Treischl

Pac-Man

The Pac-Man plot shows how many pie charts resemble Pac-Man.

plotgraph("pacman.R")

The Pacman plot by Edgar Treischl

Simpson’s paradox

The Simpson’s paradox plot shows how the correlation between two variables can change when a third variable is taken into account. It underlines the importance of visualizing data and of causal inference: overall, there may seem to be a positive correlation, but when the data is split into groups, the correlation within each group can be negative.

plotgraph("simpson.R")

Simpsons Paradox by Edgar Treischl
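The reversal can be demonstrated with a tiny made-up dataset: within each group y falls as x rises, but pooling the groups produces a positive correlation. The numbers are invented for illustration and are not the data behind the package's plot.

```r
# Made-up data: two groups with a perfectly negative within-group trend
d <- data.frame(
  group = rep(c("A", "B"), each = 5),
  x = 1:10,
  y = c(5, 4, 3, 2, 1, 10, 9, 8, 7, 6)
)

# Pooled correlation ignores the grouping variable
cor_overall <- cor(d$x, d$y)

# Correlation computed separately within each group
cor_by_group <- sapply(split(d, d$group), function(g) cor(g$x, g$y))

cor_overall   # positive
cor_by_group  # negative in both groups
```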

UCB Admission

Were students discriminated against? The UCB Admission plot shows the admission rates for different departments at the University of California, Berkeley. The UCB Admission case illustrates the importance of causal inference: in the aggregate, it seems that more women were rejected, but when the data is split into departments, the opposite can be true.

edgar::plotgraph("ucb_admission.R")

UCB Plot by Edgar Treischl
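The pattern can be checked with the `UCBAdmissions` table that ships with base R, a three-way table of admission status, gender, and department. This is a minimal sketch; the package's script may prepare the data differently.

```r
# Collapse over departments: counts by admission status and gender
overall <- apply(UCBAdmissions, c(1, 2), sum)

# Aggregate admission rate by gender
overall_rate <- overall["Admitted", ] / colSums(overall)
round(overall_rate, 2)

# Admission rate by gender within each department
applicants <- apply(UCBAdmissions, c(2, 3), sum)
dept_rate <- UCBAdmissions["Admitted", , ] / applicants
round(dept_rate, 2)
```

In the aggregate, men have the higher admission rate, while in department A, for example, the rate for women is higher.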