ggplot2

Dieter 2022-08-26

Installing the gapminder package

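We will use the gapminder package to get public health data, which comes from the excellent Gapminder website. The package provides a gapminder tibble that contains the following variables: country, continent, year, lifeExp (life expectancy at birth), pop (population), and gdpPercap (GDP per capita).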
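The package has to be installed once before it can be loaded. A minimal sketch, assuming a standard CRAN setup; the `requireNamespace()` guard and the name `gap_preview` are additions for illustration, so the install only runs when the package is missing:

```r
# Install gapminder from CRAN only if it is not already available
if (!requireNamespace("gapminder", quietly = TRUE)) {
  install.packages("gapminder")
}

# Peek at the first rows of the gapminder tibble without attaching the package
gap_preview <- head(gapminder::gapminder)
gap_preview
```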

Loading the package.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(gapminder)
gap_data <- gapminder # Copy, so the data shows up in the Environment tab

A first simple example

subset <- filter(gap_data, country == 'Algeria')
ggplot(subset) + aes(x=year, y=pop) + geom_point()

subset <- filter(gap_data, country == 'Algeria')
ggplot(subset) + aes(x=year, y=pop) + geom_col() # Use geom_col(), not geom_bar(), because geom_bar() counts rows by default

We could use geom_bar() instead, but then we have to override its default counting behavior with stat = 'identity'.

subset <- filter(gap_data, country == 'Algeria')
ggplot(subset) + aes(x=year, y=pop) + geom_bar(stat= 'identity') 
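To see what the default counting behavior means, here is a small sketch (an illustration added here, with the hypothetical names `p_bar`, `counts`, and `p_col`). It lets geom_bar() count the rows per continent, and then builds the same plot with geom_col() from counts computed beforehand:

```r
library(ggplot2)
library(dplyr)
library(gapminder)

# geom_bar() takes only x and counts the rows in each category itself
p_bar <- ggplot(gapminder) + aes(x = continent) + geom_bar()

# Same picture with geom_col(): count first, then plot the counts as y
counts <- count(gapminder, continent)
p_col <- ggplot(counts) + aes(x = continent, y = n) + geom_col()

p_bar
p_col
```

Both plots should show the same bar heights, which is why geom_col() is the natural choice when the y values are already in the data.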

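# Keep the most recent observation for each country
grp <- group_by(gap_data, country)
latest <- slice(grp, which.max(year))
# Total (in billions) and mean population per continent;
# pop is stored as an integer, so convert to double first to avoid integer overflow in the sum
grp <- group_by(latest, continent)
summary_data <- summarize(grp, sm = sum(as.numeric(pop)) / 1000000000, mn = mean(pop))
p <- ggplot(summary_data) + aes(x = continent, y = sm) + geom_col()
p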

ggsave('saved.pdf', p)
## Saving 7 x 5 in image
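The message shows that ggsave() fell back on a default size. A minimal sketch of setting the size explicitly; the toy plot `p_demo` and the file name `saved_demo.pdf` are hypothetical and only there so the example runs on its own:

```r
library(ggplot2)

# Hypothetical toy plot with made-up values, only to have something to save
p_demo <- ggplot(data.frame(x = 1:3, y = c(2, 4, 3))) + aes(x = x, y = y) + geom_col()

# Write to a temporary folder with an explicit size instead of the default
out_file <- file.path(tempdir(), "saved_demo.pdf")
ggsave(out_file, p_demo, width = 7, height = 5, units = "in")
```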

Exploring some data

Distributions

Read in some body data.

data <- read_csv('data/body.csv')
## Rows: 507 Columns: 25
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (25): Biacromial, Biiliac, Bitrochanteric, ChestDepth, ChestDia, ElbowDi...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Create a simple histogram

ggplot(data) + aes(x = Hip, y = after_stat(density)) + geom_histogram() + geom_density(col='red')

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
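The message suggests picking a bin width yourself. A minimal sketch, using lifeExp from gapminder as a stand-in because it ships with the package; the bin width of 2 years is an arbitrary choice for illustration, and `p_hist` is a hypothetical name:

```r
library(ggplot2)
library(gapminder)

# Histogram on the density scale with an explicit bin width of 2 years,
# overlaid with a kernel density estimate in red
p_hist <- ggplot(gapminder) +
  aes(x = lifeExp, y = after_stat(density)) +
  geom_histogram(binwidth = 2) +
  geom_density(col = "red")
p_hist
```

With binwidth set, the `stat_bin()` message should no longer appear.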

Boxplots are also very nice tools to get a quick view of your data. Here, we overlay the raw data in red.

ggplot(data) + aes(x = factor(Gender), y = Forearm) + geom_boxplot() + geom_jitter(width=0.1, height = 0, alpha = 0.25, col='red')

Scatter plots

ggplot(data) + aes(x = Hip, y = Forearm) + geom_point() + geom_smooth(method = "loess") +  geom_smooth(method = "lm", col='red')
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'

More aesthetics

Map variables to shapes and colors.

#ggplot(data) + aes(x= Thigh, y=Waist, shape = Gender) + geom_point() # Does not work: Gender is numeric, and a continuous variable cannot be mapped to shape
#ggplot(data) + aes(x= Thigh, y=Waist, shape = as.factor(Gender), color= Gender) + geom_point() # Works, but color is still treated as continuous, so it gets a separate gradient legend
ggplot(data) + aes(x= Thigh, y=Waist, shape = as.factor(Gender), color= as.factor(Gender)) + geom_point() # Works: both aesthetics use the factor, so shape and color share one legend
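Instead of repeating as.factor() inside aes(), the conversion can be done once up front. A small sketch of that idea on made-up toy values (not the body data); the column names and the labels "group 0" and "group 1" are hypothetical:

```r
library(ggplot2)
library(tibble)

# Made-up toy values, only to show the mechanics
toy <- tibble(
  x     = c(1, 2, 3, 4),
  y     = c(2, 1, 4, 3),
  group = c(0, 1, 0, 1)   # numeric code (Gender in the body data is also read as numeric)
)

# Convert the code to a factor once, with readable labels
toy$group_f <- factor(toy$group, levels = c(0, 1), labels = c("group 0", "group 1"))

# One discrete variable mapped to both shape and color
p_toy <- ggplot(toy) + aes(x = x, y = y, shape = group_f, color = group_f) + geom_point()
p_toy
```

The legend then carries the labels from the factor rather than the raw codes.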

Another example using the gapminder data

grp <- group_by(gap_data,year,continent)
mns <- summarise(grp, mn = sum(as.numeric(pop)) / 1000000000) # Total population in billions; as.numeric() avoids integer overflow in the sum
## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.
ggplot(mns) + aes(x= year, y = mn, color = continent) + geom_point()
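The summarise() message points to the `.groups` argument. A minimal sketch of using it; the names `totals` and `total_bn` are hypothetical, and the as.numeric() conversion is an extra precaution added here because pop is stored as an integer:

```r
library(dplyr)
library(gapminder)

# Total population in billions per year and continent;
# .groups = "drop" returns an ungrouped tibble and silences the message
grp <- group_by(gapminder, year, continent)
totals <- summarise(grp, total_bn = sum(as.numeric(pop)) / 1e9, .groups = "drop")
totals
```

Dropping the groups is usually the safe choice when the result goes straight into a plot.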

subset <- filter(gap_data, country == 'Angola')
ggplot(subset) + aes(x= year, y = pop, size = gdpPercap) + geom_point()

Small multiples: multiple panels

grp <- group_by(gap_data, year, continent)
mns <- summarise(grp, mn = mean(lifeExp))
## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.
ggplot(mns) + aes(x = year, y=mn) + geom_point() + facet_grid(.~continent) 

ggplot(mns) + aes(x = year, y=mn) + geom_point() + facet_wrap(~continent) + geom_smooth(method ='lm', color='gray', se=FALSE)
## `geom_smooth()` using formula = 'y ~ x'
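facet_wrap() also lets us control the panel layout. A short sketch, assuming we want two columns and a separate y-axis range per panel (both choices are for illustration only, and `p_facet` is a hypothetical name):

```r
library(ggplot2)
library(dplyr)
library(gapminder)

# Mean life expectancy per year and continent
grp <- group_by(gapminder, year, continent)
mns <- summarise(grp, mn = mean(lifeExp), .groups = "drop")

# One panel per continent, wrapped into two columns,
# with each panel getting its own y-axis range
p_facet <- ggplot(mns) + aes(x = year, y = mn) + geom_point() +
  facet_wrap(~continent, ncol = 2, scales = "free_y")
p_facet
```

Free scales make the trend within each continent easier to see, but they make it harder to compare levels between panels.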