ggplot2
Dieter 2022-08-26
- Installing the gapminder package
- A first simple example
- Exploring some data
- More aesthetics
- Small multiples: multiple panels
Installing the gapminder package
We will use the gapminder package to get public health data. This data comes from the excellent gapminder website. This package provides a gapminder tibble that contains the following variables:
- country
- continent
- year
- lifeExp (life expectancy at birth, in years)
- pop
- gdpPercap
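The section title mentions installation, so here is a minimal sketch of installing the package and checking the variables listed above. It assumes the package is installed from CRAN, which may not match your own setup.

```r
# Install gapminder only if it is not already available (assumes CRAN access)
if (!requireNamespace("gapminder", quietly = TRUE)) {
  install.packages("gapminder")
}
library(gapminder)

# Quick look at the structure: one row per country and year
dim(gapminder)
names(gapminder)
str(gapminder)
```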
Load the packages.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(gapminder)
gap_data <- gapminder # Copy the data so it shows up in the Environment tab
A first simple example
subset <- filter(gap_data, country == 'Algeria')
ggplot(subset) + aes(x=year, y=pop) + geom_point()
subset <- filter(gap_data, country == 'Algeria')
ggplot(subset) + aes(x=year, y=pop) + geom_col() # geom_col() plots the y values as given; geom_bar() would count rows instead
We could use geom_bar(), but we have to override the default counting behavior.
subset <- filter(gap_data, country == 'Algeria')
ggplot(subset) + aes(x=year, y=pop) + geom_bar(stat= 'identity')
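To make the difference between counting and plotting values concrete, here is a small sketch. The `toy` data frame is made up for illustration and is not part of the gapminder data. `layer_data()` returns the values ggplot2 computed for a layer.

```r
library(ggplot2)

# Hypothetical toy data: one row per year
toy <- data.frame(year = c(2000, 2001, 2002), pop = c(10, 20, 30))

# Default geom_bar() counts the rows for each x value
p_count <- ggplot(toy) + aes(x = year) + geom_bar()

# geom_col() and geom_bar(stat = 'identity') both plot the y values as given
p_col <- ggplot(toy) + aes(x = year, y = pop) + geom_col()
p_identity <- ggplot(toy) + aes(x = year, y = pop) + geom_bar(stat = 'identity')

# Inspect the computed bar heights
layer_data(p_count)$count
layer_data(p_col)$y
layer_data(p_identity)$y
```

Since each year appears once in `toy`, the counted bars all have the same height, while the other two plots show the population values.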
grp <- group_by(gap_data, country)
latest <- slice(grp, which.max(year))
grp <- group_by(latest, continent)
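# sm is the total population in billions; as.numeric() guards against integer overflow if pop is stored as integer
summary_data <- summarize(grp, sm = sum(as.numeric(pop)) / 1e9, mn = mean(pop))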
p <- ggplot(summary_data) + aes(x = continent, y = sm) + geom_col()
p
ggsave('saved.pdf', p)
## Saving 7 x 5 in image
Exploring some data
Distributions
Read in some body data.
data <- read_csv('data/body.csv')
## Rows: 507 Columns: 25
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (25): Biacromial, Biiliac, Bitrochanteric, ChestDepth, ChestDia, ElbowDi...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Create a simple histogram and overlay a density estimate in red.
ggplot(data) + aes(x = Hip, y = after_stat(density)) + geom_histogram() + geom_density(col='red')
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
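The message above suggests choosing a bin width explicitly. Here is a sketch of how that might look. It uses simulated values in a hypothetical `toy` data frame rather than the body data, and the bin width of 2 is an arbitrary choice for illustration.

```r
library(ggplot2)

# Simulated stand-in for the Hip column (hypothetical values, not the real data)
set.seed(1)
toy <- data.frame(Hip = rnorm(500, mean = 97, sd = 7))

# Setting binwidth explicitly avoids the default of 30 bins and its message
p <- ggplot(toy) +
  aes(x = Hip, y = after_stat(density)) +
  geom_histogram(binwidth = 2) +
  geom_density(col = 'red')
p
```

Because the histogram is on the density scale, the bar areas add up to 1, which is what makes the overlaid density curve comparable.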
Boxplots are also very nice tools to get a quick view of your data. Here, we overlay the raw data in red.
ggplot(data) + aes(x = factor(Gender), y = Forearm) + geom_boxplot() + geom_jitter(width=0.1, height = 0, alpha = 0.25, col='red')
Scatter plots
ggplot(data) + aes(x = Hip, y = Forearm) + geom_point() + geom_smooth(method = "loess") + geom_smooth(method = "lm", col='red')
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
More aesthetics
Map variables to shapes and colors.
#ggplot(data) + aes(x= Thigh, y=Waist, shape = Gender) + geom_point() # Does not work: Gender is numeric, and a continuous variable cannot be mapped to shape
#ggplot(data) + aes(x= Thigh, y=Waist, shape = as.factor(Gender), color= Gender) + geom_point() # Works, but color is still numeric, so the legend shows a continuous gradient
ggplot(data) + aes(x= Thigh, y=Waist, shape = as.factor(Gender), color= as.factor(Gender)) + geom_point() # Works: both aesthetics use the discrete factor
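An alternative to calling as.factor() inside every aesthetic is to convert the column once. The sketch below uses a made-up `toy` data frame, and the labels "group 0" and "group 1" are placeholders, since the coding of Gender in the body data is not stated here.

```r
library(ggplot2)

# Hypothetical toy data with a numeric 0/1 grouping column
toy <- data.frame(
  Thigh = c(50, 55, 60, 58, 62, 65),
  Waist = c(70, 72, 80, 85, 90, 95),
  Gender = c(0, 0, 0, 1, 1, 1)
)

# Convert once; the labels are placeholders, not the real coding
toy$Gender <- factor(toy$Gender, levels = c(0, 1), labels = c('group 0', 'group 1'))

# Both aesthetics now get a discrete scale and share one legend
p <- ggplot(toy) + aes(x = Thigh, y = Waist, shape = Gender, color = Gender) + geom_point()
p
```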
Another example using the gapminder data
grp <- group_by(gap_data,year,continent)
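# Despite its name, mn holds the total population in billions; as.numeric() guards against integer overflow if pop is stored as integer
mns <- summarise(grp, mn = sum(as.numeric(pop)) / 1e9)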
## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.
ggplot(mns) + aes(x= year, y = mn, color = continent) + geom_point()
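The message about grouped output can be addressed with the `.groups` argument it mentions. A minimal sketch, using the hypothetical names `totals` and `total_bn` so it does not clash with the objects above:

```r
library(dplyr)
library(gapminder)

grp <- group_by(gapminder, year, continent)

# .groups = "drop" returns an ungrouped tibble and silences the message
totals <- summarise(grp, total_bn = sum(as.numeric(pop)) / 1e9, .groups = "drop")

# Check whether any grouping remains
is_grouped_df(totals)
```

Dropping the groups is a reasonable default when the summary is only used for plotting, but keeping them can be useful if further grouped steps follow.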
subset <- filter(gap_data, country == 'Angola')
ggplot(subset) + aes(x= year, y = pop, size = gdpPercap) + geom_point()
Small multiples: multiple panels
grp <- group_by(gap_data, year, continent)
mns <- summarise(grp, mn = mean(lifeExp))
## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.
ggplot(mns) + aes(x = year, y=mn) + geom_point() + facet_grid(.~continent)
ggplot(mns) + aes(x = year, y=mn) + geom_point() + facet_wrap(~continent) + geom_smooth(method ='lm', color='gray', se=FALSE)
## `geom_smooth()` using formula = 'y ~ x'
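The two faceting functions used above can be tuned further. As a sketch, facet_wrap() accepts `nrow` to force a single row of panels, similar to facet_grid(.~continent), and `scales` to let each panel have its own y axis. Whether free scales are appropriate depends on the comparison you want to make.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

grp <- group_by(gapminder, year, continent)
mns <- summarise(grp, mn = mean(lifeExp), .groups = "drop")

# One row of panels, each with its own y-axis range
p <- ggplot(mns) + aes(x = year, y = mn) + geom_point() +
  facet_wrap(~continent, nrow = 1, scales = 'free_y')
p
```

Free y scales make trends within each continent easier to see, at the cost of making levels harder to compare across panels.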