In my R class, we are currently learning how to manipulate tibbles. I have a homework problem where, for every country in the table, I need to take its life expectancy from 1952 and compare it to its life expectancy in every other year the tibble has data for. This has to be done in one line using pipes.
Background: this table is called gap
I have used the line:
gap %>% group_by(year, lifeExp) %>% filter(year == 1952)
This gives me the lifeExp for every country in 1952, but from there I have no idea how to pipe back into the table and compare those initial values to each country's values in the other years. I know what all the basic dplyr functions do; I'm just having trouble seeing the bigger picture with all the pipes.
If this wasn't enough to understand, I will edit! Thank you for any kind of support!
CodePudding user response:
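I don't think you want to use filter(), as that removes rows from your dataframe and you need to keep those rows to make the comparisons. Perhaps using mutate() to create a new variable with the difference for each year compared to the 'first' year (1952) for each country would solve your problem? Note that first() takes the first row in each group, so this relies on the rows being ordered by year within each country, which is the case for gapminder. E.g.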
library(tidyverse)
library(gapminder)
gapminder %>%
  group_by(country) %>%
  mutate(lifeExp_increase_vs_1952 = lifeExp - first(lifeExp)) %>%
  select(country, year, lifeExp, lifeExp_increase_vs_1952)
#> # A tibble: 1,704 × 4
#> # Groups: country [142]
#> country year lifeExp lifeExp_increase_vs_1952
#> <fct> <int> <dbl> <dbl>
#> 1 Afghanistan 1952 28.8 0
#> 2 Afghanistan 1957 30.3 1.53
#> 3 Afghanistan 1962 32.0 3.20
#> 4 Afghanistan 1967 34.0 5.22
#> 5 Afghanistan 1972 36.1 7.29
#> 6 Afghanistan 1977 38.4 9.64
#> 7 Afghanistan 1982 39.9 11.1
#> 8 Afghanistan 1987 40.8 12.0
#> 9 Afghanistan 1992 41.7 12.9
#> 10 Afghanistan 1997 41.8 13.0
#> 11 Afghanistan 2002 42.1 13.3
#> 12 Afghanistan 2007 43.8 15.0
#> 13 Albania 1952 55.2 0
#> 14 Albania 1957 59.3 4.05
#> 15 Albania 1962 64.8 9.59
#> 16 Albania 1967 66.2 11.0
#> 17 Albania 1972 67.7 12.5
#> 18 Albania 1977 68.9 13.7
#> 19 Albania 1982 70.4 15.2
#> 20 Albania 1987 72 16.8
#> # … with 1,684 more rows
Created on 2021-09-23 by the reprex package (v2.0.1)
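If the rows were not already sorted by year, first() would pick up whichever year happens to come first in each group. A minimal sketch of one way to guard against that, assuming a small made-up tibble (the name `toy` and its values are hypothetical, not gapminder data) and sorting within each country before calling first():

```r
library(dplyr)

# Hypothetical toy data with made-up values, deliberately NOT sorted by year
toy <- tibble(
  country = c("A", "A", "A", "B", "B", "B"),
  year    = c(1957L, 1952L, 1962L, 1962L, 1952L, 1957L),
  lifeExp = c(31, 30, 33, 52, 50, 51)
)

result <- toy %>%
  group_by(country) %>%
  # Sort by year within each country so that 1952 is the first row per group
  arrange(year, .by_group = TRUE) %>%
  mutate(increase_vs_1952 = lifeExp - first(lifeExp)) %>%
  ungroup()

result
```

With the arrange() step in place, the 1952 row of each country gets a difference of 0 regardless of how the input was ordered.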
CodePudding user response:
You can solve it with the help of mutate and match.
library(dplyr)
gap <- gapminder::gapminder %>%
  group_by(country) %>%
  # match(1952, year) returns the position of 1952 within each country's rows
  mutate(difference = lifeExp - lifeExp[match(1952, year)]) %>%
  ungroup()
gap
# country continent year lifeExp pop gdpPercap difference
# <fct> <fct> <int> <dbl> <int> <dbl> <dbl>
# 1 Afghanistan Asia 1952 28.8 8425333 779. 0
# 2 Afghanistan Asia 1957 30.3 9240934 821. 1.53
# 3 Afghanistan Asia 1962 32.0 10267083 853. 3.20
# 4 Afghanistan Asia 1967 34.0 11537966 836. 5.22
# 5 Afghanistan Asia 1972 36.1 13079460 740. 7.29
# 6 Afghanistan Asia 1977 38.4 14880372 786. 9.64
# 7 Afghanistan Asia 1982 39.9 12881816 978. 11.1
# 8 Afghanistan Asia 1987 40.8 13867957 852. 12.0
# 9 Afghanistan Asia 1992 41.7 16317921 649. 12.9
#10 Afghanistan Asia 1997 41.8 22227415 635. 13.0
# … with 1,694 more rows
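The "filter 1952, then pipe back into the table" idea from the question can also be made to work with a join. This is only a sketch of that alternative, not the method used in the answers above; the tibble `toy`, its values, and the column name `lifeExp_1952` are made up for illustration:

```r
library(dplyr)

# Hypothetical toy data with made-up values
toy <- tibble(
  country = c("A", "A", "A", "B", "B", "B"),
  year    = c(1957L, 1952L, 1962L, 1962L, 1952L, 1957L),
  lifeExp = c(31, 30, 33, 52, 50, 51)
)

# Step 1: keep only the 1952 rows and rename lifeExp so it can sit
# next to the original column after the join
baseline <- toy %>%
  filter(year == 1952) %>%
  select(country, lifeExp_1952 = lifeExp)

# Step 2: attach each country's 1952 value to all of its rows, then compare
joined <- toy %>%
  left_join(baseline, by = "country") %>%
  mutate(difference = lifeExp - lifeExp_1952)

joined
```

Like the match() version, this does not depend on row order, at the cost of an intermediate table.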