Compare life expectancy in an initial year (1952) to life expectancy in all later years

Time:09-23

In my R class, we are currently learning how to manipulate tibbles. I have a homework problem where I need to take each country's life expectancy in 1952 and compare it to that country's life expectancy in every other year the tibble has data for. This has to be done for all countries in the table, in a single pipeline.

Background: this table is called gap

I have used the line:

gap %>% group_by(year, lifeExp) %>% filter(year == 1952) 

This gives me the lifeExp for all countries in 1952, but from there I have no idea how to pipe back into the table and compare those initial values to each country's values in the other years. I know what all the basic dplyr functions do; I'm just having trouble seeing the bigger picture with all the pipes.

If this wasn't enough to understand, I will edit! Thank you for any kind of support!

CodePudding user response:

I don't think you want to use filter(), as that removes rows from your dataframe and you need to keep those rows to make the comparisons. Perhaps using mutate() to create a new variable with the difference for each year compared to the 'first' year (1952) for each country would solve your problem? E.g.

library(tidyverse)
library(gapminder)

gapminder %>%
  group_by(country) %>%
  mutate(lifeExp_increase_vs_1952 = lifeExp - first(lifeExp)) %>%
  select(country, year, lifeExp, lifeExp_increase_vs_1952)

#> # A tibble: 1,704 × 4
#> # Groups:   country [142]
#>    country      year lifeExp lifeExp_increase_vs_1952
#>    <fct>       <int>   <dbl>                    <dbl>
#>  1 Afghanistan  1952    28.8                     0   
#>  2 Afghanistan  1957    30.3                     1.53
#>  3 Afghanistan  1962    32.0                     3.20
#>  4 Afghanistan  1967    34.0                     5.22
#>  5 Afghanistan  1972    36.1                     7.29
#>  6 Afghanistan  1977    38.4                     9.64
#>  7 Afghanistan  1982    39.9                    11.1 
#>  8 Afghanistan  1987    40.8                    12.0 
#>  9 Afghanistan  1992    41.7                    12.9 
#> 10 Afghanistan  1997    41.8                    13.0 
#> 11 Afghanistan  2002    42.1                    13.3 
#> 12 Afghanistan  2007    43.8                    15.0 
#> 13 Albania      1952    55.2                     0   
#> 14 Albania      1957    59.3                     4.05
#> 15 Albania      1962    64.8                     9.59
#> 16 Albania      1967    66.2                    11.0 
#> 17 Albania      1972    67.7                    12.5 
#> 18 Albania      1977    68.9                    13.7 
#> 19 Albania      1982    70.4                    15.2 
#> 20 Albania      1987    72                      16.8 
#> # … with 1,684 more rows

Created on 2021-09-23 by the reprex package (v2.0.1)
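
One caveat worth noting: first(lifeExp) takes whichever row comes first within each country, so it only matches the 1952 value when the rows are already sorted by year (as they are in gapminder). The following is a minimal sketch, assuming a small hand-made tibble called toy (hypothetical countries and invented values, not taken from gapminder) whose rows are deliberately shuffled. It uses the order_by argument of dplyr's first() so the baseline is the earliest year regardless of row order:

```r
library(dplyr)

# Hypothetical toy data (values invented for illustration); rows are NOT sorted by year
toy <- tibble(
  country = c("A", "A", "A", "B", "B", "B"),
  year    = c(1957, 1952, 1962, 1962, 1957, 1952),
  lifeExp = c(42, 40, 45, 66, 63, 60)
)

result <- toy %>%
  group_by(country) %>%
  mutate(
    # Baseline is lifeExp in each country's earliest year, whatever the row order
    increase_vs_first_year = lifeExp - first(lifeExp, order_by = year)
  ) %>%
  ungroup()

result
```

If the earliest year in the data might not be 1952, indexing on the year itself is probably the safer choice.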

CodePudding user response:

You can solve it with the help of mutate and match.

library(dplyr)

gapminder::gapminder %>% 
  group_by(country) %>% 
  mutate(difference = lifeExp - lifeExp[match(1952, year)]) %>%
  ungroup -> gap

gap

#   country     continent  year lifeExp      pop gdpPercap difference
#   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>      <dbl>
# 1 Afghanistan Asia       1952    28.8  8425333      779.       0   
# 2 Afghanistan Asia       1957    30.3  9240934      821.       1.53
# 3 Afghanistan Asia       1962    32.0 10267083      853.       3.20
# 4 Afghanistan Asia       1967    34.0 11537966      836.       5.22
# 5 Afghanistan Asia       1972    36.1 13079460      740.       7.29
# 6 Afghanistan Asia       1977    38.4 14880372      786.       9.64
# 7 Afghanistan Asia       1982    39.9 12881816      978.      11.1 
# 8 Afghanistan Asia       1987    40.8 13867957      852.      12.0 
# 9 Afghanistan Asia       1992    41.7 16317921      649.      12.9 
#10 Afghanistan Asia       1997    41.8 22227415      635.      13.0 
# … with 1,694 more rows
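
To see what the indexing does: match(1952, year) returns the position of the first 1952 entry within each country's rows, and NA when a country has no 1952 row, so the difference comes out as NA instead of silently using another baseline year. Here is a small sketch, assuming a made-up tibble called toy (hypothetical countries and invented values) in which country "B" has no 1952 row:

```r
library(dplyr)

# Hypothetical toy data (values invented for illustration); country "B" lacks 1952
toy <- tibble(
  country = c("A", "A", "A", "B", "B"),
  year    = c(1952, 1957, 1962, 1957, 1962),
  lifeExp = c(40, 42, 45, 63, 66)
)

out <- toy %>%
  group_by(country) %>%
  # match() gives the row index of 1952 within the group, or NA if it is absent
  mutate(difference = lifeExp - lifeExp[match(1952, year)]) %>%
  ungroup()

out
```

Country "A" gets differences relative to its 1952 value, while every row for "B" is NA, which makes a missing baseline easy to spot.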