I have the following dataframe:
df <- tibble(year = c("2020","2020","2020","2021","2021","2021"),
website = c("google","facebook","twitter","google","facebook","twitter"),
category = c("big","big","small","big","big","small"),
value = c(10,20,30,40,50,60))
How can i calculate the change in the different between the years?
So for example i want to compare 2021 vs 2020, how can i do this in R?
for example the output should be something like this
year | website | category | comparison |
---|---|---|---|
2021 vs 2020 | big | 4 | |
2021 vs 2020 | big | 2.5 | |
2021 vs 2020 | small | 2 |
Where the column comparison is effectively value from current year/ value from previous year
I'm not too sure how to do this in dplyr?
CodePudding user response:
I would pivot the year column to two different columns, and then use mutate()
to calcualte comparison
.
Here is an example:
library(dplyr)
df %>%
tidyr::pivot_wider(names_from = year) %>%
mutate(
comparison = `2021`/`2020`
)
#> # A tibble: 3 × 5
#> website category `2020` `2021` comparison
#> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 google big 10 40 4
#> 2 facebook big 20 50 2.5
#> 3 twitter small 30 60 2
Created on 2022-04-04 by the reprex package (v2.0.1)
You can remove the extra columns with select()
afterwards.
Note: if your column is not called
value
, then you need to add this argument topivot_wider()
:values_from = <your column name with values>
.
Update: If you want to keep your data in a long format
Just use group_by(website)
, arrange(year)
and lag()
inside mutate()
:
library(dplyr)
df %>%
group_by(website) %>%
arrange(year) %>%
mutate(
comparison = value / lag(value)
)
#> # A tibble: 6 × 5
#> # Groups: website [3]
#> year website category value comparison
#> <chr> <chr> <chr> <dbl> <dbl>
#> 1 2020 google big 10 NA
#> 2 2020 facebook big 20 NA
#> 3 2020 twitter small 30 NA
#> 4 2021 google big 40 4
#> 5 2021 facebook big 50 2.5
#> 6 2021 twitter small 60 2
Created on 2022-04-04 by the reprex package (v2.0.1)
CodePudding user response:
You could do
library(tidyverse)
df %>%
group_by(website) %>%
arrange(year) %>%
summarize(year = paste(year[2], year[1], sep = ' vs '),
category = category[1],
comparison = value[2] / value[1])
#> A tibble: 3 x 4
#> website year category comparison
#> <chr> <chr> <chr> <dbl>
#> 1 facebook 2021 vs 2020 big 2.5
#> 2 google 2021 vs 2020 big 4
#> 3 twitter 2021 vs 2020 small 2