Home > Mobile >  How to do division of groups in dataframe in R?
How to do division of groups in dataframe in R?

Time:04-04

I have the following dataframe:

df <- tibble(year = c("2020","2020","2020","2021","2021","2021"), 
             website = c("google","facebook","twitter","google","facebook","twitter"), 
             category = c("big","big","small","big","big","small"), 
             value = c(10,20,30,40,50,60))

How can i calculate the change in the different between the years?

So for example i want to compare 2021 vs 2020, how can i do this in R?

for example the output should be something like this

year website category comparison
2021 vs 2020 google big 4
2021 vs 2020 facebook big 2.5
2021 vs 2020 twitter small 2

Where the column comparison is effectively value from current year/ value from previous year

I'm not too sure how to do this in dplyr?

CodePudding user response:

I would pivot the year column to two different columns, and then use mutate() to calcualte comparison.

Here is an example:

library(dplyr)

df %>% 
  tidyr::pivot_wider(names_from = year) %>% 
  mutate(
    comparison = `2021`/`2020`
  )
#> # A tibble: 3 × 5
#>   website  category `2020` `2021` comparison
#>   <chr>    <chr>     <dbl>  <dbl>      <dbl>
#> 1 google   big          10     40        4  
#> 2 facebook big          20     50        2.5
#> 3 twitter  small        30     60        2

Created on 2022-04-04 by the reprex package (v2.0.1)

You can remove the extra columns with select() afterwards.

Note: if your column is not called value, then you need to add this argument to pivot_wider(): values_from = <your column name with values>.

Update: If you want to keep your data in a long format

Just use group_by(website), arrange(year) and lag() inside mutate():

library(dplyr)

df %>% 
  group_by(website) %>% 
  arrange(year) %>% 
  mutate(
    comparison = value / lag(value)
  )
#> # A tibble: 6 × 5
#> # Groups:   website [3]
#>   year  website  category value comparison
#>   <chr> <chr>    <chr>    <dbl>      <dbl>
#> 1 2020  google   big         10       NA  
#> 2 2020  facebook big         20       NA  
#> 3 2020  twitter  small       30       NA  
#> 4 2021  google   big         40        4  
#> 5 2021  facebook big         50        2.5
#> 6 2021  twitter  small       60        2

Created on 2022-04-04 by the reprex package (v2.0.1)

CodePudding user response:

You could do

library(tidyverse)

df %>% 
  group_by(website) %>%
  arrange(year) %>%
  summarize(year = paste(year[2], year[1], sep = ' vs '),
            category = category[1],
            comparison = value[2] / value[1]) 
#>  A tibble: 3 x 4
#>   website  year         category comparison
#>   <chr>    <chr>        <chr>         <dbl>
#> 1 facebook 2021 vs 2020 big             2.5
#> 2 google   2021 vs 2020 big             4  
#> 3 twitter  2021 vs 2020 small           2  
  • Related