Home > front end >  Divide Value from one Dataframe by Value of another Dataframe
Divide Value from one Dataframe by Value of another Dataframe

Time:05-18

these are my two dataframes:

    structure(list(Full.Name = c("A. Patrick Beharelle", "A. Patrick Beharelle", 
"Aaron P. Graft", "Aaron P. Graft", "Aaron P. Jagdfeld"), year = c(2019, 
2020, 2019, 2020, 2019), counter = c(5541L, 3269L, 165L, 200L, 
4L)), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-5L), groups = structure(list(Full.Name = c("A. Patrick Beharelle", 
"Aaron P. Graft", "Aaron P. Jagdfeld"), .rows = structure(list(
    1:2, 3:4, 5L), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -3L), .drop = TRUE))

and

structure(list(authority_dic = c("accomplished", "accomplished", 
"accomplished", "accomplished", "accomplished"), Full.Name = c("A. Patrick Beharelle", "A. Patrick Beharelle", 
"Aaron P. Graft", "Aaron P. Graft", "Aaron P. Jagdfeld"), Entity = c("WERNER ENTERPRISES INC", "MONDELEZ INTERNATIONAL INC", 
"AEROJET ROCKETDYNE HOLDINGS", "T-MOBILE US INC", "SOUTHWEST AIRLINES"
), `2019` = c(1L, 0L, 1L, 0L, 0L), `2020` = c(0L, 1L, 0L, 3L, 
1L)), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-5L), groups = structure(list(authority_dic = c("accomplished", 
"accomplished", "accomplished", "accomplished", "accomplished"
), Full.Name = c("Derek J. Leathers", "Dirk Van de Put", "Eileen P. Drake", 
"G. Michael Sievert", "Gary C. Kelly"), .rows = structure(list(
    1L, 2L, 3L, 4L, 5L), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -5L), .drop = TRUE))

Now, I want to divide each value of the "2019" column by the "counter" value of the other dataframe and add the result as another column. The complexity comes in place, because I only want to divide through the "counter" value of 2019 and of (for example) Aaron P. Graft. I want to do this for every row of the dataframe containing the name "Aaron P. Graft" and therefore take the value of "counter" from the other dataframe in the row, which contains "Aaron P. Graft".

I cannot figure it out on my own. Maybe I need to transpose the year and counter columns in the first dataframe, but I don't know.

Thats what I want to achieve:

authority_dic Full.name 2019 2020 2019_freq 2020_freq
example word Aaron P. Jagdfeld 10 20 10/counter(of 2019) 20/counter(of 2020)

If there are any questions, don't mind asking me. Thanks in advance!!!

CodePudding user response:

Let the structures be s1 and s2, this should work:

library(tidyr)
mutate(
      full_join(
         summarise(
            group_by(s2, authority_dic, Full.Name),
            `2019`=sum(`2019`),
            `2020`=sum(`2020`)),
         s1 %>% spread(year,counter),
         by=c("Full.Name")),
      `2019_freq`=`2019.x`/`2019.y`,
      `2020_freq`=`2020.x`/`2020.y`)
# A tibble: 3 × 8
# Groups:   authority_dic [1]
  authority_dic Full.Name            `2019.x` `2020.x` `2019.y` `2020.y` `2019_freq` `2020_freq`
  <chr>         <chr>                   <int>    <int>    <int>    <int>       <dbl>       <dbl>
1 accomplished  A. Patrick Beharelle        1        1     5541     3269    0.000180    0.000306
2 accomplished  Aaron P. Graft              1        3      165      200    0.00606     0.015   
3 accomplished  Aaron P. Jagdfeld           0        1        4       NA    0          NA       

Good practice is to avoid naming columns with values, e.g. 2019.... Use year instead. Your model needs refactoring into normal form (see database normalization topic for more information).

  • Related