these are my two dataframes:
structure(list(Full.Name = c("A. Patrick Beharelle", "A. Patrick Beharelle",
"Aaron P. Graft", "Aaron P. Graft", "Aaron P. Jagdfeld"), year = c(2019,
2020, 2019, 2020, 2019), counter = c(5541L, 3269L, 165L, 200L,
4L)), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA,
-5L), groups = structure(list(Full.Name = c("A. Patrick Beharelle",
"Aaron P. Graft", "Aaron P. Jagdfeld"), .rows = structure(list(
1:2, 3:4, 5L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -3L), .drop = TRUE))
and
structure(list(authority_dic = c("accomplished", "accomplished",
"accomplished", "accomplished", "accomplished"), Full.Name = c("A. Patrick Beharelle", "A. Patrick Beharelle",
"Aaron P. Graft", "Aaron P. Graft", "Aaron P. Jagdfeld"), Entity = c("WERNER ENTERPRISES INC", "MONDELEZ INTERNATIONAL INC",
"AEROJET ROCKETDYNE HOLDINGS", "T-MOBILE US INC", "SOUTHWEST AIRLINES"
), `2019` = c(1L, 0L, 1L, 0L, 0L), `2020` = c(0L, 1L, 0L, 3L,
1L)), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA,
-5L), groups = structure(list(authority_dic = c("accomplished",
"accomplished", "accomplished", "accomplished", "accomplished"
), Full.Name = c("Derek J. Leathers", "Dirk Van de Put", "Eileen P. Drake",
"G. Michael Sievert", "Gary C. Kelly"), .rows = structure(list(
1L, 2L, 3L, 4L, 5L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -5L), .drop = TRUE))
Now, I want to divide each value of the "2019" column by the "counter" value of the other dataframe and add the result as another column. The complexity comes in place, because I only want to divide through the "counter" value of 2019 and of (for example) Aaron P. Graft. I want to do this for every row of the dataframe containing the name "Aaron P. Graft" and therefore take the value of "counter" from the other dataframe in the row, which contains "Aaron P. Graft".
I cannot figure it out on my own. Maybe I need to transpose the year and counter columns in the first dataframe, but I don't know.
Thats what I want to achieve:
authority_dic | Full.name | 2019 | 2020 | 2019_freq | 2020_freq |
---|---|---|---|---|---|
example word | Aaron P. Jagdfeld | 10 | 20 | 10/counter(of 2019) | 20/counter(of 2020) |
If there are any questions, don't mind asking me. Thanks in advance!!!
CodePudding user response:
Let the structures be s1
and s2
, this should work:
library(tidyr)
mutate(
full_join(
summarise(
group_by(s2, authority_dic, Full.Name),
`2019`=sum(`2019`),
`2020`=sum(`2020`)),
s1 %>% spread(year,counter),
by=c("Full.Name")),
`2019_freq`=`2019.x`/`2019.y`,
`2020_freq`=`2020.x`/`2020.y`)
# A tibble: 3 × 8
# Groups: authority_dic [1]
authority_dic Full.Name `2019.x` `2020.x` `2019.y` `2020.y` `2019_freq` `2020_freq`
<chr> <chr> <int> <int> <int> <int> <dbl> <dbl>
1 accomplished A. Patrick Beharelle 1 1 5541 3269 0.000180 0.000306
2 accomplished Aaron P. Graft 1 3 165 200 0.00606 0.015
3 accomplished Aaron P. Jagdfeld 0 1 4 NA 0 NA
Good practice is to avoid naming columns with values, e.g. 2019.... Use year
instead. Your model needs refactoring into normal form (see database normalization topic for more information).