Home > Software engineering >  Problem with left_join: "Join columns must be present"
Problem with left_join: "Join columns must be present"

Time:01-10

I'm trying to summarise frequency data that occur in 'middle' positions, i.e., between the first and the last position. My approach to this task is to filter for these data, do the summarise, and then rejoin the new data with the dataframe from which they were filtered. This works well with the training data:

library(tidyverse)
df %>%
  group_by(rowid) %>%
  # summarize frequencies for middle postions:
  filter(position != first(position) & position != last(position)) %>%
  # summarise:
  summarize(across(position),
            middle_position = mean(f, na.rm = TRUE),
            word = str_c(word, collapse=" ")
            ) %>%
  left_join(df, ., by = c("rowid", "position"))

However, applied to my actual data, I get this error message:

Error in `left_join()`:
! Join columns must be present in data.
✖ Problem with `position`.
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
<error/rlang_error>
Error in `left_join()`:
! Join columns must be present in data.
✖ Problem with `position`.
---
Backtrace:
 1. ... %>% left_join(bnc_X, ., by = c("rowid", "position"))
 3. dplyr:::left_join.data.frame(bnc_X, ., by = c("rowid", "position"))
Run `rlang::last_trace()` to see the full context.
> rlang::last_trace()
<error/rlang_error>
Error in `left_join()`:
! Join columns must be present in data.
✖ Problem with `position`.
---
Backtrace:
    ▆
 1. ├─... %>% left_join(bnc_X, ., by = c("rowid", "position"))
 2. ├─dplyr::left_join(bnc_X, ., by = c("rowid", "position"))
 3. └─dplyr:::left_join.data.frame(bnc_X, ., by = c("rowid", "position"))
 4.   └─dplyr:::join_mutate(...)
 5.     └─dplyr:::join_cols(...)
 6.       └─dplyr:::standardise_join_by(...)
 7.         └─dplyr:::check_join_vars(by$x, x_names, error_call = error_call)
 8.           └─rlang::abort(bullets, call = error_call)

The main problem seems to be the variable position- why is it not recognized? I've spent a good few hours trying to solve the issue but couldn't, and would be grateful for help!

Data:

df <- data.frame(
  size = c(3,3,3,
              3,3,3,
              4,4,4,4,
              5,5,5,5,5,
              3,3,3),
  rowid = c(1,1,1,2,2,2,3,3,3,3,4,4,4,4,4,5,5,5),
  turn = c(rep("How are you?",3),
           rep("I'm fine.",3),
           rep("How's the weather?",4),
           rep("It's really very cold.",5),
           rep("I love you",3)),
  word = c("how","are","you",
           "i","'m","fine",
           "how","'s","the","weather",
           "it","'s","really", "very","cold",
           "i","love","you"),
  f = c(400,300,250,
        600,555,1,
        400,500,700,20,
        390,500,177,200,35,
        600,199,400),
  position = c(1,2,3,
               1,2,3,
               1,2,3,4,
               1,2,3,4,5,
               1,2,3)
)

CodePudding user response:

This works for me in data.table. No joins needed.

library(data.table)
# set to data.table
setDT(df)
# get id's of middle rows by rowid
idx = df[, .(idx = .I[-c(1L, .N)]), by = .(rowid)]$idx
# update these middle rows
df[idx, `:=`(middle_position = mean(f),
             word_midddel = paste0(word, collapse = " ")),
   by = .(rowid)]

    size rowid                   turn    word   f position middle_position   word_midddel
 1:    3     1           How are you?     how 400        1              NA           <NA>
 2:    3     1           How are you?     are 300        2        300.0000            are
 3:    3     1           How are you?     you 250        3              NA           <NA>
 4:    3     2              I'm fine.       i 600        1              NA           <NA>
 5:    3     2              I'm fine.      'm 555        2        555.0000             'm
 6:    3     2              I'm fine.    fine   1        3              NA           <NA>
 7:    4     3     How's the weather?     how 400        1              NA           <NA>
 8:    4     3     How's the weather?      's 500        2        600.0000         's the
 9:    4     3     How's the weather?     the 700        3        600.0000         's the
10:    4     3     How's the weather? weather  20        4              NA           <NA>
11:    5     4 It's really very cold.      it 390        1              NA           <NA>
12:    5     4 It's really very cold.      's 500        2        292.3333 's really very
13:    5     4 It's really very cold.  really 177        3        292.3333 's really very
14:    5     4 It's really very cold.    very 200        4        292.3333 's really very
15:    5     4 It's really very cold.    cold  35        5              NA           <NA>
16:    3     5             I love you       i 600        1              NA           <NA>
17:    3     5             I love you    love 199        2        199.0000           love
18:    3     5             I love you     you 400        3              NA           <NA>
  • Related