Home > Mobile >  r dplyr::left_join doesnt match the way I want it
r dplyr::left_join doesnt match the way I want it

Time:12-18

So, this is driving me crazy and I bet the answer is really simple, but I just don't get it. I have two dfs that I want to join, df1 and df2. However, when I perform a left_join, the values from the column Korn does not get joined on Datum, Soll, Plot, Behandlung, Entfernung, but instead there are NAs, even though all mentioned columns exist in both dataframes. What am I missing?

df1 <- structure(list(Soll = structure(c(8L, 8L, 8L, 8L, 8L, 8L, 8L, 
8L, 8L, 8L), .Label = c("1189", "119", "1192", "1202", "149", 
"172", "2484", "552"), class = "factor"), Datum = structure(c(18725, 
18764, 18815, 18725, 18764, 18815, 18725, 18764, 18815, 18725
), class = "Date"), Plot = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
3L, 3L, 3L, 3L), .Label = c("1", "2", "3", "4", "5", "6", "7", 
"8"), class = "factor"), Behandlung = structure(c(1L, 1L, 1L, 
2L, 2L, 2L, 1L, 1L, 1L, 2L), .Label = c("a", "b"), class = "factor"), 
    Entfernung = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L), .Label = c("2", "5"), class = "factor"), DGUnkraut = c(0.1, 
    4, 12.2, 0.7, 0, 0, 0.2, 0, 0, 0.1), prec = c(2.04, 13.38, 
    121.46, 2.04, 13.38, 121.46, 2.04, 13.38, 121.46, 2.04)), class = "data.frame", row.names = c(NA, 
-10L))
df2 <-structure(list(Soll = structure(c(7L, 7L, 7L, 7L, 7L, 7L, 7L, 
7L, 7L, 7L), .Label = c(" 1189", " 119", " 1202", " 149", " 172", 
" 2484", " 552"), class = "factor"), Datum = structure(c(18815, 
18815, 18815, 18815, 18815, 18815, 18815, 18815, 18815, 18815
), class = "Date"), Plot = structure(c(1L, 1L, 2L, 2L, 3L, 3L, 
4L, 4L, 5L, 5L), .Label = c("1", "2", "3", "4", "5", "6", "7", 
"8"), class = "factor"), Behandlung = structure(c(2L, 1L, 2L, 
1L, 2L, 1L, 2L, 1L, 2L, 1L), .Label = c("a", "b"), class = "factor"), 
    Entfernung = structure(c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 
    1L, 1L), .Label = c("2", "5"), class = "factor"), Korn = c(65.6, 
    64, 66.4, 57.4, 72.6, 68.9, 68.1, 73.6, 59.9, 34.6)), class = "data.frame", row.names = c(NA, 
-10L))

My code:

library(tidyverse)

left_join(df1, df2) -> df3

Any help is appreciated! Thanks a lot in advance! Cheers!

CodePudding user response:

As @IanNoriega and @r2evans have mentioned, there is a leading space in df2. You can first remove that with trimws.

df2$Soll <- trimws(df2$Soll)

Now, for the join, which works now.

dplyr::left_join(df1,
          df2,
          by = c("Soll", "Datum", "Plot", "Behandlung", "Entfernung"))

Output

   Soll      Datum Plot Behandlung Entfernung DGUnkraut   prec Korn
1   552 2021-04-08    1          a          2       0.1   2.04   NA
2   552 2021-05-17    1          a          2       4.0  13.38   NA
3   552 2021-07-07    1          a          2      12.2 121.46 64.0
4   552 2021-04-08    1          b          2       0.7   2.04   NA
5   552 2021-05-17    1          b          2       0.0  13.38   NA
6   552 2021-07-07    1          b          2       0.0 121.46 65.6
7   552 2021-04-08    3          a          2       0.2   2.04   NA
8   552 2021-05-17    3          a          2       0.0  13.38   NA
9   552 2021-07-07    3          a          2       0.0 121.46 68.9
10  552 2021-04-08    3          b          2       0.1   2.04   NA

You could also do it all in one pipe.

df1 %>%
  dplyr::left_join(
    .,
    df2 %>% dplyr::mutate(Soll = trimws(Soll)),
    by = c("Soll", "Datum", "Plot", "Behandlung", "Entfernung")
  )

CodePudding user response:

In your 2nd dataframe, the column Soll has a space before the numbers (e.g., " 1189") while this is not the case in your 1st dataframe. Once these are consistent, your left join should work.

To re-generate your 2nd dataframe, see below:

df2 <-structure(list(Soll = structure(c(7L, 7L, 7L, 7L, 7L, 7L, 7L, 
7L, 7L, 7L), .Label = c("1189", "119", "1202", "149", "172", 
"2484", "552"), class = "factor"), Datum = structure(c(18815, 
18815, 18815, 18815, 18815, 18815, 18815, 18815, 18815, 18815
), class = "Date"), Plot = structure(c(1L, 1L, 2L, 2L, 3L, 3L, 
4L, 4L, 5L, 5L), .Label = c("1", "2", "3", "4", "5", "6", "7", 
"8"), class = "factor"), Behandlung = structure(c(2L, 1L, 2L, 
1L, 2L, 1L, 2L, 1L, 2L, 1L), .Label = c("a", "b"), class = "factor"), 
    Entfernung = structure(c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 
    1L, 1L), .Label = c("2", "5"), class = "factor"), Korn = c(65.6, 
    64, 66.4, 57.4, 72.6, 68.9, 68.1, 73.6, 59.9, 34.6)), class = 
"data.frame", row.names = c(NA, 
-10L))

Finally, we can try out our left join. Now we can see actual values of Korn appended:

joined_df <- df1 %>%
  left_join(df2)

   Soll      Datum Plot Behandlung Entfernung DGUnkraut   prec Korn
1   552 2021-04-08    1          a          2       0.1   2.04   NA
2   552 2021-05-17    1          a          2       4.0  13.38   NA
3   552 2021-07-07    1          a          2      12.2 121.46 64.0
4   552 2021-04-08    1          b          2       0.7   2.04   NA
5   552 2021-05-17    1          b          2       0.0  13.38   NA
6   552 2021-07-07    1          b          2       0.0 121.46 65.6
7   552 2021-04-08    3          a          2       0.2   2.04   NA
8   552 2021-05-17    3          a          2       0.0  13.38   NA
9   552 2021-07-07    3          a          2       0.0 121.46 68.9
10  552 2021-04-08    3          b          2       0.1   2.04   NA

CodePudding user response:

Please see here dplyr::left_join produce NA values for new joined columns

To make your example work. You could:

library(dplyr)
library(stringr)

df2 <- df2 %>%  as_tibble() %>% 
  mutate(soll = str_trim(soll))

left_join(df1, df2)
   soll      datum plot behandlung entfernung dg_unkraut   prec korn
1   552 2021-04-08    1          a          2        0.1   2.04   NA
2   552 2021-05-17    1          a          2        4.0  13.38   NA
3   552 2021-07-07    1          a          2       12.2 121.46 64.0
4   552 2021-04-08    1          b          2        0.7   2.04   NA
5   552 2021-05-17    1          b          2        0.0  13.38   NA
6   552 2021-07-07    1          b          2        0.0 121.46 65.6
7   552 2021-04-08    3          a          2        0.2   2.04   NA
8   552 2021-05-17    3          a          2        0.0  13.38   NA
9   552 2021-07-07    3          a          2        0.0 121.46 68.9
10  552 2021-04-08    3          b          2        0.1   2.04   NA
  • Related