So, this is driving me crazy and I bet the answer is really simple, but I just don't get it. I have two dfs that I want to join, df1
and df2
. However, when I perform a left_join
, the values from the column Korn
does not get joined on Datum
, Soll
, Plot
, Behandlung
, Entfernung
, but instead there are NAs, even though all mentioned columns exist in both dataframes. What am I missing?
df1 <- structure(list(Soll = structure(c(8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L), .Label = c("1189", "119", "1192", "1202", "149",
"172", "2484", "552"), class = "factor"), Datum = structure(c(18725,
18764, 18815, 18725, 18764, 18815, 18725, 18764, 18815, 18725
), class = "Date"), Plot = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
3L, 3L, 3L, 3L), .Label = c("1", "2", "3", "4", "5", "6", "7",
"8"), class = "factor"), Behandlung = structure(c(1L, 1L, 1L,
2L, 2L, 2L, 1L, 1L, 1L, 2L), .Label = c("a", "b"), class = "factor"),
Entfernung = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L), .Label = c("2", "5"), class = "factor"), DGUnkraut = c(0.1,
4, 12.2, 0.7, 0, 0, 0.2, 0, 0, 0.1), prec = c(2.04, 13.38,
121.46, 2.04, 13.38, 121.46, 2.04, 13.38, 121.46, 2.04)), class = "data.frame", row.names = c(NA,
-10L))
df2 <-structure(list(Soll = structure(c(7L, 7L, 7L, 7L, 7L, 7L, 7L,
7L, 7L, 7L), .Label = c(" 1189", " 119", " 1202", " 149", " 172",
" 2484", " 552"), class = "factor"), Datum = structure(c(18815,
18815, 18815, 18815, 18815, 18815, 18815, 18815, 18815, 18815
), class = "Date"), Plot = structure(c(1L, 1L, 2L, 2L, 3L, 3L,
4L, 4L, 5L, 5L), .Label = c("1", "2", "3", "4", "5", "6", "7",
"8"), class = "factor"), Behandlung = structure(c(2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L), .Label = c("a", "b"), class = "factor"),
Entfernung = structure(c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L,
1L, 1L), .Label = c("2", "5"), class = "factor"), Korn = c(65.6,
64, 66.4, 57.4, 72.6, 68.9, 68.1, 73.6, 59.9, 34.6)), class = "data.frame", row.names = c(NA,
-10L))
My code:
library(tidyverse)
left_join(df1, df2) -> df3
Any help is appreciated! Thanks a lot in advance! Cheers!
CodePudding user response:
As @IanNoriega and @r2evans have mentioned, there is a leading space in df2
. You can first remove that with trimws
.
df2$Soll <- trimws(df2$Soll)
Now, for the join, which works now.
dplyr::left_join(df1,
df2,
by = c("Soll", "Datum", "Plot", "Behandlung", "Entfernung"))
Output
Soll Datum Plot Behandlung Entfernung DGUnkraut prec Korn
1 552 2021-04-08 1 a 2 0.1 2.04 NA
2 552 2021-05-17 1 a 2 4.0 13.38 NA
3 552 2021-07-07 1 a 2 12.2 121.46 64.0
4 552 2021-04-08 1 b 2 0.7 2.04 NA
5 552 2021-05-17 1 b 2 0.0 13.38 NA
6 552 2021-07-07 1 b 2 0.0 121.46 65.6
7 552 2021-04-08 3 a 2 0.2 2.04 NA
8 552 2021-05-17 3 a 2 0.0 13.38 NA
9 552 2021-07-07 3 a 2 0.0 121.46 68.9
10 552 2021-04-08 3 b 2 0.1 2.04 NA
You could also do it all in one pipe.
df1 %>%
dplyr::left_join(
.,
df2 %>% dplyr::mutate(Soll = trimws(Soll)),
by = c("Soll", "Datum", "Plot", "Behandlung", "Entfernung")
)
CodePudding user response:
In your 2nd dataframe, the column Soll
has a space before the numbers (e.g., " 1189") while this is not the case in your 1st dataframe. Once these are consistent, your left join should work.
To re-generate your 2nd dataframe, see below:
df2 <-structure(list(Soll = structure(c(7L, 7L, 7L, 7L, 7L, 7L, 7L,
7L, 7L, 7L), .Label = c("1189", "119", "1202", "149", "172",
"2484", "552"), class = "factor"), Datum = structure(c(18815,
18815, 18815, 18815, 18815, 18815, 18815, 18815, 18815, 18815
), class = "Date"), Plot = structure(c(1L, 1L, 2L, 2L, 3L, 3L,
4L, 4L, 5L, 5L), .Label = c("1", "2", "3", "4", "5", "6", "7",
"8"), class = "factor"), Behandlung = structure(c(2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L), .Label = c("a", "b"), class = "factor"),
Entfernung = structure(c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L,
1L, 1L), .Label = c("2", "5"), class = "factor"), Korn = c(65.6,
64, 66.4, 57.4, 72.6, 68.9, 68.1, 73.6, 59.9, 34.6)), class =
"data.frame", row.names = c(NA,
-10L))
Finally, we can try out our left join. Now we can see actual values of Korn
appended:
joined_df <- df1 %>%
left_join(df2)
Soll Datum Plot Behandlung Entfernung DGUnkraut prec Korn
1 552 2021-04-08 1 a 2 0.1 2.04 NA
2 552 2021-05-17 1 a 2 4.0 13.38 NA
3 552 2021-07-07 1 a 2 12.2 121.46 64.0
4 552 2021-04-08 1 b 2 0.7 2.04 NA
5 552 2021-05-17 1 b 2 0.0 13.38 NA
6 552 2021-07-07 1 b 2 0.0 121.46 65.6
7 552 2021-04-08 3 a 2 0.2 2.04 NA
8 552 2021-05-17 3 a 2 0.0 13.38 NA
9 552 2021-07-07 3 a 2 0.0 121.46 68.9
10 552 2021-04-08 3 b 2 0.1 2.04 NA
CodePudding user response:
Please see here dplyr::left_join produce NA values for new joined columns
To make your example work. You could:
library(dplyr)
library(stringr)
df2 <- df2 %>% as_tibble() %>%
mutate(soll = str_trim(soll))
left_join(df1, df2)
soll datum plot behandlung entfernung dg_unkraut prec korn
1 552 2021-04-08 1 a 2 0.1 2.04 NA
2 552 2021-05-17 1 a 2 4.0 13.38 NA
3 552 2021-07-07 1 a 2 12.2 121.46 64.0
4 552 2021-04-08 1 b 2 0.7 2.04 NA
5 552 2021-05-17 1 b 2 0.0 13.38 NA
6 552 2021-07-07 1 b 2 0.0 121.46 65.6
7 552 2021-04-08 3 a 2 0.2 2.04 NA
8 552 2021-05-17 3 a 2 0.0 13.38 NA
9 552 2021-07-07 3 a 2 0.0 121.46 68.9
10 552 2021-04-08 3 b 2 0.1 2.04 NA