I have two dataframes. One is quality with different dates throughout the year of collection points at station st
. The other is the land use of each st
season for that year.
Example
library(dplyr)
water_df=read.table(text="st Date OD pH DBO A 01/07/2005 8 6.3 3 A 02/06/2005 7 6.2 2.2 A 01/01/2005 7.3 6.5 3.1 A 03/05/2006 6 6.3 4 A 09/08/2006 6.8 7.1 1.1 A 12/12/2006 7.3 8.1 2.9 B 02/07/2005 6.8 5.4 2.6 B 03/06/2005 6.0 5.3 1.9 B 02/01/2005 6.2 5.5 2.6 B 04/05/2006 5.1 5.4 3.4 B 10/08/2006 5.8 6.0 0.9 B 13/12/2006 6.2 6.9 2.5 C 20/12/2006 6.5 7.2 2.6 C 27/12/2006 6.8 7.6 2.7 C 03/01/2007 7.2 8.0 2.9 C 10/01/2007 7.5 8.4 3.0 C 17/01/2007 7.9 8.8 3.1 C 24/01/2007 8.3 9.2 3.3 C 31/01/2007 8.7 9.7 3.5 C 07/02/2007 9.2 10.2 3.6", sep="", header=TRUE)%>%as.data.frame() land_df=read.table(text = "st year Veg Water Soil Crop Grass A 2005 100 200 80 130 70 B 2006 98 180 84 132 86 C 2007 93 175 79 127 106", sep="", header = TRUE)%>%as.data.frame()
I would like to add to the quality data.frame the land use values of that station even if for the same station st
is repeated.
I tried some things but it didn't work
#I tryed cbind
join_df<-cbind(water_df, land_df)
#I tryed
library(purrr)
joind_df2<-purrr::reduce(water_df, land_df)
CodePudding user response:
This operation it commonly called a "join" or "merge". You need columns to join on, which means we need to extract the year from your date, and then it's a left_join
command. See this FAQ for more information about joining data in R.
library(dplyr)
library(lubridate)
water_df %>%
mutate(year = year(dmy(Date))) %>%
left_join(land_df, by = c("st", "year"))
# st Date OD pH DBO year Veg Water Soil Crop Grass
# 1 A 01/07/2005 8.0 6.3 3.0 2005 100 200 80 130 70
# 2 A 02/06/2005 7.0 6.2 2.2 2005 100 200 80 130 70
# 3 A 01/01/2005 7.3 6.5 3.1 2005 100 200 80 130 70
# 4 A 03/05/2006 6.0 6.3 4.0 2006 NA NA NA NA NA
# 5 A 09/08/2006 6.8 7.1 1.1 2006 NA NA NA NA NA
# 6 A 12/12/2006 7.3 8.1 2.9 2006 NA NA NA NA NA
# 7 B 02/07/2005 6.8 5.4 2.6 2005 NA NA NA NA NA
# 8 B 03/06/2005 6.0 5.3 1.9 2005 NA NA NA NA NA
# 9 B 02/01/2005 6.2 5.5 2.6 2005 NA NA NA NA NA
# 10 B 04/05/2006 5.1 5.4 3.4 2006 98 180 84 132 86
# 11 B 10/08/2006 5.8 6.0 0.9 2006 98 180 84 132 86
# 12 B 13/12/2006 6.2 6.9 2.5 2006 98 180 84 132 86
# 13 C 20/12/2006 6.5 7.2 2.6 2006 NA NA NA NA NA
# 14 C 27/12/2006 6.8 7.6 2.7 2006 NA NA NA NA NA
# 15 C 03/01/2007 7.2 8.0 2.9 2007 93 175 79 127 106
# 16 C 10/01/2007 7.5 8.4 3.0 2007 93 175 79 127 106
# 17 C 17/01/2007 7.9 8.8 3.1 2007 93 175 79 127 106
# 18 C 24/01/2007 8.3 9.2 3.3 2007 93 175 79 127 106
# 19 C 31/01/2007 8.7 9.7 3.5 2007 93 175 79 127 106
# 20 C 07/02/2007 9.2 10.2 3.6 2007 93 175 79 127 106
There's some missing values where your land_df
didn't have observations for a particular station in a particular year. Have a look at ?tidyr::fill
and this FAQ if you want to fill those in with, e.g., the previous observation.