My script handles data from multiple sites (one df per site), organized in a list of dataframes. For subsequent analyses, I already added a match ID from all other sites (matchID
, character) to each site (dataframe). Depending on the number of sites (=dataframes) in each project (=list, data
), there are a varying amount of columns named "matchID" (no duplicate names). I need to alter these strings to add the site ID (ID
) to the "matchID", so the result would be "ID_matchID". Normally, I would do this via 'paste()', but since the amount of columns to add the ID to can vary depending on the number of sites in the project and as a result the number of dataframes in the list, I need to add characters to multiple strings without explicitly using the column index (I think). I'd appreciate any help, especially tips as to how to handle those NA
entries.
Here is what I tried so far:
- WORKS - Extract all relevant columns containing matchIDs (since all dataframes are organized the exact same way (nr. of columns and positioning), I can just use [[1]]):
indx_match<-as.numeric(grep("match", colnames(data[[1]])))
- DOES NOT WORK - Then adding the ID to all matchID columns doesn't work, cause the second x[,indx_match] blows up everything.
result<-lapply(data,function(x){x[,indx_match]<-paste(x$ID,x[,indx_match],sep="_");x})
Here is what the data could look like (in this example only 3 sites, hence two matchID columns:
> data
[[1]]
datetime ID matchID_50 matchID_51
2019-03-01 07:00:00 49 NA NA
2019-03-01 07:10:00 49 NA NA
2019-03-01 07:20:00 49 50_09_3 51_11_6
2019-03-01 07:30:00 49 50_12_6 51_05_6
2019-03-01 07:40:00 49 50_11_4 51_11_6
[[2]]
datetime ID matchID_49 matchID_51
2019-03-01 07:00:00 50 49_12_6 51_11_6
2019-03-01 07:10:00 50 49_21_6 51_32_0
2019-03-01 07:20:00 50 49_09_3 51_11_3
2019-03-01 07:30:00 50 49_12_6 51_05_6
2019-03-01 07:40:00 50 49_11_4 51_13_9
[[3]]
datetime ID matchID_49 matchID_50
2019-03-01 07:00:00 51 NA 50_14_1
2019-03-01 07:10:00 51 49_09_3 NA
2019-03-01 07:20:00 51 49_09_3 50_14_1
2019-03-01 07:30:00 51 49_12_5 50_01_7
2019-03-01 07:40:00 51 49_08_3 50_11_8
And this is what the result would ideally look like:
> result
[[1]]
datetime ID matchID_50 matchID_51
2019-03-01 07:00:00 49 NA NA
2019-03-01 07:10:00 49 NA NA
2019-03-01 07:20:00 49 49_50_09_3 49_51_11_6
2019-03-01 07:30:00 49 49_50_12_6 49_51_05_6
2019-03-01 07:40:00 49 49_50_11_4 49_51_11_6
[[2]]
datetime ID matchID_49 matchID_51
2019-03-01 07:00:00 50 50_49_12_6 50_51_11_6
2019-03-01 07:10:00 50 50_49_21_6 50_51_32_0
2019-03-01 07:20:00 50 50_49_09_3 50_51_11_3
2019-03-01 07:30:00 50 50_49_12_6 50_51_05_6
2019-03-01 07:40:00 50 50_49_11_4 50_51_13_9
[[3]]
datetime ID matchID_49 matchID_50
2019-03-01 07:00:00 51 NA 51_50_14_1
2019-03-01 07:10:00 51 51_49_09_3 NA
2019-03-01 07:20:00 51 51_49_09_3 51_50_14_1
2019-03-01 07:30:00 51 51_49_12_5 51_50_01_7
2019-03-01 07:40:00 51 51_49_08_3 51_50_11_8
CodePudding user response:
I'm guessing you're looking for
library(dplyr)
result<-lapply(data,function(x){
x %>% mutate_at(vars(matches("match")),function(z){ifelse(!is.na(z),paste0(.$ID,"_",z),z)})
})