Home > Mobile >  R: Adding characters to a changing number of columns in list of dataframes
R: Adding characters to a changing number of columns in list of dataframes

Time:11-26

My script handles data from multiple sites (one df per site), organized in a list of dataframes. For subsequent analyses, I already added a match ID from all other sites (matchID, character) to each site (dataframe). Depending on the number of sites (=dataframes) in each project (=list, data), there are a varying amount of columns named "matchID" (no duplicate names). I need to alter these strings to add the site ID (ID) to the "matchID", so the result would be "ID_matchID". Normally, I would do this via 'paste()', but since the amount of columns to add the ID to can vary depending on the number of sites in the project and as a result the number of dataframes in the list, I need to add characters to multiple strings without explicitly using the column index (I think). I'd appreciate any help, especially tips as to how to handle those NA entries.

Here is what I tried so far:

  1. WORKS - Extract all relevant columns containing matchIDs (since all dataframes are organized the exact same way (nr. of columns and positioning), I can just use [[1]]):
indx_match<-as.numeric(grep("match", colnames(data[[1]])))
  1. DOES NOT WORK - Then adding the ID to all matchID columns doesn't work, cause the second x[,indx_match] blows up everything.
result<-lapply(data,function(x){x[,indx_match]<-paste(x$ID,x[,indx_match],sep="_");x})

Here is what the data could look like (in this example only 3 sites, hence two matchID columns:

> data
[[1]]
datetime            ID    matchID_50 matchID_51
2019-03-01 07:00:00 49         NA         NA
2019-03-01 07:10:00 49         NA         NA
2019-03-01 07:20:00 49    50_09_3    51_11_6
2019-03-01 07:30:00 49    50_12_6    51_05_6
2019-03-01 07:40:00 49    50_11_4    51_11_6
[[2]]
datetime            ID    matchID_49 matchID_51
2019-03-01 07:00:00 50    49_12_6    51_11_6
2019-03-01 07:10:00 50    49_21_6    51_32_0
2019-03-01 07:20:00 50    49_09_3    51_11_3
2019-03-01 07:30:00 50    49_12_6    51_05_6
2019-03-01 07:40:00 50    49_11_4    51_13_9
[[3]]
datetime            ID    matchID_49 matchID_50
2019-03-01 07:00:00 51         NA    50_14_1
2019-03-01 07:10:00 51    49_09_3         NA
2019-03-01 07:20:00 51    49_09_3    50_14_1
2019-03-01 07:30:00 51    49_12_5    50_01_7
2019-03-01 07:40:00 51    49_08_3    50_11_8

And this is what the result would ideally look like:

> result
[[1]]
datetime            ID    matchID_50    matchID_51
2019-03-01 07:00:00 49            NA            NA
2019-03-01 07:10:00 49            NA            NA
2019-03-01 07:20:00 49    49_50_09_3    49_51_11_6
2019-03-01 07:30:00 49    49_50_12_6    49_51_05_6
2019-03-01 07:40:00 49    49_50_11_4    49_51_11_6
[[2]]
datetime            ID    matchID_49    matchID_51
2019-03-01 07:00:00 50    50_49_12_6    50_51_11_6
2019-03-01 07:10:00 50    50_49_21_6    50_51_32_0
2019-03-01 07:20:00 50    50_49_09_3    50_51_11_3
2019-03-01 07:30:00 50    50_49_12_6    50_51_05_6
2019-03-01 07:40:00 50    50_49_11_4    50_51_13_9
[[3]]
datetime            ID    matchID_49    matchID_50
2019-03-01 07:00:00 51            NA    51_50_14_1
2019-03-01 07:10:00 51    51_49_09_3            NA
2019-03-01 07:20:00 51    51_49_09_3    51_50_14_1
2019-03-01 07:30:00 51    51_49_12_5    51_50_01_7
2019-03-01 07:40:00 51    51_49_08_3    51_50_11_8

CodePudding user response:

I'm guessing you're looking for

library(dplyr)
result<-lapply(data,function(x){
  x %>% mutate_at(vars(matches("match")),function(z){ifelse(!is.na(z),paste0(.$ID,"_",z),z)})
})
  • Related