I need to update the values in multiple columns across multiple dataframes, based on whether the column name contains a certain string.
I have a list of the dataframe columns that need to be updated, and am wondering how I can run through the list and make the updates dynamically. I have tried a 'for' loop and 'lapply' but haven't had any luck... below is sample code I have tried...
The goal is to identify whether the column name contains an 'X' or 'Y' and then to update the values in that column with either a 1 or 2.
In my real-world example, I have 60 dataframes and 100 columns, this is a simplified version - so ideally this could be a loop or leverage a function.
Any help is really appreciated!
#create columns and 2 data frames
grp = c("A","A","A","A","A","A","A")
col1 =c(10,21,33,31,423,132,245)
col2X = c(0,0,NA,1.5,1.75,1,NA)
col3 = c(10,30,NA,1.0,1.3,1.4,50)
col4 = c(0,0,NA,1.5,1.75,1,NA)
col5Y = c(1,1.25,0,1.5,0,NA,0)
df1 = data.frame(grp,col1,col2X,col3,col4,col5Y)
first = c("A","A","A","A","A","A","A")
ind1 =c(10,21,33,31,423,132,245)
ind2X = c(0,0,NA,1.5,1.75,1,NA)
ind3 = c(10,30,NA,1.0,1.3,1.4,50)
ind4 = c(0,0,NA,1.5,1.75,1,NA)
ind5Y = c(1,1.25,0,1.5,0,NA,0)
df2 = data.frame(first,ind1,ind2X,ind3,ind4,ind5Y)
#list of columns I want to update
lis = c('df1$col2X','df1$col5Y','df2$ind2X','df2$ind5Y')
#logical loop to identify whether column name contains 'X' or 'Y' and then update the values in that column to either a 1 or 2.
for (i in lis) {
i <- ifelse( grepl("X", i, fixed = TRUE),
1 ,
2)
}
This is the desired changes for df1 and df2...
grp col1 col2X col3 col4 col5Y
1 A 10 1 10.0 0.00 2
2 A 21 1 30.0 0.00 2
3 A 33 1 NA NA 2
4 A 31 1 1.0 1.50 2
5 A 423 1 1.3 1.75 2
6 A 132 1 1.4 1.00 2
7 A 245 1 50.0 NA 2
first ind1 ind2X ind3 ind4 ind5Y
1 A 10 1 10.0 0.00 2
2 A 21 1 30.0 0.00 2
3 A 33 1 NA NA 2
4 A 31 1 1.0 1.50 2
5 A 423 1 1.3 1.75 2
6 A 132 1 1.4 1.00 2
7 A 245 1 50.0 NA 2
CodePudding user response:
If you want to replace the values in all columns with an "X" as 1 and of "Y" as 2, you could try:
df1[, grep("X", names(df1))] <- 1
df1[, grep("Y", names(df1))] <- 2
Output:
grp col1 col2X col3 col4 col5Y
1 A 10 1 10.0 0.00 2
2 A 21 1 30.0 0.00 2
3 A 33 1 NA NA 2
4 A 31 1 1.0 1.50 2
5 A 423 1 1.3 1.75 2
6 A 132 1 1.4 1.00 2
7 A 245 1 50.0 NA 2
To apply this to multiple data frames, you could wrap it in a function. Then store data frames in a list and use lapply
rep_fun <- function(df){
df[, grep("X", names(df))] <- 1
df[, grep("Y", names(df))] <- 2
df
}
df_list <- list(df1, df2)
lapply(df_list, function(x) rep_fun(x))
Output
> lapply(df_list, function(x) rep_fun(x))
[[1]]
grp col1 col2X col3 col4 col5Y
1 A 10 1 10.0 0.00 2
2 A 21 1 30.0 0.00 2
3 A 33 1 NA NA 2
4 A 31 1 1.0 1.50 2
5 A 423 1 1.3 1.75 2
6 A 132 1 1.4 1.00 2
7 A 245 1 50.0 NA 2
[[2]]
first ind1 ind2X ind3 ind4 ind5Y
1 A 10 1 10.0 0.00 2
2 A 21 1 30.0 0.00 2
3 A 33 1 NA NA 2
4 A 31 1 1.0 1.50 2
5 A 423 1 1.3 1.75 2
6 A 132 1 1.4 1.00 2
7 A 245 1 50.0 NA 2
If you didnt want to go the list route and wanted to use a loop:
df_names <- c("df1", "df2")
for(xx in df_names){
assign(xx, rep_fun(get(xx)))
}