Home > other >  Update values based on column name across multiple dataframes (R)
Update values based on column name across multiple dataframes (R)

Time:11-18

I need to update the values in multiple columns across multiple dataframes, based on whether the column name contains a certain string.

I have a list of the dataframe columns that need to be updated, and am wondering how I can run through the list and make the updates dynamically. I have tried a 'for' loop and 'lapply' but haven't had any luck... below is sample code I have tried...

The goal is to identify whether the column name contains an 'X' or 'Y' and then to update the values in that column with either a 1 or 2.

In my real-world example, I have 60 dataframes and 100 columns, this is a simplified version - so ideally this could be a loop or leverage a function.

Any help is really appreciated!

#create columns and 2 data frames
grp = c("A","A","A","A","A","A","A")
col1 =c(10,21,33,31,423,132,245)
col2X = c(0,0,NA,1.5,1.75,1,NA)
col3 = c(10,30,NA,1.0,1.3,1.4,50)
col4 = c(0,0,NA,1.5,1.75,1,NA)
col5Y = c(1,1.25,0,1.5,0,NA,0)

df1 =  data.frame(grp,col1,col2X,col3,col4,col5Y)

first = c("A","A","A","A","A","A","A")
ind1 =c(10,21,33,31,423,132,245)
ind2X = c(0,0,NA,1.5,1.75,1,NA)
ind3 = c(10,30,NA,1.0,1.3,1.4,50)
ind4 = c(0,0,NA,1.5,1.75,1,NA)
ind5Y = c(1,1.25,0,1.5,0,NA,0)

df2 =  data.frame(first,ind1,ind2X,ind3,ind4,ind5Y)

#list of columns I want to update
lis = c('df1$col2X','df1$col5Y','df2$ind2X','df2$ind5Y')

#logical loop to identify whether column name contains 'X' or 'Y' and then update the values in that column to either a 1 or 2.
for (i in lis) {
  i <- ifelse( grepl("X", i, fixed = TRUE),
               1 , 
               2) 
  }

This is the desired changes for df1 and df2...

grp col1 col2X col3 col4 col5Y
1   A   10     1 10.0 0.00     2
2   A   21     1 30.0 0.00     2
3   A   33     1   NA   NA     2
4   A   31     1  1.0 1.50     2
5   A  423     1  1.3 1.75     2
6   A  132     1  1.4 1.00     2
7   A  245     1 50.0   NA     2

 first ind1 ind2X ind3 ind4 ind5Y
1     A   10     1 10.0 0.00     2
2     A   21     1 30.0 0.00     2
3     A   33     1   NA   NA     2
4     A   31     1  1.0 1.50     2
5     A  423     1  1.3 1.75     2
6     A  132     1  1.4 1.00     2
7     A  245     1 50.0   NA     2


CodePudding user response:

If you want to replace the values in all columns with an "X" as 1 and of "Y" as 2, you could try:

df1[, grep("X", names(df1))] <- 1
df1[, grep("Y", names(df1))] <- 2

Output:

  grp col1 col2X col3 col4 col5Y
1   A   10     1 10.0 0.00     2
2   A   21     1 30.0 0.00     2
3   A   33     1   NA   NA     2
4   A   31     1  1.0 1.50     2
5   A  423     1  1.3 1.75     2
6   A  132     1  1.4 1.00     2
7   A  245     1 50.0   NA     2

To apply this to multiple data frames, you could wrap it in a function. Then store data frames in a list and use lapply

rep_fun <- function(df){
  df[, grep("X", names(df))] <- 1
  df[, grep("Y", names(df))] <- 2
  df
}

df_list <- list(df1, df2)

lapply(df_list, function(x) rep_fun(x))

Output

> lapply(df_list, function(x) rep_fun(x))
[[1]]
  grp col1 col2X col3 col4 col5Y
1   A   10     1 10.0 0.00     2
2   A   21     1 30.0 0.00     2
3   A   33     1   NA   NA     2
4   A   31     1  1.0 1.50     2
5   A  423     1  1.3 1.75     2
6   A  132     1  1.4 1.00     2
7   A  245     1 50.0   NA     2

[[2]]
  first ind1 ind2X ind3 ind4 ind5Y
1     A   10     1 10.0 0.00     2
2     A   21     1 30.0 0.00     2
3     A   33     1   NA   NA     2
4     A   31     1  1.0 1.50     2
5     A  423     1  1.3 1.75     2
6     A  132     1  1.4 1.00     2
7     A  245     1 50.0   NA     2

If you didnt want to go the list route and wanted to use a loop:

df_names <- c("df1", "df2")

for(xx in df_names){
  assign(xx, rep_fun(get(xx)))
}
  • Related