Home > Software engineering >  How to replace certain rows in a dataframe based off of values in another dataframe while preserving
How to replace certain rows in a dataframe based off of values in another dataframe while preserving

Time:11-25

I have a dataframe that looks like this:

df1 <- data.frame(
  Group = c('A', 'B', 'Total: AB', 'C', 'D', 'Total: CD'),
  Value1 = c(12, 88, 100, 76, 23, 99),
  Value2 = c(10, 82, 92, 71, 21, 92)
)

And another that looks like this:

df2 <- data.frame(
  Group = c('A', 'B', 'Total: AB', 'C', 'D', 'Total: CD'),
  Value1 = c(435, 568, 1003, 709, 183, 727),
  Value2 = c(180, 370, 550, 199, 283, 482)
)

I want to replace the Total rows in df1 with their values in df2 for a result like this:

df_desired <- data.frame(
  Group = c('A', 'B', 'Total: AB', 'C', 'D', 'Total: CD'),
  Value1 = c(12, 88, 1003, 76, 23, 727),
  Value2 = c(10, 82, 550, 71, 21, 482)
)

What's the best way to do this while preserving row order using R, ideally dplyr?

CodePudding user response:

If the datasets are in order and have the same dimensions, find the locations of 'Total' substring from the 'Group' column in 'df1' and use that as row index to replace the values other than the first column in 'df1' with the corresponding values from 'df2'

df1 <- type.convert(df1, as.is = TRUE)
df2 <- type.convert(df2, as.is = TRUE)
i1 <- grep("Total", df1$Group)
df1[i1, -1] <- df2[i1, -1]

-checking

> identical(df1, df_desired)
[1] TRUE

Or using dplyr

library(dplyr)
library(stringr)
df3 <- df1 %>%
    mutate(across(starts_with("Value"), ~ 
      case_when(str_detect(Group, 'Total') ~ df2[[cur_column()]], 
      TRUE ~ .x)))

-output

df3
      Group Value1 Value2
1         A     12     10
2         B     88     82
3 Total: AB   1003    550
4         C     76     71
5         D     23     21
6 Total: CD    727    482
  • Related