Replacing values in one column with values from another based on a condition

Time:05-02

I have a dataset that looks like this:

set.seed(123)

A<-rep(1:12,2)
B<-c(rep("c",12),rep("d",12))
AB<-paste0(B,A)   # "c1" ... "c12", "d1" ... "d12"
C<-runif(24,1,2)
D<-c("c1","c2","c3","c4_6", "c4_6","c4_6","c7","c8","c9","c10","c11","c12", "d1_3","d1_3","d1_3","d4","d5","d6","d7","d8","d9","d10","d11","d12")
E<-rbinom(24,1,0.5)

# R is case-sensitive, so the columns must be referenced as AB, C, D, E
df<-data.frame(AB, C, D, E, stringsAsFactors = FALSE)

I need to replace values in column C with values from column E when the values in column D are written as a series of strings, e.g., "c4_6", "c4_6", "c4_6". The strings in columns AB and D match and are in the same order; the only difference is that those in column D are written as a series with a start point and an end point. The biggest problem for me is matching a series of strings in column D with the single strings in column AB; I don't even know where to start looking for an answer.

The expected result should look like this:

  AB        C    D E
1 c1 1.863559   c1 1
2 c2 1.169298   c2 0
3 c3 1.259018   c3 0
4 c4 0.000000 c4_6 0
5 c5 1.000000 c4_6 1
6 c6 1.000000 c4_6 1
7 c7 1.748054   c7 0
8 c8 1.329033   c8 0
9 c9 1.339548   c9 0
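If you also want to check that each AB value really lies inside the range written in D, rather than relying on the rows being in the same order, one option is to split the range label into its prefix and its start and end numbers. The helper below, in_range(), is hypothetical (not from the question or any package) and assumes every label is a letter prefix followed by an integer, with ranges written as start_end.

```r
# Hypothetical helper: TRUE where the single label `ab` (e.g. "c5") falls
# inside the range label `d` (e.g. "c4_6"). Plain labels such as "c7" are
# treated as a one-element range (7 to 7). Vectorised over ab and d.
# Assumes labels look like <letters><integer> or <letters><integer>_<integer>.
in_range <- function(ab, d) {
  prefix_ab <- sub("\\d+$", "", ab)              # "c5"   -> "c"
  num_ab    <- as.integer(sub("^\\D+", "", ab))  # "c5"   -> 5
  prefix_d  <- sub("\\d.*$", "", d)              # "c4_6" -> "c"
  # "c4_6" -> c("4", "6"); "c7" -> "7"
  bounds    <- strsplit(sub("^\\D+", "", d), "_", fixed = TRUE)
  start     <- as.integer(vapply(bounds, function(b) b[1], character(1)))
  end       <- as.integer(vapply(bounds, function(b) b[length(b)], character(1)))
  prefix_ab == prefix_d & num_ab >= start & num_ab <= end
}

in_range(c("c4", "c5", "c6", "c7"), c("c4_6", "c4_6", "c4_6", "c4_6"))
#> [1]  TRUE  TRUE  TRUE FALSE
in_range("d2", "d1_3")
#> [1] TRUE
```

Because a plain label such as "c1" is also "in range" of itself, the range test would be combined with a check that D is a range at all, e.g. ifelse(grepl("_", df$D) & in_range(df$AB, df$D), df$E, df$C).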

Answer:

The term "series of strings" is a bit unclear, but it seems that you need a regular expression.

Here is the code:

library(tidyverse)
df %>% 
  mutate(C = ifelse(str_detect(D, "\\d+_\\d+"), E, C)) %>% 
  head(9)
#>   AB        C    D E
#> 1 c1 1.287578   c1 1
#> 2 c2 1.788305   c2 1
#> 3 c3 1.408977   c3 1
#> 4 c4 1.000000 c4_6 1
#> 5 c5 0.000000 c4_6 0
#> 6 c6 0.000000 c4_6 0
#> 7 c7 1.528105   c7 1
#> 8 c8 1.892419   c8 1
#> 9 c9 1.551435   c9 1

Created on 2022-05-02 by the reprex package (v2.0.1)

Here, I'm using stringr::str_detect() to test whether the strings in D match the pattern: one or more digits, an underscore, then one or more digits (as in "c4_6" or "d1_3"), so both the c and the d series are caught.
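Before putting a pattern inside mutate(), base R's grepl() gives a quick way to see which labels it flags. This sketch compares a narrow pattern, "c\\d_\\d" (a c, one digit, an underscore, one digit), with the broader "\\d+_\\d+"; the labels are a hand-picked sample of D-style values, and "c10_12" is a hypothetical multi-digit range added for illustration.

```r
# Sample of D-style labels; "c10_12" is a hypothetical multi-digit range.
labels <- c("c1", "c4_6", "c10", "d1_3", "c10_12")

# Narrow pattern: "c", exactly one digit, "_", exactly one digit.
narrow <- grepl("c\\d_\\d", labels)

# Broader pattern: one or more digits, "_", one or more digits, any prefix.
broad <- grepl("\\d+_\\d+", labels)

narrow
#> [1] FALSE  TRUE FALSE FALSE FALSE
broad
#> [1] FALSE  TRUE FALSE  TRUE  TRUE
```

The narrow pattern misses "d1_3" and "c10_12", which is why the broader form is the safer choice for this data.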

If your condition is rather that D starts with AB followed by something else, you can build the pattern with paste0() inside str_detect(), like C = ifelse(str_detect(D, paste0("^", AB, ".+")), E, C).
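The "D starts with AB and continues" condition can also be checked without regular expressions, using base R's startsWith() together with nchar(). This is a base-R stand-in for the str_detect() idea, run on a few labels typed in by hand from the example, and it shows a limitation for ranges: only the first row of each range is flagged.

```r
# A few AB/D label pairs from the example (values typed in by hand).
AB <- c("c3", "c4", "c5", "c6", "c7")
D  <- c("c3", "c4_6", "c4_6", "c4_6", "c7")

# TRUE when D begins with AB and has at least one more character after it.
starts_and_continues <- startsWith(D, AB) & nchar(D) > nchar(AB)
starts_and_continues
#> [1] FALSE  TRUE FALSE FALSE FALSE
```

Rows c5 and c6 are missed because "c4_6" does not start with "c5" or "c6", so for this data a test on D alone is the more reliable criterion.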

However, in your example, a simple C = ifelse(D != AB, E, C) would be enough, since D differs from AB exactly in the rows that belong to a series.
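A quick sanity check, using only the AB and D labels from the question, that the D != AB test and a digits-underscore-digits pattern pick out exactly the same rows (c4 to c6 and d1 to d3):

```r
# Rebuild the AB and D label columns from the question.
AB <- paste0(rep(c("c", "d"), each = 12), rep(1:12, 2))
D  <- c("c1", "c2", "c3", "c4_6", "c4_6", "c4_6", "c7", "c8", "c9", "c10", "c11", "c12",
        "d1_3", "d1_3", "d1_3", "d4", "d5", "d6", "d7", "d8", "d9", "d10", "d11", "d12")

by_mismatch <- D != AB                  # row label differs from D
by_pattern  <- grepl("\\d+_\\d+", D)    # D is written as a range

identical(by_mismatch, by_pattern)
#> [1] TRUE
which(by_mismatch)
#> [1]  4  5  6 13 14 15
```

The mismatch test depends on AB and D being aligned row by row, as the question states; the pattern test only looks at D.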

There are plenty of regex tutorials out there, but the stringr vignette on regular expressions is a good start: https://cran.r-project.org/web/packages/stringr/vignettes/regular-expressions.html

I also recommend https://regex101.com/ for testing your regexes.
