Home > Enterprise >  Create column conditioning the behavior of rows in the dataset
Create column conditioning the behavior of rows in the dataset

Time:02-08

I would like to do something very specific. I have a vast set of data, which, in summary, looks more or less like this, with values 0, 1 and 2:

enter image description here

I need to create a situation variable so that it contains the value 0, 1 and 2.

The value 0 for cases that contain only 0's and 1's in the entire line.

The value 1 for the case where the value 2 appears, but at some point 1 appears before it.

The value 2 for the case where the value 2 appears, but at some point 0 appears before it.

So it's something close to:

enter image description here

structure(list(X1 = c(1, 1, 1, 1, 1, 1, 1, 1, 0, 1), X2 = c(1, 
1, 1, 1, 0, 0, 0, 0, 0, 2), X3 = c(0, 1, 1, 1, 1, 0, 0, 1, 0, 
0), X4 = c(0, 1, 1, 0, 1, 1, 0, 0, 0, 0), X5 = c(2, 1, 1, 0, 
2, 1, 1, 0, 0, 0), X6 = c(2, 1, 1, 0, 2, 1, 1, 0, 0, 0), X7 = c(2, 
1, 1, 1, 2, 1, 1, 2, 0, 0), X8 = c(0, 1, 1, 1, 2, 1, 2, 2, 2, 
0)), class = "data.frame", row.names = c(NA, 10L))

CodePudding user response:

I wrote a score function and applied it over all the rows of your dataframe.

score <- function(x) {
  a <- which(x == 2)
  ifelse(length(a) > 0, ifelse(a[1] >=2, 2 - x[a[1] - 1], 1), 0)
}

df <- structure(list(X1 = c(1, 1, 1, 1, 1, 1, 1, 1, 0, 1), 
                     X2 = c(1, 1, 1, 1, 0, 0, 0, 0, 0, 2), 
                     X3 = c(0, 1, 1, 1, 1, 0, 0, 1, 0, 0), 
                     X4 = c(0, 1, 1, 0, 1, 1, 0, 0, 0, 0), 
                     X5 = c(2, 1, 1, 0, 2, 1, 1, 0, 0, 0), 
                     X6 = c(2, 1, 1, 0, 2, 1, 1, 0, 0, 0), 
                     X7 = c(2, 1, 1, 1, 2, 1, 1, 2, 0, 0), 
                     X8 = c(0, 1, 1, 1, 2, 1, 2, 2, 2, 0)),
                class = "data.frame", row.names = c(NA, 10L))
df$situation <- sapply(1:nrow(df), function(i) score(as.numeric(df[i,])))
df

CodePudding user response:

Here's a tidyverse approach.

I'll first concatenate all columns together, then use grepl() to look for 12 or 02.

library(tidyverse)

df %>% rowwise() %>% 
  mutate(concat = paste(c_across(everything()), collapse = "")) %>% 
  ungroup() %>%
  mutate(situation = case_when(
    !grepl(2, concat) ~ 0,
    grepl("12", concat) ~ 1,
    grepl("02", concat) ~ 2
  )) %>% 
  select(-concat)

Output

# A tibble: 10 x 9
      X1    X2    X3    X4    X5    X6    X7    X8 situation
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>     <dbl>
 1     1     1     0     0     2     2     2     0         2
 2     1     1     1     1     1     1     1     1         0
 3     1     1     1     1     1     1     1     1         0
 4     1     1     1     0     0     0     1     1         0
 5     1     0     1     1     2     2     2     2         1
 6     1     0     0     1     1     1     1     1         0
 7     1     0     0     0     1     1     1     2         1
 8     1     0     1     0     0     0     2     2         2
 9     0     0     0     0     0     0     0     2         2
10     1     2     0     0     0     0     0     0         1

Note that this solution assumes that:

  1. 2 will not appear in the first column
  2. 1 or 2 in the situation is defined by the number immediately before 2 in your dataset
  3. There will not be a case of 12 and 02 happening in the same row
  •  Tags:  
  • Related