Home > Back-end >  Filter for if rows where ID is same but Value column has different value to first occurrence
Filter for if rows where ID is same but Value column has different value to first occurrence

Time:08-09

I am looking for advice on the principle of filtering a dataset in R. I currently have the below code which allows for easy filtering of records where a value in column 'Value' is within the required list that I have created:

ValuesNumber <- 
  read.table(textConnection("CustomerID   Value   
1 Ball
1 Cat
2 Ball
2 Ball
3 Dog
4 Ball
4 Blitz"), header=TRUE)

#Filter for required values only

Values_List <- "Ball|Twist|Tester"

ValuesNumberFiltered <- ValuesNumber[grep(Values_List,  ValuesNumber$Value
),]

I am looking to amend this so that the below criteria are met:

  1. 'CustomerID' appears in the dataset at least twice
  2. The entry in 'Value' column for the second entry does not appear within a list of my choosing.

So for example if working with this dataset:

CustomerID Value
1 Ball
1 Cat
2 Ball
2 Ball
3 Dog
4 Ball
4 Blitz

I would then like to create a new column entitled 'Y/N' which has:

'1' if the value in all occurrences after the first occurrence does not match my list or '0' if it does not.

So the output would look like this:

CustomerID Value Y/N
1 Ball 0
1 Cat 1
2 Ball 0
2 Ball 0
3 Dog 0
4 Ball 0
4 Blitz 1

CodePudding user response:

tidyverse solution:

library(dplyr)

Values_List <- c("Ball", "Twist", "Tester")
ValuesNumber %>% 
  group_by(CustomerID) %>% 
  mutate(`Y/N` =  (n() >= 2 & !(Value %in% Values_List)))

  CustomerID Value `Y/N`
1          1 Ball      0
2          1 Cat       1
3          2 Ball      0
4          2 Ball      0
5          3 Dog       0
6          4 Ball      0
7          4 Blitz     1

CodePudding user response:

library(dplyr)
ValuesNumber %>%
  group_by(CustomerID) %>%
  mutate(`Y/N` = case_when(
    row_number() == 1 ~ 0,
    grepl(Values_List, Value) ~ 0,
    TRUE ~ 1
  )) %>%
  ungroup()
# # A tibble: 7 × 3
#   CustomerID Value `Y/N`
#        <int> <chr> <dbl>
# 1          1 Ball      0
# 2          1 Cat       1
# 3          2 Ball      0
# 4          2 Ball      0
# 5          3 Dog       0
# 6          4 Ball      0
# 7          4 Blitz     1

CodePudding user response:

rm(list = ls())

library(tidyverse)

values_number <- read.table(
  textConnection("CustomerID   Value   
1 Ball
1 Cat
2 Ball
2 Ball
3 Dog
4 Ball
4 Blitz"), header = TRUE)

# Filter for required values only
value_list <- c("Ball", "Twist", "Tester")

count_id <- values_number |> 
  group_by(CustomerID) |> 
  summarise(count = length(CustomerID)) |>       # count the occurance of each customer id
  right_join(values_number, by = "CustomerID") |>     # combined to the original data
  mutate("Y/N" = case_when(
    count > 1 & !(Value %in% value_list) ~ 1,    # if the occurance of customer id > 1 and
    TRUE ~ 0)                                    # the entry did not involved in the list
    )                                            # mark as 1, the others mark as 0
  
  •  Tags:  
  • r
  • Related