Home > Net >  How to create new data frame based on conditions
How to create new data frame based on conditions

Time:07-28

Need your suggestion, I have below table generated dynamically based on monthly data.

enter image description here

Need to compare each row across column and create a new data frame based on priority set as Fish > dog > horse > cat such that the sample output looks like,

enter image description here

df <- data.frame(Col_1 = c('cat', 'dog', 'cat'),
                 Col_2 = c('dog', 'horse', 'fish'),
                 Col_3 = c('horse', 'cat', NA),
                 Col_4 = c('fish', 'horse', NA))

CodePudding user response:

You can make your data into a long dataset with row numbers. Then you can arrange it as per your preference after grouping based on the row numbers. Then you can use slice to get the desired data frame.

library(tidyverse)

tribble(~col1, ~col2, ~col3, ~col4,
        "cat", "dog", "horse", "fish",
        "dog", "horse", "cat", "horse",
        "cat", "fish", "NA", "NA") -> df


df %>% 
  mutate(rn = row_number()) %>% 
  pivot_longer(-rn) %>% 
  drop_na() %>% 
  group_by(rn) %>% 
  arrange(match(value, c("fish", "dog", "horse", "cat")), .by_group = T) %>% 
  slice(1) %>% 
  ungroup() %>% 
  select(value)

# A tibble: 3 × 1
  value
  <chr>
1 fish 
2 dog  
3 fish 

CodePudding user response:

Throwing my solution to the ring

library(dplyr)

df %>%
rowwise() %>%
  mutate(
    priority = intersect(c('fish', 'dog', 'horse', 'cat'), c(X1, X2, X3, X4))[1]
  ) %>%
  ungroup()

#> # A tibble: 3 × 5
#>   X1    X2    X3    X4    priority
#>   <chr> <chr> <chr> <chr> <chr>   
#> 1 cat   dog   horse fish  fish    
#> 2 dog   horse cat   horse dog     
#> 3 fish  fish  <NA>  <NA>  fish

Created on 2022-07-27 by the reprex package (v2.0.1)

Convert the desired rows to vectors, and then take the first item from the intersect of the priority vector. If you only want the new column change mutate to transmute (or add ,.keep = 'none' to the mutate)

CodePudding user response:

You could define a custom order for character strings by ordered(...) or factor(..., ordered = TRUE). In this way you could use sort, min/max etc. to compare their priorities.

do.call(pmax,
  c(lapply(df, ordered, levels = c('cat', 'horse', 'dog', 'fish')), na.rm = TRUE)
)

# [1] fish dog  fish
# Levels: cat < horse < dog < fish
Data
df <- data.frame(Col_1 = c('cat', 'dog', 'cat'),
                 Col_2 = c('dog', 'horse', 'fish'),
                 Col_3 = c('horse', 'cat', NA),
                 Col_4 = c('fish', 'horse', NA))

CodePudding user response:

Another approach with base R with match and max.col:

prior_vec <- c('cat', 'horse', 'dog', 'fish')

s <- sapply(df, match, table = prior_vec, nomatch = 0)

df[cbind(1:nrow(df), max.col(s))]

which gives:

# [1] "fish" "dog"  "fish"
  • Related