How to create new dataframe column with a character that comes from three other columns but only if

Time:02-05

I've been trying to do this with mutate() and str_detect(), but it's too complex and I don't know how to do all the steps. Please help!

I have a dataframe in which 3 cols contain let's say fruits, animals, or "none".

col1 col2 col3
apple cat none
cat dog none
pear none none
pear apple none
none none none

And then I have two lists:

fruit <- c("apple", "pear", "banana")
animal <- c("cat", "dog", "sheep")

I want to create two new columns in the dataframe: col4 should display only the fruits from col1, col2, col3. If there is more than one fruit, I need them separated by commas. col5 does the same but for animals. If col1, col2, col3 don't contain an animal or a fruit, I need col4 and col5 to say "none".

col1 col2 col3 col4 col5
apple cat none apple cat
cat dog none none cat, dog
pear none none pear none
pear apple none pear, apple none
none none none none none

CodePudding user response:

This gives you what you want:

library(dplyr)

df <- data.frame(col1=c("apple", "cat", "pear", "pear","none"),
                 col2=c("cat", "dog", "none", "apple", "none"),
                 col3 = c("none", "none", "none", "none", "none"))

fruit <- c("apple", "pear", "banana")
animal <- c("cat", "dog", "sheep")

df %>% mutate(col4 = apply(.,1, function(x) paste0(x[x != "none" & x %in% fruit], collapse = ",")),
              col5 = apply(.,1, function(x) paste0(x[x != "none" & x %in% animal], collapse = ",")),
              across(.cols = c("col4", "col5"), .fns = ~replace(.x, .x == "", "none")))

output:

  col1  col2 col3       col4    col5
1 apple   cat none      apple     cat
2   cat   dog none       none cat,dog
3  pear  none none       pear    none
4  pear apple none pear,apple    none
5  none  none none       none    none


CodePudding user response:

Please check the code below, which uses nested ifelse() calls.
For it to work, "none" has to be added to both the fruit and animal vectors.

library(tidyverse)

fruit <- c("apple", "pear", "banana",'none') 
animal <- c("cat", "dog", "sheep",'none')

df %>%
  mutate(col4 = ifelse(!(col2 %in% animal), paste0(col1, ",", col2),
                       ifelse(col1 %in% fruit, col1, "none")),
         col5 = ifelse(!(col1 %in% fruit), paste0(col1, ",", col2),
                       ifelse(col2 %in% animal, col2, "none")))

Created on 2023-02-04 with reprex v2.0.2

   col1  col2 col3       col4    col5
1 apple   cat none      apple     cat
2   cat   dog none       none cat,dog
3  pear  none none       pear    none
4  pear apple none pear,apple    none
5  none  none none       none    none
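
A note on the nested ifelse() approach: it only inspects col1 and col2, and it assumes a fruit never follows an animal. The sample data happens to satisfy that, but other rows may not. Below is a minimal base R sketch of the difference. The extra row and the names col4_nested and col4_rowwise are made up for illustration and are not part of the question.

```r
# Lookup vectors as used in the answer above (with "none" appended).
fruit  <- c("apple", "pear", "banana", "none")
animal <- c("cat", "dog", "sheep", "none")

# Hypothetical extra row, not part of the question's data:
# the animal comes first and the fruit second.
extra <- data.frame(col1 = "dog", col2 = "apple", col3 = "none")

# Same nested ifelse() logic as above, applied to the extra row.
col4_nested <- with(extra,
  ifelse(!(col2 %in% animal), paste0(col1, ",", col2),
         ifelse(col1 %in% fruit, col1, "none")))

# Membership test across all three columns, for comparison.
values <- unlist(extra[1, ])
hits <- values[values %in% setdiff(fruit, "none")]
col4_rowwise <- if (length(hits) > 0) paste(hits, collapse = ",") else "none"

print(col4_nested)   # "dog,apple": the animal leaks into the fruit column
print(col4_rowwise)  # "apple"
```

If the real data can contain such rows, or values in col3, one of the row-wise answers is probably the safer choice.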

CodePudding user response:

Here is one method

library(dplyr)
library(purrr)
library(tidyr)  # for unite()

imap_dfc(list(col4 = fruit, col5 = animal), \(x, y)
  df1 %>%
    mutate(across(everything(), ~ replace(.x, !.x %in% x, NA_character_))) %>%
    unite(coln, everything(), sep = ", ", na.rm = TRUE) %>%
    setNames(y)) %>%
  mutate(across(everything(),
    ~ case_when(.x == "" ~ "none", TRUE ~ .x))) %>%
  bind_cols(df1, .)

-output

   col1  col2 col3        col4     col5
1 apple   cat none       apple      cat
2   cat   dog none        none cat, dog
3  pear  none none        pear     none
4  pear apple none pear, apple     none
5  none  none none        none     none

Or using pmap

library(tidyr)
df1 %>%
  mutate(col = pmap(across(everything()), ~ {
    v1 <- c(...)
    tibble(`4` = toString(v1[v1 %in% fruit]),
           `5` = toString(v1[v1 %in% animal]))
  })) %>%
  unnest_wider(col, names_sep = "") %>%
  mutate(across(everything(), ~ case_when(.x == "" ~ "none", TRUE ~ .x)))

-output

# A tibble: 5 × 5
  col1  col2  col3  col4        col5    
  <chr> <chr> <chr> <chr>       <chr>   
1 apple cat   none  apple       cat     
2 cat   dog   none  none        cat, dog
3 pear  none  none  pear        none    
4 pear  apple none  pear, apple none    
5 none  none  none  none        none    

data

df1 <- structure(list(col1 = c("apple", "cat", "pear", "pear", "none"
), col2 = c("cat", "dog", "none", "apple", "none"), col3 = c("none", 
"none", "none", "none", "none")), class = "data.frame", 
row.names = c(NA, 
-5L))

CodePudding user response:

You can also do it with a base R function. For col4 you can write:

fruit <- c("apple", "pear", "banana")
animal <- c("cat", "dog", "sheep")

df$col4 <- apply(df, 1, function(x) {
  fruits <- x[x %in% fruit]
  if (length(fruits) > 0) {
    paste(fruits, collapse = ", ")
  } else {
    "none"
  }
})

and for col5 the function is:

df$col5 <- apply(df, 1, function(x) {
  animals <- x[x %in% animal]
  if (length(animals) > 0) {
    paste(animals, collapse = ", ")
  } else {
    "none"
  }
})
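
The two calls above differ only in the lookup vector, so they could be folded into one helper. The sketch below is one possible way to do that, not the answerer's code. The names collect_matches, source_cols and lookups are hypothetical. It also restricts the scan to col1, col2 and col3, because apply(df, 1, ...) reads every column, including col4 and col5 once they exist, so re-running the original code can pick up earlier results (row 1 would become "apple, apple").

```r
df <- data.frame(col1 = c("apple", "cat", "pear", "pear", "none"),
                 col2 = c("cat", "dog", "none", "apple", "none"),
                 col3 = c("none", "none", "none", "none", "none"))

# Hypothetical helper: for each row, keep the values found in `lookup`
# and join them with `sep`; return `empty` when nothing matches.
collect_matches <- function(data, lookup, sep = ", ", empty = "none") {
  apply(as.matrix(data), 1, function(row) {
    hits <- row[row %in% lookup]
    if (length(hits) > 0) paste(hits, collapse = sep) else empty
  })
}

# Only these columns are scanned, so re-running the loop is safe.
source_cols <- c("col1", "col2", "col3")

# One entry per new column: name of the column -> values to look for.
lookups <- list(col4 = c("apple", "pear", "banana"),
                col5 = c("cat", "dog", "sheep"))

for (new_col in names(lookups)) {
  df[[new_col]] <- collect_matches(df[source_cols], lookups[[new_col]])
}

print(df)
```

Adding a third category would then only need another entry in lookups.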