Home > Net >  Add columns in a data frame whose column headers are unique values of another column
Add columns in a data frame whose column headers are unique values of another column

Time:10-22

I am not sure if this particular type of question has been answered before. I did a lot of research but couldn't get exactly what I wanted. I have a data set with many columns. Two of the columns are Organic = c("Yes","Yes","No",...) and StoreBrand = c("No","Yes","No",...). I want to create columns having headers Organic_Yes, Organic_No, StoreBrand_Yes and StoreBrand_No and fill them with 0 and 1. So my expected output will be

Organic Organic_Yes Organic_No StoreBrand StoreBrand_Yes StoreBrand_No
Yes 1 0 No 0 1
Yes 1 0 Yes 1 0
No 0 1 No 0 1

.....etc. I want to do this in R. Any help is appreciated

It would be great if the new column headers are not created by brute force. I am looking to automate the process of creating these columns. Thanks again!

CodePudding user response:

Here's a simple tidyverse approach:

library(dplyr)

df %>% 
  mutate(Organic_Yes = as.numeric(Organic == "Yes"),
         Organic_No = 1 - Organic_Yes,
         Storebrand_Yes = as.numeric(StoreBrand == "Yes"),
         Storebrand_No = 1 - Storebrand_Yes) %>%
  select(Organic, Organic_Yes, Organic_No,
         StoreBrand, Storebrand_Yes, Storebrand_No)

#>   Organic Organic_Yes Organic_No StoreBrand Storebrand_Yes Storebrand_No
#> 1     Yes           1          0         No              0             1
#> 2     Yes           1          0        Yes              1             0
#> 3      No           0          1         No              0             1

Or a more general tidyverse approach:

df %>% 
  mutate(across(everything(), 
                function(x) 1 * (x == "Yes"),
                .names = "{.col}_Yes")) %>%
  mutate(across(tidyselect::vars_select_helpers$where(is.character),
                function(x) 1 * (x == "No"),
                .names = "{.col}_No")) %>%
  select(order(names(.)))

#>   Organic Organic_No Organic_Yes StoreBrand StoreBrand_No StoreBrand_Yes
#> 1     Yes          0           1         No             1              0
#> 2     Yes          0           1        Yes             0              1
#> 3      No          1           0         No             1              0

Created on 2021-10-21 by the reprex package (v2.0.0)


Data

df <- data.frame(Organic    = c("Yes", "Yes", "No"),
                 StoreBrand = c("No", "Yes", "No"))

df
#>   Organic StoreBrand
#> 1     Yes         No
#> 2     Yes        Yes
#> 3      No         No

Created on 2021-10-21 by the reprex package (v2.0.0)

CodePudding user response:

Here is another approach: Using across twice with manipulating the .names argument (learned here: Using a function in .names argument of across function)

library(dplyr)
library(stringr)
df %>% 
  mutate(across(everything(), ~ifelse(. == "Yes", 1, 0), .names ="{.col}_Yes"),
         across(contains("Yes"), ~ifelse(. == 1, 0, 1), .names = "{str_replace(.col,'_Yes', '_No')}"))

output:

  Organic StoreBrand Organic_Yes StoreBrand_Yes Organic_No StoreBrand_No
1     Yes         No           1              0          0             1
2     Yes        Yes           1              1          0             0
3      No         No           0              0          1             1
  • Related