I am not sure if this particular type of question has been answered before. I did a lot of research but couldn't get exactly what I wanted. I have a data set with many columns. Two of the columns are Organic = c("Yes","Yes","No",...) and StoreBrand = c("No","Yes","No",...). I want to create columns having headers Organic_Yes, Organic_No, StoreBrand_Yes and StoreBrand_No and fill them with 0 and 1. So my expected output will be
Organic | Organic_Yes | Organic_No | StoreBrand | StoreBrand_Yes | StoreBrand_No |
---|---|---|---|---|---|
Yes | 1 | 0 | No | 0 | 1 |
Yes | 1 | 0 | Yes | 1 | 0 |
No | 0 | 1 | No | 0 | 1 |
.....etc. I want to do this in R. Any help is appreciated
It would be great if the new column headers are not created by brute force. I am looking to automate the process of creating these columns. Thanks again!
CodePudding user response:
Here's a simple tidyverse approach:
library(dplyr)
df %>%
mutate(Organic_Yes = as.numeric(Organic == "Yes"),
Organic_No = 1 - Organic_Yes,
Storebrand_Yes = as.numeric(StoreBrand == "Yes"),
Storebrand_No = 1 - Storebrand_Yes) %>%
select(Organic, Organic_Yes, Organic_No,
StoreBrand, Storebrand_Yes, Storebrand_No)
#> Organic Organic_Yes Organic_No StoreBrand Storebrand_Yes Storebrand_No
#> 1 Yes 1 0 No 0 1
#> 2 Yes 1 0 Yes 1 0
#> 3 No 0 1 No 0 1
Or a more general tidyverse approach:
df %>%
mutate(across(everything(),
function(x) 1 * (x == "Yes"),
.names = "{.col}_Yes")) %>%
mutate(across(tidyselect::vars_select_helpers$where(is.character),
function(x) 1 * (x == "No"),
.names = "{.col}_No")) %>%
select(order(names(.)))
#> Organic Organic_No Organic_Yes StoreBrand StoreBrand_No StoreBrand_Yes
#> 1 Yes 0 1 No 1 0
#> 2 Yes 0 1 Yes 0 1
#> 3 No 1 0 No 1 0
Created on 2021-10-21 by the reprex package (v2.0.0)
Data
df <- data.frame(Organic = c("Yes", "Yes", "No"),
StoreBrand = c("No", "Yes", "No"))
df
#> Organic StoreBrand
#> 1 Yes No
#> 2 Yes Yes
#> 3 No No
Created on 2021-10-21 by the reprex package (v2.0.0)
CodePudding user response:
Here is another approach: Using across
twice with manipulating the .names
argument (learned here: Using a function in .names argument of across function)
library(dplyr)
library(stringr)
df %>%
mutate(across(everything(), ~ifelse(. == "Yes", 1, 0), .names ="{.col}_Yes"),
across(contains("Yes"), ~ifelse(. == 1, 0, 1), .names = "{str_replace(.col,'_Yes', '_No')}"))
output:
Organic StoreBrand Organic_Yes StoreBrand_Yes Organic_No StoreBrand_No
1 Yes No 1 0 0 1
2 Yes Yes 1 1 0 0
3 No No 0 0 1 1