Home > database >  adding 1/0 columns from list all at once
adding 1/0 columns from list all at once

Time:01-13

I have a dataframe with identifiers and storm categories. Right now, the categories are in one column, but I want to add columns for each category with a 1 or 0 value. I don't think I want to reshape the data as wide, because in the actual dataset there are a number of long format variables I want to keep. I am using a series of ifelse statements currently, but it feels like there is probably a much better way:

library(dplyr)
library(tidyr)

df <- data.frame(
ID = c("A", "B", "C", "D", "A", "B", "C", "D", "A", "B", "C", "D"),
cat = c("TS", NA, NA, "TS", "1", "1", NA, NA, "2", NA, NA, NA)
)

df$cat_TS <- ifelse(df$cat == "TS", 1, 0) %>% replace_na(., 0)
df$cat_1 <- ifelse(df$cat == "1", 1, 0) %>% replace_na(., 0)
df$cat_2 <- ifelse(df$cat == "2", 1, 0) %>% replace_na(., 0)

CodePudding user response:

We may use pivot_wider - create a sequence column 'rn', and then use pivot_wider to reshape to wide with values_fn as length and values_fill as 0

library(dplyr)
library(tidyr)
library(stringr)
df %>% 
   mutate(rn = row_number(), cat1 = cat) %>% 
   pivot_wider(names_from = cat1, values_from = cat1, 
     values_fn = length, values_fill = 0, names_prefix = "cat_")%>%    
   select(-cat_NA, -rn)

-output

# A tibble: 12 × 5
   ID    cat   cat_TS cat_1 cat_2
   <chr> <chr>  <int> <int> <int>
 1 A     TS         1     0     0
 2 B     <NA>       0     0     0
 3 C     <NA>       0     0     0
 4 D     TS         1     0     0
 5 A     1          0     1     0
 6 B     1          0     1     0
 7 C     <NA>       0     0     0
 8 D     <NA>       0     0     0
 9 A     2          0     0     1
10 B     <NA>       0     0     0
11 C     <NA>       0     0     0
12 D     <NA>       0     0     0

or use fastDummies

library(fastDummies)
df %>% 
   dummy_cols("cat", remove_selected_columns = FALSE, ignore_na = TRUE) %>% 
   mutate(across(starts_with('cat_'), ~ replace_na(.x, 0)))

-output

   ID  cat cat_1 cat_2 cat_TS
1   A   TS     0     0      1
2   B <NA>     0     0      0
3   C <NA>     0     0      0
4   D   TS     0     0      1
5   A    1     1     0      0
6   B    1     1     0      0
7   C <NA>     0     0      0
8   D <NA>     0     0      0
9   A    2     0     1      0
10  B <NA>     0     0      0
11  C <NA>     0     0      0
12  D <NA>     0     0      0

CodePudding user response:

An idea using base R

First, get all unique category names

cats <- unique(df$cat[!is.na(df$cat)])
cats
[1] "TS" "1"  "2"

Then look for matches in column cat for each entry in cats. PS, I left the cat column in to show the matching is right. Remove it by using df$ID instead of df as the first argument in cbind.

cbind(df, setNames(data.frame(sapply(seq_along(cats), function(x) 
  df$cat %in% cats[x]) * 1), cats))
   ID  cat TS 1 2
1   A   TS  1 0 0
2   B <NA>  0 0 0
3   C <NA>  0 0 0
4   D   TS  1 0 0
5   A    1  0 1 0
6   B    1  0 1 0
7   C <NA>  0 0 0
8   D <NA>  0 0 0
9   A    2  0 0 1
10  B <NA>  0 0 0
11  C <NA>  0 0 0
12  D <NA>  0 0 0
  •  Tags:  
  • r
  • Related