I have the following table in R
id<-c(1,2,3,4)
medal<-c("2021-2020-2018","NA","2019","2015-2014-2012")
df<-data.frame(id,medal)
id medal
1 2021-2020-2018
2 NA
3 2019
4 2015-2014-2012
and would like to break the medal column to multiple dummy variables for each id as follows:
id medal 2021 2020 2019 2018 2015 2014 2012
1 2021-2020-2018 1 1 0 1 0 0 0
2 NA 0 0 0 0 0 0 0
3 2019 0 0 1 0 0 0 0
4 2015-2014-2012 0 0 0 0 1 1 1
I would appreciate your help with this.
CodePudding user response:
qdapTools
has a function that does exactly this, you just need to split the medal
column first:
library(qdapTools)
df <- cbind(df, mtabulate(strsplit(df$medal, "-")))
df[, names(df) != "NA"]
CodePudding user response:
Here is a tidyverse
solution:
Note that I have changed "NA"
in the input data to NA_character_
.
The idea is to use tidyr: separate
and then to pivot the data using gather
and spread
.
Also I convert id
to factor such that id
s without medals stay in the output.
library(tidyverse)
id <- c(1,2,3,4)
medal <- c("2021-2020-2018",NA_character_,"2019","2015-2014-2012")
df <- data.frame(id,medal)
df %>%
separate(medal, sep = "-", into = as.character(1:3), fill = "right") %>%
gather(dummy, year, -id) %>%
select(-dummy) %>%
mutate(val = 1, id = factor(id)) %>%
filter(!is.na(year)) %>%
spread(year, val, fill = 0, drop = FALSE)
id 2012 2014 2015 2018 2019 2020 2021
1 1 0 0 0 1 0 1 1
2 2 0 0 0 0 0 0 0
3 3 0 0 0 0 1 0 0
4 4 1 1 1 0 0 0 0
CodePudding user response:
Or:
library(tidyverse)
id<-c(1,2,3,4)
medal<-c("2021-2020-2018","NA","2019","2015-2014-2012")
df<-data.frame(id,medal)
tmp <- unlist(str_split(df$medal, "-"))
tmp <- sort(tmp[tmp != "NA"])
tmp <- set_names(tmp, tmp)
df %>%
bind_cols(map_dfc(tmp, ~as.integer(str_detect(medal, .x))))
# id medal 2012 2014 2015 2018 2019 2020 2021
#1 1 2021-2020-2018 0 0 0 1 0 1 1
#2 2 NA 0 0 0 0 0 0 0
#3 3 2019 0 0 0 0 1 0 0
#4 4 2015-2014-2012 1 1 1 0 0 0 0
CodePudding user response:
Or using unnest()
library(tidyverse)
id<-c(1,2,3,4)
medal<-c("2021-2020-2018","NA","2019","2015-2014-2012")
df<-data.frame(id,medal)
df %>% mutate(
medal_2=str_split(medal, "-")) %>%
unnest(medal_2) %>%
mutate(value=1) %>%
pivot_wider(
c("id", "medal"), names_from=medal_2, values_from = value
) %>%
replace(is.na(.), 0)
#> # A tibble: 4 x 10
#> id medal `2021` `2020` `2018` `NA` `2019` `2015` `2014` `2012`
#> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2021-2020-2018 1 1 1 0 0 0 0 0
#> 2 2 NA 0 0 0 1 0 0 0 0
#> 3 3 2019 0 0 0 0 1 0 0 0
#> 4 4 2015-2014-2012 0 0 0 0 0 1 1 1
Created on 2021-10-21 by the reprex package (v2.0.0)