Home > Blockchain >  R how to use pivot wider taking column names from multiple columns, and adding 1 or zero based on pr
R how to use pivot wider taking column names from multiple columns, and adding 1 or zero based on pr

Time:08-19

I am searching a way for obtaining a large table from a table with a list of species.

Here I give you an example:

I have something like this

data.frame(survey_id = c("ID_1", "ID_2", "ID_3", "ID_4", "ID_5"), 
           list_1 = c("A", "A", "A", "B", "A"), 
           list_2 = c("B", "D", "E", "E", "F"), 
           list_3 = c("C", "", "", "F", ""))

and I want to obtain this

data.frame(survey_id = c("ID_1", "ID_2", "ID_3", "ID_4", "ID_5"), 
           A = c(1,1,1,0,1), 
           B = c(1,0,0,1,0),
           C = c(1,0,0,0,0), 
           D = c(0,1,0,0,0), 
           E = c(0,0,1,1,0), 
           F = c(0,0,0,1,1))

Any suggestion using pivot_wider function?

Thanks for your help.

CodePudding user response:

A slightly round-a-bout way: pivot_longer first, so as you can count by (survey_id, value), and then pivot_wider.

library(dplyr)
library(tidyr)

df1 %>% 
  pivot_longer(-Survey_ID) %>% 
  count(Survey_ID, value) %>% 
  filter(value != "") %>% 
  pivot_wider(names_from = "value", 
              values_from = "n") %>% 
  replace(is.na(.), 0)

Result:

# A tibble: 5 × 7
  Survey_ID     A     B     C     D     E     F
  <chr>     <int> <int> <int> <int> <int> <int>
1 ID_1          1     1     1     0     0     0
2 ID_2          1     0     0     1     0     0
3 ID_3          1     0     0     0     1     0
4 ID_4          0     1     0     0     1     1
5 ID_5          1     0     0     0     0     1

Data df1:

df1 <- data.frame(survey_id = c("ID_1", "ID_2", "ID_3", "ID_4", "ID_5"), 
                  list_1 = c("A", "A", "A", "B", "A"), 
                  list_2 = c("B", "D", "E", "E", "F"), 
                  list_3 = c("C", "", "", "F", ""))

CodePudding user response:

df %>% 
  pivot_longer(-survay_id) %>% # first go longer
  mutate(n=1) %>% # add a value to each record
  select(-name) %>%  # drop the name column
  filter(value != "") %>% # remove empty cells caused by pivot_longer
  # spread(survay_id, n, fill = 0)  # a spread solution
  pivot_wider(names_from=survay_id, values_from=n, values_fill=0) # a pivot_wider solution



  value  ID_1  ID_2  ID_3  ID_4  ID_5
1 A         1     1     1     0     1
2 B         1     0     0     1     0
3 C         1     0     0     0     0
4 D         0     1     0     0     0
5 E         0     0     1     1     0
6 F         0     0     0     1     1
  • Related