Home > database >  How can I combine multiple columns into one in an R dataset?
How can I combine multiple columns into one in an R dataset?

Time:11-16

I am trying to combine multiple columns in R into a single one.

My data looks like this:
      age gender a     b     c     d     e     f     race  insured 
 1     13 Female 0     0     0     0     0     0     white 0      
 2     12 Female 0     0     0     0     0     0     white 1      
 3     19 Male   0     0     0     0     0     1     other 0      
 4     19 Female 0     1     0     0     0     0     white 0      
 5     13 Female 1     1     0     0     0     1     white 0   

This is what my desired output looks like:

      age gender   race  insured value
 1     13 Female   white 0        none    
 2     12 Female   white 1        none
 3     19 Male     other 0        f
 4     19 Female   white 0        b
 5     13 Female   white 0        a
 5     13 Female   white 0        b
 5     13 Female   white 0        f

I've tried using gather to create a long data frame but this doesn't add the data to the dataframe like i want. How can I do this with the dplyr or tidyverse package?

CodePudding user response:

A solution using tidyverse. dat4 is the final output.

library(tidyverse)

dat2 <- dat %>%
  mutate(ID = 1:n())

dat3 <- dat2 %>%
  pivot_longer(a:f, names_to = "value", values_to = "number") %>%
  filter(number == 1) %>%
  select(-number)

dat4 <- dat2 %>%
  left_join(dat3) %>%
  select(-ID, -c(a:f)) %>%
  replace_na(list(value = "none"))

dat4
#   age gender  race insured value
# 1  13 Female white       0  none
# 2  12 Female white       1  none
# 3  19   Male other       0     f
# 4  19 Female white       0     b
# 5  13 Female white       0     a
# 6  13 Female white       0     b
# 7  13 Female white       0     f

DATA

dat <- read.table(text = "      age gender a     b     c     d     e     f     race  insured 
 1     13 Female 0     0     0     0     0     0     white 0      
 2     12 Female 0     0     0     0     0     0     white 1      
 3     19 Male   0     0     0     0     0     1     other 0      
 4     19 Female 0     1     0     0     0     0     white 0      
 5     13 Female 1     1     0     0     0     1     white 0",
                  header = TRUE)

CodePudding user response:

One more way

df <- read.table(text = ' row_num     age gender a     b     c     d     e     f     race  insured 
 1     13 Female 0     0     0     0     0     0     white 0      
 2     12 Female 0     0     0     0     0     0     white 1      
 3     19 Male   0     0     0     0     0     1     other 0      
 4     19 Female 0     1     0     0     0     0     white 0      
 5     13 Female 1     1     0     0     0     1     white 0', header = T)

df
#>   row_num age gender a b c d e f  race insured
#> 1       1  13 Female 0 0 0 0 0 0 white       0
#> 2       2  12 Female 0 0 0 0 0 0 white       1
#> 3       3  19   Male 0 0 0 0 0 1 other       0
#> 4       4  19 Female 0 1 0 0 0 0 white       0
#> 5       5  13 Female 1 1 0 0 0 1 white       0
library(tidyverse)

nm <- c('a', 'b', 'c', 'd', 'e', 'f')
df %>% mutate(across(a:f, as.logical)) %>%
  nest(value = c(a, b, c, d, e, f)) %>% 
  mutate(value = map(value, ~ ifelse(length(nm[unlist(.)]) == 0, 
                                      'none', 
                                      paste(unlist(nm[unlist(.)]), collapse = ',')
                                      )
                     )) %>%
  unnest(value) %>% 
  separate_rows(value, sep = ',')
#> # A tibble: 7 x 6
#>   row_num   age gender race  insured value
#>     <int> <int> <chr>  <chr>   <int> <chr>
#> 1       1    13 Female white       0 none 
#> 2       2    12 Female white       1 none 
#> 3       3    19 Male   other       0 f    
#> 4       4    19 Female white       0 b    
#> 5       5    13 Female white       0 a    
#> 6       5    13 Female white       0 b    
#> 7       5    13 Female white       0 f

Created on 2021-11-16 by the reprex package (v2.0.0)

CodePudding user response:

(not allowed to comment, also being a fairly new user)

The information you give it quite minimum. If you just want to combine columns check out the merge command, using this command you need to bind the two (your df1 and the new Column1) by column name, first add an empty column with the right name to your dataframe and then merge them:

names(df1)[11]<- "Value"
New_df <- merge(df1,Column1, by = "Value")
  • Related