Home > database >  Fixing column names with unnest_wider in R
Fixing column names with unnest_wider in R

Time:07-19

I am having a problem in R and seek your help!

I have a tibble that looks like this (unfortunately, I can't figure out how to write the code to create the table here). My table looks exactly like this to the viewer, e.g., you can see the letter "c" in the table.

person     zip_code
Laura      c("11001", "28720", "32948", "10309")
Mel        c("80239", "23909")
Jake       c("20930", "23929", "13909")

In short, my "zip_code" column contains rows of character vectors, each of which contain multiple ZIP codes.

I would like to separate the column "zip_code" into multiple columns, each containing one zip code (e.g., "zip_code_1", "zip_code_2", etc.). To do so, I have been using unnest_wider:

unnest_wider(zip_code, names_sep="_") 

However, whenever I do this, the names of the new columns generated by unnest_wider come out wrong. Instead of being "zip_code_1", "zip_code_2", "zip_code_3," the new names are "zip_code_1[,1]", "zip_code_2[,1]", and zip_code_3[,1]". Basically, each column name has a "[,1]" afterward.

I have not repeated these column names anywhere, so I have no idea why they look like this.

I cannot manually rename them with:

dplyr::rename(zip_code_1=`zip_code_[,1]`)

If I do this, I get an error message.

Any help fixing these names is greatly appreciated! Thank you!

CodePudding user response:

With the OP's data, it is a case of matrix column, thus if we convert to a vector (doesn't have dim attributes), the names_sep should work

library(dplyr)
library(purrr)
library(tidyr)
df1 %>%
   mutate(zip = map(zip, c)) %>% 
   unnest_wider(zip, names_sep = "_")
# A tibble: 3 × 4
  zip_1 zip_2 zip_3 zip_4
  <chr> <chr> <chr> <chr>
1 10010 10019 10010 10019
2 10019 10032 10019 10032
3 11787 11375 11787 11375

Or as @IceCreamToucan mentioned, the transform option in unnest_wider would make it concise

unnest_wider(df1, zip, names_sep = '_', transform = c)
# A tibble: 3 × 4
  zip_1 zip_2 zip_3 zip_4
  <chr> <chr> <chr> <chr>
1 10010 10019 10010 10019
2 10019 10032 10019 10032
3 11787 11375 11787 11375

data

df1 <- structure(list(zip = list(structure(c("10010", "10019", "10010", 
"10019"), dim = c(4L, 1L)), structure(c("10019", "10032", "10019", 
"10032"), dim = c(4L, 1L)), structure(c("11787", "11375", "11787", 
"11375"), dim = c(4L, 1L)))), row.names = c(NA, -3L), class = c("tbl_df", 
"tbl", "data.frame"))

CodePudding user response:

library(dplyr)
library(tidyr) # unnest, pivot_wider
dat %>%
  mutate(
    # because your sample data didn't have an ID-column
    person = LETTERS[row_number()],
    # it's better to work with list-columns of strings, not matrices
    zip_code = lapply(zip_code, c)
  ) %>%
  unnest(zip_code) %>%
  group_by(person) %>%
  mutate(rn = paste0("zip", row_number())) %>%
  pivot_wider(person, names_from = "rn", values_from = "zip_code") %>%
  ungroup()
# # A tibble: 6 x 5
#   person zip1  zip2  zip3  zip4 
#   <chr>  <chr> <chr> <chr> <chr>
# 1 A      11374 11374 NA    NA   
# 2 B      10023 10023 NA    NA   
# 3 C      10028 10028 NA    NA   
# 4 D      11210 12498 11210 12498
# 5 E      10301 10301 NA    NA   
# 6 F      12524 10605 12524 10605

Data

dat <- structure(list(zip_code = list(structure(c("11374", "11374"), .Dim = 2:1), structure(c("10023", "10023"), .Dim = 2:1), structure(c("10028", "10028"), .Dim = 2:1), structure(c("11210", "12498", "11210", "12498"), .Dim = c(4L, 1L)), structure(c("10301", "10301"), .Dim = 2:1), structure(c("12524", "10605", "12524", "10605"), .Dim = c(4L, 1L)))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L))
  • Related