Home > other >  To pivot a multiple column data frame in R, into one column, and splitting the original column name
To pivot a multiple column data frame in R, into one column, and splitting the original column name

Time:12-25

My data is a more complex but simplified it basically looks like this. Each row has only one Yes in one of the multiple Color:* columns. The "Color" before the colon needs to be the new column name, with the named color "Blue", "Red", or "Green" after the colon pivoted under "Color" to the correct "IDENTITY" I forgot to mention there are variable columns before and after these multiple columns I want to pivot into one column. the columns I want to pivot all have the same name before the colon.

IDENTITY Color:Blue Color:Red Color:Green
1 Yes
2 Yes
3 Yes

and what I would like is to pivot to this

IDENTITY Color
1 Green
2 Blue
3 Red

I am not sure if this is a pivot problem. I have read through the tidyr pivot documentation at

https://tidyr.tidyverse.org/articles/pivot.html

I do not see a similar example or able to identify one of the solutions that might work with my supplied data.

Can anyone help me with a code chunk I can follow to solve what seems on the face of it a simple problem, but eludes my limited proficiency with R. Thank you and Seasons Blessings

CodePudding user response:

We may convert the blanks ("") to NA, in character columns by looping across the character columns, then reshape to 'long' by selecting the cols that have column names starts_with "Color:"

library(tidyr)
library(dplyr)
df1 %>%
   mutate(across(where(is.character), ~ na_if(.x, ""))) %>% 
   pivot_longer(cols = starts_with("Color:"), names_to = c(".value", "Color2"), 
     names_sep = ":", values_drop=TRUE) %>% 
  select(IDENTITY, Color = Color2)

-output

# A tibble: 3 × 2
  IDENTITY Color
     <int> <chr>
1        1 Green
2        2 Blue 
3        3 Red  

CodePudding user response:

You would need to use the pivot_longer function. This will gather columns and turn them into rows.

Note: I had to put `` in front of the column names since : is one of those pesky reserved characters in R. You want to avoid having that as a column separator ( _ could be a useful substitution). You can't always control this, but it's just something to look out for.

library(tidyverse)

x <- tribble( ~IDENTITY, ~`Color:Blue`, ~`Color:Red`, ~`Color:Green`,
              1L, "", "", "Yes",
              2L, "Yes", "", "",
              3L, "" , "Yes", "")

x %>%
  rename_with(stringr::str_replace,
              pattern = "Color:",
              replacement = "") %>%
  pivot_longer(!IDENTITY,
               names_to = "Color") %>%
  filter(value != "") %>%
  select(!value)
  • Related