Home > OS >  Turning horizontal data into vertical data
Turning horizontal data into vertical data

Time:10-13

I have data that is structure like such:

actor_data <- structure(list(id = c(123L, 456L, 789L, 912L, 235L), name = c("Tom Cruise", 
"Will Smith", "Ryan Reynolds", "Chris Rock", "Emma Stone"), locationid1 = c(5459L, 
NA, 6114L, NA, NA), location1 = c("Paris, France", "", "Brooklyn, NY", 
"", ""), locationid2 = c(NA, 5778L, NA, NA, 4432L), location3 = c("", 
"Dolby Theater", "", "", "Hollywood"), locationid3 = c(NA, 2526L, 
3101L, NA, NA), location3.1 = c("", "London", "Boston", "", ""
), locationid4 = c(6667L, 2333L, 1118L, NA, NA), location4 = c("Virginia", 
"Maryland", "Washington", "", "")), class = "data.frame", row.names = c(NA, 
-5L))

I am trying to make the location data run vertically instead of horizontally while also making sure its not accounting for blank fields.

So the final result will look like this:

actor_data_exp <- structure(list(id = c(123L, 123L, 456L, 456L, 456L, 789L, 789L, 
789L, 235L), name = c("Tom Cruise", "Tom Cruise", "Will Smith", 
"Will Smith", "Will Smith", "Ryan Reynolds", "Ryan Reynolds", 
"Ryan Reynolds", "Emma Stone"), locationid = c(5459L, 6667L, 
5778L, 2526L, 2333L, 6114L, 3101L, 1118L, 4432L), location = c("Paris, France", 
"Virginia", "Dolby Theater", "London", "Maryland", "Brooklyn, NY", 
"Boston", "Washington", "Hollywood")), class = "data.frame", row.names = c(NA, 
-9L))

Or to give you a visual, will end up looking like this:

enter image description here

CodePudding user response:

I would rename the columns that start with "location", so that there is an underscore before the number in each column name. Then use pivot_longer with the underscore as name_sep, and using c(".value", "var") in the names_to argument that to ensure both location and locationid have their own columns. This will also create the redundant column var which will contain the numbers 1-4 that were appended to the original column names.

Finally, filter out missing values and remove the redundant var column.

library(tidyverse)

actor_data %>%
  rename_with(~ ifelse(grepl("location", .x),
                       sub("^(.*?)([0-9]\\.?\\d?)$", "\\1_\\2", .x), .x)) %>%
  pivot_longer(starts_with("location"), 
             names_sep = "_", names_to = c(".value", "var")) %>%
  filter(!is.na(locationid) & !is.na(location) & nzchar(location)) %>%
  select(-var)
#> # A tibble: 6 x 4
#>      id name          locationid location     
#>   <int> <chr>              <int> <chr>        
#> 1   123 Tom Cruise          5459 Paris, France
#> 2   123 Tom Cruise          6667 Virginia     
#> 3   456 Will Smith          2526 Dolby Theater
#> 4   456 Will Smith          2333 Maryland     
#> 5   789 Ryan Reynolds       6114 Brooklyn, NY 
#> 6   789 Ryan Reynolds       1118 Washington  
  • Related