Home > OS >  Re-structure a dataframe
Re-structure a dataframe

Time:08-08

Suppose I need to reorder a data frame:

input <- structure(list(Name = c("Luis_aaa_A", "Luis_bb_BB", "Luis_c_CCC", 
"Jackie_aaa_A", "Jackie_bb_BB", "Jackie_c_CCC"), Age = c(0L, 
8L, 9L, 12L, 13L, 14L)), class = "data.frame", row.names = c(NA, 
-6L))

That looks like this:

input
          Name Age
1   Luis_aaa_A   0
2   Luis_bb_BB   8
3   Luis_c_CCC   9
4 Jackie_aaa_A  12
5 Jackie_bb_BB  13
6 Jackie_c_CCC  14

I want to:

  • transform the data frame from long to wide,
  • creating a column named A, BB, and CCC, where I put the content of input$Age accordingly,
  • and ideally, rename the content of input$Name removing the unnecessary characters (that are those after the names).

The output should therefore look like this:

output <- structure(list(Name = c("Luis", "Jackie"), A = c(0L, 12L), BB = c(8L, 
13L), CCC = c(9L, 14L)), class = "data.frame", row.names = c(NA, 
-2L))

output
    Name  A BB CCC
1   Luis  0  8   9
2 Jackie 12 13  14

To use pivot_wider(), I first create an empty column

input$A<-0
input$BB<-0
input$CCC<-0

input %>% 
  pivot_wider(names_from = c("A","BB","CC"), values_from = Age)

The output is not the desired one, clearly. Which is the correct way to put columns' names into names_from()? As you can see from input, input$Name is difficult to subset, as each row has a different length (otherwise I would have used something like str_sub()). And obviously I have no idea how to change the content of input$Name (i.e., from Luis_aaa_A to Luis) after widening the dataframe. This is a subset of the original data (that has 300 rows). I am not very familiar with the tidyverse, unfortunately.

CodePudding user response:

You could separate Name with '_' and omit the middle part by using NA in the argument into. Then transform the data to wide with pivot_wider().

library(tidyr)

input %>%
  separate(Name, into = c('Name', NA, 'name')) %>%
  pivot_wider(names_from = name, values_from = Age)

# # A tibble: 2 × 4
#   Name       A    BB   CCC
#   <chr>  <int> <int> <int>
# 1 Luis       0     8     9
# 2 Jackie    12    13    14
  • Related