Suppose I need to reorder a data frame:
input <- structure(list(Name = c("Luis_aaa_A", "Luis_bb_BB", "Luis_c_CCC",
"Jackie_aaa_A", "Jackie_bb_BB", "Jackie_c_CCC"), Age = c(0L,
8L, 9L, 12L, 13L, 14L)), class = "data.frame", row.names = c(NA,
-6L))
That looks like this:
input
Name Age
1 Luis_aaa_A 0
2 Luis_bb_BB 8
3 Luis_c_CCC 9
4 Jackie_aaa_A 12
5 Jackie_bb_BB 13
6 Jackie_c_CCC 14
I want to:
- transform the data frame from long to wide,
- creating a column named
A
,BB
, andCCC
, where I put the content ofinput$Age
accordingly, - and ideally, rename the content of
input$Name
removing the unnecessary characters (that are those after the names).
The output should therefore look like this:
output <- structure(list(Name = c("Luis", "Jackie"), A = c(0L, 12L), BB = c(8L,
13L), CCC = c(9L, 14L)), class = "data.frame", row.names = c(NA,
-2L))
output
Name A BB CCC
1 Luis 0 8 9
2 Jackie 12 13 14
To use pivot_wider()
, I first create an empty column
input$A<-0
input$BB<-0
input$CCC<-0
input %>%
pivot_wider(names_from = c("A","BB","CC"), values_from = Age)
The output is not the desired one, clearly. Which is the correct way to put columns' names into names_from()
? As you can see from input
, input$Name
is difficult to subset, as each row has a different length (otherwise I would have used something like str_sub()
). And obviously I have no idea how to change the content of input$Name
(i.e., from Luis_aaa_A
to Luis
) after widening the dataframe.
This is a subset of the original data (that has 300 rows).
I am not very familiar with the tidyverse
, unfortunately.
CodePudding user response:
You could separate Name
with '_'
and omit the middle part by using NA
in the argument into
. Then transform the data to wide with pivot_wider()
.
library(tidyr)
input %>%
separate(Name, into = c('Name', NA, 'name')) %>%
pivot_wider(names_from = name, values_from = Age)
# # A tibble: 2 × 4
# Name A BB CCC
# <chr> <int> <int> <int>
# 1 Luis 0 8 9
# 2 Jackie 12 13 14