I'm very sure there should be a simple alternative but I'm not able to figure it out. Currently using a for loop which is not optimal. My dataframe is like this:
NAME <- c("ABC", "ABC", "ABC", "DEF", "GHI", "GHI", "JKL", "JKL", "JKL", "MNO")
YEAR <- c(2012, 2013, 2014, 2012, 2012, 2013, 2012, 2014, 2016, 2013)
MARKS <- c(45, 75, 95, 91, 75, 76, 85, 88, 89, 77)
MAXIMUM <- c(95, NA, NA, 91, 76, NA, 89, NA, NA, 77)
DF <- data.frame(
NAME,
YEAR,
MARKS,
MAXIMUM
)
> DF
NAME YEAR MARKS MAXIMUM
1 ABC 2012 45 95
2 ABC 2013 75 NA
3 ABC 2014 95 NA
4 DEF 2012 91 91
5 GHI 2012 75 76
6 GHI 2013 76 NA
7 JKL 2012 85 89
8 JKL 2014 88 NA
9 JKL 2016 89 NA
10 MNO 2013 77 77
I want to have only one name per row and each year-wise details (YEAR, MARKS and MAXIMUM columns) should be spread as individual headers. I have tried to use tidyr::pivot_wider
function but was not successful.
I have given the sample output here:
CodePudding user response:
Perhaps you could enumerate by NAME
first based on row_number()
. Then, use pivot_wider
:
library(tidyverse)
DF %>%
group_by(NAME) %>%
mutate(n = row_number()) %>%
pivot_wider(NAME, names_from = n, values_from = c(YEAR, MARKS, MAXIMUM))
Output
NAME YEAR_1 YEAR_2 YEAR_3 MARKS_1 MARKS_2 MARKS_3 MAXIMUM_1 MAXIMUM_2 MAXIMUM_3
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ABC 2012 2013 2014 45 75 95 95 NA NA
2 DEF 2012 NA NA 91 NA NA 91 NA NA
3 GHI 2012 2013 NA 75 76 NA 76 NA NA
4 JKL 2012 2014 2016 85 88 89 89 NA NA
5 MNO 2013 NA NA 77 NA NA 77 NA NA
Or, as mentioned by @RobertoT, you could make YEAR
a factor and then line up your YEAR
values. Using complete
you can fill in NA
for missing YEAR
. The final select
will order your columns.
DF$YEAR_FAC = factor(DF$YEAR)
DF %>%
group_by(NAME) %>%
complete(YEAR_FAC, fill = list(YEAR = NA)) %>%
mutate(n = row_number()) %>%
pivot_wider(NAME, names_from = n, values_from = c(YEAR, MARKS, MAXIMUM)) %>%
select(NAME, ends_with(as.character(1:nlevels(DF$YEAR_FAC))))
Output
NAME YEAR_1 MARKS_1 MAXIMUM_1 YEAR_2 MARKS_2 MAXIMUM_2 YEAR_3 MARKS_3 MAXIMUM_3 YEAR_4 MARKS_4 MAXIMUM_4
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ABC 2012 45 95 2013 75 NA 2014 95 NA NA NA NA
2 DEF 2012 91 91 NA NA NA NA NA NA NA NA NA
3 GHI 2012 75 76 2013 76 NA NA NA NA NA NA NA
4 JKL 2012 85 89 NA NA NA 2014 88 NA 2016 89 NA
5 MNO NA NA NA 2013 77 77 NA NA NA NA NA NA
CodePudding user response:
In addition to @Ben 1 solution we could a code that I recently learned to order the columns Combining two dataframes with alternating column position
DF %>%
group_by(NAME) %>%
mutate(n = row_number()) %>%
pivot_wider(NAME, names_from = n, values_from = c(YEAR, MARKS, MAXIMUM)) %>%
select(-NAME) %>%
dplyr::select(all_of(c(matrix(names(.), ncol = 3, byrow = TRUE))))
NAME YEAR_3 MARKS_3 MAXIMUM_3 YEAR_1 MARKS_1 MAXIMUM_1 YEAR_2 MARKS_2 MAXIMUM_2
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ABC 2014 95 NA 2012 45 95 2013 75 NA
2 DEF NA NA NA 2012 91 91 NA NA NA
3 GHI NA NA NA 2012 75 76 2013 76 NA
4 JKL 2016 89 NA 2012 85 89 2014 88 NA
5 MNO NA NA NA 2013 77 77 NA NA NA
CodePudding user response:
I think all the previous answers have overlooked that the expected output is based on YEAR
as a factor. The expected output has 4 grouped-columns per row, not 3. Therefore, you avoid mixing different years in the same column.
You can assign a number for every row- grp
- based on the level of Year
as a factor()
. Also, if you first pivot longer, you can arrange the values as you want and then pivot wider everything so the columns are sorted as you expect:
library(tidyverse)
DF %>%
mutate(grp = as.integer(factor(DF$YEAR,unique(DF$YEAR)))) %>%
pivot_longer(cols=c('YEAR','MARKS','MAXIMUM'), names_to = 'COLNAMES', values_to= 'COL_VALUES') %>%
arrange(NAME,grp) %>%
pivot_wider(names_from = c(COLNAMES,grp), values_from= COL_VALUES, names_sep = '')
Output:
# A tibble: 5 x 13
NAME YEAR1 MARKS1 MAXIMUM1 YEAR2 MARKS2 MAXIMUM2 YEAR3 MARKS3 MAXIMUM3 YEAR4 MARKS4 MAXIMUM4
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ABC 2012 45 95 2013 75 NA 2014 95 NA NA NA NA
2 DEF 2012 91 91 NA NA NA NA NA NA NA NA NA
3 GHI 2012 75 76 2013 76 NA NA NA NA NA NA NA
4 JKL 2012 85 89 NA NA NA 2014 88 NA 2016 89 NA
5 MNO NA NA NA 2013 77 77 NA NA NA NA NA NA
However, I suggest you to keep track of the years to not make the tibble more confusing:
DF$YEAR = factor(DF$YEAR)
DF %>%
pivot_longer(cols=c('MARKS','MAXIMUM'), names_to = 'COLNAMES', values_to= 'COL_VALUES') %>%
arrange(NAME,YEAR) %>%
pivot_wider(names_from = c(COLNAMES,YEAR), values_from= COL_VALUES)
# A tibble: 5 x 9
NAME MARKS_2012 MAXIMUM_2012 MARKS_2013 MAXIMUM_2013 MARKS_2014 MAXIMUM_2014 MARKS_2016 MAXIMUM_2016
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ABC 45 95 75 NA 95 NA NA NA
2 DEF 91 91 NA NA NA NA NA NA
3 GHI 75 76 76 NA NA NA NA NA
4 JKL 85 89 NA NA 88 NA 89 NA
5 MNO NA NA 77 77 NA NA NA NA
CodePudding user response:
Here a version with data.table
:
library(data.table)
DT <- setDT(DF)
# numerotate the line
DT[,I := .I - .I[1] 1,by = NAME]
# melt to have only three columns
tmp <- melt(DT,measure.vars = c("YEAR","MARKS","MAXIMUM"))
# transforming to wide
dcast(tmp,
NAME ~ paste0(variable,I),
value.var = "value")
NAME MARKS1 MARKS2 MARKS3 MAXIMUM1 MAXIMUM2 MAXIMUM3 YEAR1 YEAR2 YEAR3
1: ABC 45 75 95 95 NA NA 2012 2013 2014
2: DEF 91 NA NA 91 NA NA 2012 NA NA
3: GHI 75 76 NA 76 NA NA 2012 2013 NA
4: JKL 85 88 89 89 NA NA 2012 2014 2016
5: MNO 77 NA NA 77 NA NA 2013 NA NA