I have a dataset1 which is as follows (table 1) :
structure(list(TCODE = c("T1", "T1", "T1", "T2", "T2", "T2",
"T2", "T2", "T3", "T4", "T4", "T4", "T4"), MATERIAL = c("M105",
"M100", "M102", "M345", "M400", "M100", "M220", "M260", "M400",
"M390", "M400", "M100", "M700")), class = "data.frame", row.names = c(NA,
-13L))
TCODE MATERIAL
1 T1 M105
2 T1 M100
3 T1 M102
4 T2 M345
5 T2 M400
6 T2 M100
7 T2 M220
8 T2 M260
9 T3 M400
10 T4 M390
11 T4 M400
12 T4 M100
13 T4 M700
How to convert multiple rows into a single row based on first column? The desired output is : table 2
1 2 3 4 5
1 M105, M100, M102
2 M345, M400, M100, M220, M260
3 M400
4 M390, M400, M100, M700
Thanks.
CodePudding user response:
Using toString
in aggregate
.
aggregate(. ~ TCODE, dat, toString)
# TCODE MATERIAL
# 1 T1 M105, M100, M102
# 2 T2 M345, M400, M100, M220, M260
# 3 T3 M400
# 4 T4 M390, M400, M100, M700
Data:
dat <- structure(list(TCODE = c("T1", "T1", "T1", "T2", "T2", "T2",
"T2", "T2", "T3", "T4", "T4", "T4", "T4"), MATERIAL = c("M105",
"M100", "M102", "M345", "M400", "M100", "M220", "M260", "M400",
"M390", "M400", "M100", "M700")), class = "data.frame", row.names = c(NA,
-13L))
CodePudding user response:
You can do:
df <- structure(list(TCODE = c("T1", "T1", "T1", "T2", "T2", "T2",
"T2", "T2", "T3", "T4", "T4", "T4", "T4"), MATERIAL = c("M105",
"M100", "M102", "M345", "M400", "M100", "M220", "M260", "M400",
"M390", "M400", "M100", "M700")), class = "data.frame", row.names = c(NA,
-13L))
library(tidyverse)
df %>%
group_by(TCODE) %>%
summarize(MATERIAL= paste0(MATERIAL, collapse = ', '))
# A tibble: 4 x 2
TCODE MATERIAL
<chr> <chr>
1 T1 M105, M100, M102
2 T2 M345, M400, M100, M220, M260
3 T3 M400
4 T4 M390, M400, M100, M700
If you want to have 5 columns, pivot_wider:
df %>%
group_by(TCODE) %>%
mutate(id = 1:n()) %>%
ungroup() %>%
pivot_wider(values_from = MATERIAL,
names_from = id,
names_prefix = 'id_')
# A tibble: 4 x 6
TCODE id_1 id_2 id_3 id_4 id_5
<chr> <chr> <chr> <chr> <chr> <chr>
1 T1 M105 M100 M102 <NA> <NA>
2 T2 M345 M400 M100 M220 M260
3 T3 M400 <NA> <NA> <NA> <NA>
4 T4 M390 M400 M100 M700 <NA>
CodePudding user response:
We could do it this way:
library(dplyr)
library(tidyr)
df %>%
group_by(TCODE) %>%
mutate(MATERIAL = toString(MATERIAL)) %>%
slice(1) %>%
mutate(MATERIAL = str_split(MATERIAL, ', ')) %>%
unnest_wider(MATERIAL)
TCODE ...1 ...2 ...3 ...4 ...5
<chr> <chr> <chr> <chr> <chr> <chr>
1 T1 M105 M100 M102 NA NA
2 T2 M345 M400 M100 M220 M260
3 T3 M400 NA NA NA NA
4 T4 M390 M400 M100 M700 NA
CodePudding user response:
Thanks for your help but I think might not described my requirement cleary. Let me explain a little more. I've put column and row names for explanation only. There will be no column or row names in data file.
Input is :
1 T1 M105
2 T1 M100
3 T1 M102
4 T2 M345
5 T2 M400
6 T2 M100
7 T2 M220
8 T2 M260
9 T3 M400
10 T4 M390
11 T4 M400
12 T4 M100
13 T4 M700
What actually required output is like :
1 M105 M100 M102
2 M345 M400 M100 M220 M260
3 M400
4 M390 M400 M100 M700
Thanks for your help.
CodePudding user response:
Using data.table
library(data.table)
dcast(setDT(df), TCODE ~ paste0("id_", rowid(TCODE)), value.var = 'MATERIAL')
-output
TCODE id_1 id_2 id_3 id_4 id_5
<char> <char> <char> <char> <char> <char>
1: T1 M105 M100 M102 <NA> <NA>
2: T2 M345 M400 M100 M220 M260
3: T3 M400 <NA> <NA> <NA> <NA>
4: T4 M390 M400 M100 M700 <NA>