Let's say I have the following data frame:
hugo <- c("bnv", "cdv", "gcd", "efd", "efd")
sample <- c("1", "2", "3", "2", "4")
data.frame(hugo, sample)
hugo sample
1 bnv 1
2 cdv 2
3 gcd 3
4 efd 2
5 efd 4
I want to get rid of duplicate sample numbers by collapsing their hugo values into one row, like this:
hugo2 sample2
1 bnv 1
2 cdv, efd 2
3 gcd 3
4 efd 4
Is there a way to do this?
CodePudding user response:
Either use toString in aggregate (assuming the data frame is stored as df):
(a1 <- aggregate(hugo ~ sample, df, toString))
# sample hugo
# 1 1 bnv
# 2 2 cdv, efd
# 3 3 gcd
# 4 4 efd
where:
str(a1)
# 'data.frame': 4 obs. of 2 variables:
# $ sample: chr "1" "2" "3" "4"
# $ hugo : chr "bnv" "cdv, efd" "gcd" "efd"
Or use list:
(a2 <- aggregate(hugo ~ sample, df, list))
# sample hugo
# 1 1 bnv
# 2 2 cdv, efd
# 3 3 gcd
# 4 4 efd
which looks similar, but:
str(a2)
# 'data.frame': 4 obs. of 2 variables:
# $ sample: chr "1" "2" "3" "4"
# $ hugo :List of 4
# ..$ : chr "bnv"
# ..$ : chr "cdv" "efd"
# ..$ : chr "gcd"
# ..$ : chr "efd"
Depends on what you need.
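To make that difference concrete, here is a minimal sketch, assuming the question's data frame is stored as df (a name the question itself never assigns). The toString result holds one string per sample, while the list result keeps the individual values, so they can still be counted or extracted later.

```r
# Rebuild the question's data frame; the name df is an assumption.
df <- data.frame(
  hugo   = c("bnv", "cdv", "gcd", "efd", "efd"),
  sample = c("1", "2", "3", "2", "4"),
  stringsAsFactors = FALSE
)

a1 <- aggregate(hugo ~ sample, df, toString)
a2 <- aggregate(hugo ~ sample, df, list)

# toString: each cell is one collapsed string.
a1$hugo[2]        # "cdv, efd"

# list: each cell is a vector of the original values.
a2$hugo[[2]]      # "cdv" "efd"

# Count how many hugo values each sample has.
lengths(a2$hugo)  # 1 2 1 1
```

If the result only needs to be printed or written to a file, the toString version is probably the simpler choice; the list version is more useful when the grouped values are processed further.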
CodePudding user response:
You could use dplyr's summarize together with paste0 to achieve this:
hugo <- c("bnv", "cdv", "gcd", "efd", "efd")
sample <- c("1", "2", "3", "2", "4")
df1 <- data.frame(hugo, sample)
library(dplyr)
df1 %>%
  group_by(sample) %>%
  summarize(hugo = paste0(hugo, collapse = ", ")) %>%
  ungroup()
#> # A tibble: 4 × 2
#> sample hugo
#> <fct> <chr>
#> 1 1 bnv
#> 2 2 cdv, efd
#> 3 3 gcd
#> 4 4 efd
CodePudding user response:
You can use toString() by group:
group_by(df, sample2 = sample) %>% summarize(hugo2 = toString(hugo))
Output:
# A tibble: 4 × 2
sample2 hugo2
<chr> <chr>
1 1 bnv
2 2 cdv, efd
3 3 gcd
4 4 efd
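Both dplyr answers produce the same strings because toString() collapses a vector with ", ", just like paste0() with collapse = ", ". A quick check in base R, where the vector x is only an illustrative stand-in for one group's hugo values:

```r
# x is a hypothetical stand-in for the hugo values of sample "2".
x <- c("cdv", "efd")

via_tostring <- toString(x)
via_paste0   <- paste0(x, collapse = ", ")

via_tostring                         # "cdv, efd"
identical(via_tostring, via_paste0)  # TRUE
```

paste0() is the one to reach for when a separator other than ", " is wanted.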