Hope you can help me.
I have the following df:
structure(list(Donorcode = c("406A001", "406A002", "406A003",
"406A004", "406A003", "406A008", "406A009", "406A007"), Doos = c(1,
1, 1, 1, 2, 2, 2, 2), `Leeftijd T0` = c(70, 73, 79, 75, 70, 73,
79, 75), Instituut = c("Spaarne ziekenhuis", "Spaarne ziekenhuis",
"Spaarne ziekenhuis", "RIVM", "RIVM", "RIVM", "RIVM", "Spaarne ziekenhuis"
), Datum = structure(c(1567468800, 1567555200, 1567900800, 1567468800,
1567468800, 1567555200, 1567987200, 1568246400), class = c("POSIXct",
"POSIXt"), tzone = "UTC")), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -8L))
I wish to make 4 groups of this data where each group has each of the values from the column 'Doos'.
My desired output would look like this:
  Donorcode  Doos `Leeftijd T0` Instituut          Datum
  <chr>     <dbl>         <dbl> <chr>              <dttm>
1 406A001       1            70 Spaarne ziekenhuis 2019-09-03 00:00:00
2 406A003       2            70 RIVM               2019-09-03 00:00:00
3 406A003       1            79 Spaarne ziekenhuis 2019-09-08 00:00:00
4 406A009       2            79 RIVM               2019-09-09 00:00:00
5 406A004       1            75 RIVM               2019-09-03 00:00:00
6 406A008       2            73 RIVM               2019-09-04 00:00:00
7 406A002       1            73 Spaarne ziekenhuis 2019-09-04 00:00:00
8 406A007       2            75 Spaarne ziekenhuis 2019-09-12 00:00:00
I've seen many posts about grouping and then summarizing, but I don't need to summarize, and the group_by function from dplyr doesn't seem to do what I need. This is the output I get:
dplyr::group_by(df, Doos, Instituut)
# A tibble: 8 × 5
# Groups: Doos, Instituut [4]
  Donorcode  Doos `Leeftijd T0` Instituut          Datum
  <chr>     <dbl>         <dbl> <chr>              <dttm>
1 406A001       1            70 Spaarne ziekenhuis 2019-09-03 00:00:00
2 406A002       1            73 Spaarne ziekenhuis 2019-09-04 00:00:00
3 406A003       1            79 Spaarne ziekenhuis 2019-09-08 00:00:00
4 406A004       1            75 RIVM               2019-09-03 00:00:00
5 406A003       2            70 RIVM               2019-09-03 00:00:00
6 406A008       2            73 RIVM               2019-09-04 00:00:00
7 406A009       2            79 RIVM               2019-09-09 00:00:00
8 406A007       2            75 Spaarne ziekenhuis 2019-09-12 00:00:00
Could someone please help? If it's possible, I would like a function that could group by multiple columns at a time (so that I can also include the Instituut column for the grouping).
I hope anyone can help me!
Thanks so much
CodePudding user response:
Maybe you want something like this, where you reorder your dataframe to follow a given sequence. Your dataframe has a "given_seq" of 1 1 1 1 2 2 2 2 and you want a "seq_order" of 1 2 1 2 1 2 1 2. You can use the following code to reorder your dataframe to match that sequence:
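given_seq <- as.vector(df$Doos)
seq_order <- rep(1:2, 4)
# Sort the rows by Doos, then use order(order(seq_order)) to pick, for each
# target position, the matching row of the sorted data (pattern 1 2 1 2 ...)
df[order(given_seq),][order(order(seq_order)),]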
#>   Donorcode Doos Leeftijd T0          Instituut      Datum
#> 1   406A001    1          70 Spaarne ziekenhuis 2019-09-03
#> 5   406A003    2          70               RIVM 2019-09-03
#> 2   406A002    1          73 Spaarne ziekenhuis 2019-09-04
#> 6   406A008    2          73               RIVM 2019-09-04
#> 3   406A003    1          79 Spaarne ziekenhuis 2019-09-08
#> 7   406A009    2          79               RIVM 2019-09-09
#> 4   406A004    1          75               RIVM 2019-09-03
#> 8   406A007    2          75 Spaarne ziekenhuis 2019-09-12
Created on 2022-08-24 with reprex v2.0.2
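The same interleaving can also be obtained without hard-coding rep(1:2, 4). Below is a minimal base R sketch, assuming every Doos value occurs equally often: number the rows within each Doos value, use that number as a group id, and sort by it. The column name group is an assumption for illustration, and only three of the question's columns are rebuilt here to keep the example short.

```r
# Reduced copy of the data from the question (three columns only)
df <- data.frame(
  Donorcode = c("406A001", "406A002", "406A003", "406A004",
                "406A003", "406A008", "406A009", "406A007"),
  Doos = c(1, 1, 1, 1, 2, 2, 2, 2),
  Instituut = c("Spaarne ziekenhuis", "Spaarne ziekenhuis", "Spaarne ziekenhuis",
                "RIVM", "RIVM", "RIVM", "RIVM", "Spaarne ziekenhuis")
)

# Running position of each row within its own Doos value:
# 1, 2, 3, 4 for Doos 1 and again 1, 2, 3, 4 for Doos 2
df$group <- ave(seq_along(df$Doos), df$Doos, FUN = seq_along)

# Sorting by group, then Doos, puts one row of every Doos value in each group
interleaved <- df[order(df$group, df$Doos), ]
interleaved
```

With this data the rows come out in the order 1, 5, 2, 6, 3, 7, 4, 8, the same as above, and the group column can be reused later, for example with split(interleaved, interleaved$group).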
CodePudding user response:
You are looking for a kind of "anti-clustering", and the package anticlust exists for that. Check if this works for you.
To anti-cluster on 'Doos' and 'Instituut', we first need both as numeric columns. We can get these by recoding 'Instituut' with as.factor/as.numeric inside transform, and then using subset to select the two columns. The data frame from the question is called dat here.
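library(anticlust)

# Recode Instituut as numeric, keep only the two grouping columns,
# and let anticlustering() assign each row to one of K = 4 groups
dat$group <- anticlustering(
  subset(transform(dat, Instituut2 = as.numeric(as.factor(Instituut))),
         select = c(Doos, Instituut2)),
  K = 4,
  objective = "variance",
  method = "local-maximum"
)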
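The recoding step inside the call above can be looked at on its own with base R only. This is a small sketch, assuming dat holds the Doos and Instituut columns from the question; features is a hypothetical name for the intermediate result and is not part of the original answer.

```r
# Only the two columns used for anticlustering, copied from the question
dat <- data.frame(
  Doos = c(1, 1, 1, 1, 2, 2, 2, 2),
  Instituut = c("Spaarne ziekenhuis", "Spaarne ziekenhuis", "Spaarne ziekenhuis",
                "RIVM", "RIVM", "RIVM", "RIVM", "Spaarne ziekenhuis")
)

# as.factor() sorts the labels, so "RIVM" becomes level 1 and
# "Spaarne ziekenhuis" level 2; as.numeric() returns those level codes
features <- subset(
  transform(dat, Instituut2 = as.numeric(as.factor(Instituut))),
  select = c(Doos, Instituut2)
)

str(features)  # two numeric columns
```

The level codes are arbitrary labels, which is harmless here because Instituut has only two distinct values.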
We can assess the result better when it's ordered.
dat[order(dat$group), ]
#   Donorcode Doos Leeftijd T0          Instituut      Datum group
# 3   406A003    1          79 Spaarne ziekenhuis 2019-09-08     1
# 5   406A003    2          70               RIVM 2019-09-03     1
# 2   406A002    1          73 Spaarne ziekenhuis 2019-09-04     2
# 7   406A009    2          79               RIVM 2019-09-09     2
# 1   406A001    1          70 Spaarne ziekenhuis 2019-09-03     3
# 6   406A008    2          73               RIVM 2019-09-04     3
# 4   406A004    1          75               RIVM 2019-09-03     4
# 8   406A007    2          75 Spaarne ziekenhuis 2019-09-12     4
Or make a table.
with(dat, table(Doos, Instituut, group))
# , , group = 1
#
#      Instituut
# Doos RIVM Spaarne ziekenhuis
#    1    0                  1
#    2    1                  0
#
# , , group = 2
#
#      Instituut
# Doos RIVM Spaarne ziekenhuis
#    1    0                  1
#    2    1                  0
#
# , , group = 3
#
#      Instituut
# Doos RIVM Spaarne ziekenhuis
#    1    0                  1
#    2    1                  0
#
# , , group = 4
#
#      Instituut
# Doos RIVM Spaarne ziekenhuis
#    1    1                  0
#    2    0                  1
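The balance shown in the table can also be checked programmatically rather than by eye. This is a short base R sketch in which the group vector is hard-coded from the ordered output above, so it runs without the anticlust package; doos_ok and instituut_ok are names made up for this illustration.

```r
# Doos, Instituut and the group assignment, in the original row order 1..8
dat <- data.frame(
  Doos = c(1, 1, 1, 1, 2, 2, 2, 2),
  Instituut = c("Spaarne ziekenhuis", "Spaarne ziekenhuis", "Spaarne ziekenhuis",
                "RIVM", "RIVM", "RIVM", "RIVM", "Spaarne ziekenhuis"),
  group = c(3, 2, 1, 4, 1, 3, 2, 4)
)

# Every group should contain each Doos value exactly once ...
doos_ok <- all(table(dat$group, dat$Doos) == 1)

# ... and, for this result, each Instituut exactly once as well
instituut_ok <- all(table(dat$group, dat$Instituut) == 1)

c(doos_ok = doos_ok, instituut_ok = instituut_ok)  # both TRUE for this assignment
```

A check like this is handy when the data are larger and the three-way table becomes too long to read.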