Probably a basic question.
I have a key - value
data.frame
(df
below):
features <- paste0("f",1:5)
set.seed(1)
ids <- paste0("id",1:10)
df <- do.call(rbind,lapply(ids,function(i){
data.frame(id = i, feature = sample(features,3,replace = F))
}))
And I want to tidyr::spread
or reshape2::dcast
it, so that the rows are id' the columns are
feature, but the values are the sum of
featuresfor each
id`.
A simple:
reshape2::dcast(df, id ~ feature)
Doesn't achieve that. It just fills in with feature
s and NA
s
adding fun.aggregate = sum
to the command above results with an error:
> reshape2::dcast(df, id ~ feature, fun.aggregate = sum)
Using feature as value column: use value.var to override.
Error in .fun(.value[0], ...) : invalid 'type' (character) of argument
And tidyr::spread also results with an error:
tidyr::spread(df, key = id, value = feature)
Error: Each row of output must be identified by a unique combination of keys.
Keys are shared for 30 rows:
Any idea?
CodePudding user response:
I think you want to count the features and not sum
them. Try with the function length
.
tidyr::pivot_wider(df, names_from = feature,
values_from = feature, values_fn = length, values_fill = 0)
Or with dcast
.
library(data.table)
dcast(setDT(df), id~feature, value.var = 'feature', fun.aggregate = length)
In base R, using table(df)
would give the same output.
table(df)
# feature
#id f1 f2 f3 f4 f5
# id1 1 0 1 1 0
# id10 1 0 1 1 0
# id2 1 1 0 0 1
# id3 0 1 1 1 0
# id4 1 0 1 0 1
# id5 1 1 0 0 1
# id6 1 1 1 0 0
# id7 1 0 0 1 1
# id8 1 1 0 0 1
# id9 0 1 0 1 1