Home > Back-end >  Spread or dcast and fill in counts
Spread or dcast and fill in counts

Time:10-06

Probably a basic question.

I have a key - value data.frame(df below):

features <- paste0("f",1:5)
set.seed(1)
ids <- paste0("id",1:10)

df <- do.call(rbind,lapply(ids,function(i){
  data.frame(id = i, feature = sample(features,3,replace = F))
}))

And I want to tidyr::spread or reshape2::dcast it, so that the rows are id' the columns are feature, but the values are the sum of featuresfor eachid`.

A simple:

reshape2::dcast(df, id ~ feature)

Doesn't achieve that. It just fills in with features and NAs

adding fun.aggregate = sum to the command above results with an error:

> reshape2::dcast(df, id ~ feature, fun.aggregate = sum)
Using feature as value column: use value.var to override.
Error in .fun(.value[0], ...) : invalid 'type' (character) of argument

And tidyr::spread also results with an error:

tidyr::spread(df, key = id, value = feature)

Error: Each row of output must be identified by a unique combination of keys.
Keys are shared for 30 rows:

Any idea?

CodePudding user response:

I think you want to count the features and not sum them. Try with the function length.

tidyr::pivot_wider(df, names_from = feature, 
            values_from = feature, values_fn = length, values_fill = 0)

Or with dcast.

library(data.table)
dcast(setDT(df), id~feature, value.var = 'feature', fun.aggregate = length)

In base R, using table(df) would give the same output.

table(df)

#     feature
#id     f1 f2 f3 f4 f5
#  id1   1  0  1  1  0
#  id10  1  0  1  1  0
#  id2   1  1  0  0  1
#  id3   0  1  1  1  0
#  id4   1  0  1  0  1
#  id5   1  1  0  0  1
#  id6   1  1  1  0  0
#  id7   1  0  0  1  1
#  id8   1  1  0  0  1
#  id9   0  1  0  1  1
  • Related