make a tallying dataframe in R

Time:11-09

I am trying to make a tally dataframe. My starting dataframe looks like this:

   sample    bike
1:      1 gazelle
2:      1 batavus
3:      2 cortina
4:      2    Cube
5:      3   Giant

And what I need is as follows:

   sample gazelle batavus cortina Cube Giant
1:      1       1       1       0    0     0
2:      2       0       0       1    1     0
3:      3       0       0       0    0     1

So each cell should be 1 if the bike is present in that sample and 0 if not.

I tried:

df %>% group_by(sample, bike) %>%
summarize(count = n(), .group = "drop" %>%
pivot_wider(names_from = "bike", values_from = "count", values_fill = 0)

but that did not do the trick.

CodePudding user response:

library(dplyr)
library(tidyr)

pivot_wider(
  df, 
  names_from = bike, values_from = bike, values_fn = length, values_fill = 0L
)

# # A tibble: 3 × 6
#   sample gazelle batavus cortina  cube giant
#    <int>   <int>   <int>   <int> <int> <int>
# 1      1       1       1       0     0     0
# 2      2       0       0       1     1     0
# 3      3       0       0       0     0     1

Data

df = data.frame(
  sample = c(1L,1L,2L,2L,3L),
  bike   = c("gazelle", "batavus", "cortina", "cube", "giant")
)
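
One thing to watch with values_fn = length: it counts rows, so a bike listed twice for the same sample would come out as 2 rather than 1. If strict presence/absence is wanted, one possible sketch is to drop duplicate pairs first. The duplicated row in the example data and the helper column name present are assumptions made up for illustration, not part of the original question.

```r
library(dplyr)
library(tidyr)

# Example data with one deliberately duplicated row (sample 1, gazelle).
# The duplicate is an assumption added only to illustrate the point.
df_dup <- data.frame(
  sample = c(1L, 1L, 1L, 2L, 2L, 3L),
  bike   = c("gazelle", "gazelle", "batavus", "cortina", "cube", "giant")
)

presence <- df_dup %>%
  distinct(sample, bike) %>%   # keep each sample/bike pair only once
  mutate(present = 1L) %>%     # hypothetical helper column: 1 = present
  pivot_wider(names_from = bike, values_from = present, values_fill = 0L)

presence
```

Because every sample/bike pair is unique after distinct(), no values_fn is needed and each cell can only be 0 or 1.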

CodePudding user response:

We could also simply use table() to get a similar result:

table(df)

Output:

sample batavus cortina cube gazelle giant
     1       1       0    0       1     0
     2       0       1    1       0     0
     3       0       0    0       0     1
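
Note that table() returns a table object rather than a data frame, with the samples as row names. If a regular data frame with a sample column is needed, a minimal sketch could look like this (the object names tab and wide are just placeholders):

```r
df <- data.frame(
  sample = c(1L, 1L, 2L, 2L, 3L),
  bike   = c("gazelle", "batavus", "cortina", "cube", "giant")
)

tab <- table(df)

# Turn the 2-d table into a data frame: one row per sample, one column per bike
wide <- as.data.frame.matrix(tab)

# Move the row names into a proper sample column
wide <- cbind(sample = as.integer(rownames(wide)), wide)
rownames(wide) <- NULL

# Optional: cap counts at 1 so the result is strictly presence/absence
wide[-1] <- lapply(wide[-1], function(x) as.integer(x > 0))

wide
```

The bike columns come out in alphabetical order here, since table() sorts the levels.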

CodePudding user response:

Create a value column beforehand:

library(tidyr)
library(dplyr)
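# Every row marks a bike that is present, so a constant 1 is enough
# (n() per bike would count across samples, not within one).
# pivot_wider() takes its values from the column named `value` by default.
dat %>% 
  mutate(value = 1) %>% 
  pivot_wider(names_from = bike, values_fill = 0)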

# A tibble: 3 × 6
  sample gazelle batavus cortina  Cube Giant
   <int>   <dbl>   <dbl>   <dbl> <dbl> <dbl>
1      1       1       1       0     0     0
2      2       0       0       1     1     0
3      3       0       0       0     0     1

CodePudding user response:

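If you leave out the .group argument in the summarize() call (the argument is actually spelled .groups, and the call was also missing its closing parenthesis), your code works.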

library(tidyverse)

df1 %>%
  group_by(sample, bike) %>%
  summarize(count = n()) %>%
  pivot_wider(names_from = "bike", values_from = "count", values_fill = 0)

`summarise()` has grouped output by 'sample'. You can override using the `.groups` argument.
# A tibble: 3 × 6
# Groups:   sample [3]
  sample batavus gazelle  Cube cortina Giant
   <dbl>   <int>   <int> <int>   <int> <int>
1      1       1       1     0       0     0
2      2       0       0     1       1     0
3      3       0       0     0       0     1
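
To get rid of the grouping message and the leftover grouping, the pipeline from the question can probably be kept almost as it was, with the argument spelled .groups and the summarize() call closed. A sketch, using the lowercase example data from the first answer (the name tally is just a placeholder):

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  sample = c(1L, 1L, 2L, 2L, 3L),
  bike   = c("gazelle", "batavus", "cortina", "cube", "giant")
)

tally <- df %>%
  group_by(sample, bike) %>%
  summarize(count = n(), .groups = "drop") %>%  # .groups, not .group
  pivot_wider(names_from = bike, values_from = count, values_fill = 0L)

tally
```

With .groups = "drop" the result is an ungrouped tibble, so later steps are not affected by the grouping on sample.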
 
