I am trying to make a tally dataframe. My starting dataframe looks like this:
sample bike
1: 1 gazelle
2: 1 batavus
3: 2 cortina
4: 2 Cube
5: 3 Giant
And what I need is as follows:
sample gazelle batavus cortina Cube Giant
1: 1 1 1 0 0 0
2: 2 0 0 1 1 0
3: 3 0 0 0 0 1
So each cell should be 1 if the bike is present in a sample and 0 if not.
I tried:
df %>% group_by(sample, bike) %>%
summarize(count = n(), .group = "drop" %>%
pivot_wider(names_from = "bike", values_from = "count", values_fill = 0)
but that did not do the trick.
CodePudding user response:
library(dplyr)
library(tidyr)
pivot_wider(
df,
names_from = bike, values_from = bike, values_fn = length, values_fill = 0L
)
# # A tibble: 3 × 6
# sample gazelle batavus cortina cube giant
# <int> <int> <int> <int> <int> <int>
# 1 1 1 1 0 0 0
# 2 2 0 0 1 1 0
# 3 3 0 0 0 0 1
Data
df = data.frame(
sample = c(1L,1L,2L,2L,3L),
bike = c("gazelle", "batavus", "cortina", "cube", "giant")
)
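Note that values_fn = length returns counts, so a bike listed twice in the same sample would give 2 rather than 1. A minimal sketch of one way to force strict 0/1 flags, assuming a made-up data frame df_dup with a repeated row (df_dup, present and flags are illustrative names, not from the question):

```r
library(dplyr)
library(tidyr)

# Hypothetical data: sample 1 lists "gazelle" twice
df_dup <- data.frame(
  sample = c(1L, 1L, 1L, 2L),
  bike = c("gazelle", "gazelle", "batavus", "cortina")
)

flags <- df_dup %>%
  distinct(sample, bike) %>%      # keep each sample/bike pair once
  mutate(present = 1L) %>%        # indicator column to spread
  pivot_wider(names_from = bike, values_from = present, values_fill = 0L)

flags
```

Dropping duplicates first means every cell can only be 0 or 1, however often a pair is repeated in the input.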
CodePudding user response:
We could also simply use table() to achieve a similar end, i.e.
table(df)
Output:
sample batavus cortina cube gazelle giant
1 1 0 0 1 0
2 0 1 1 0 0
3 0 0 0 0 1
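table() returns a table object holding counts, not a data frame. If a plain data frame of 0/1 flags is needed, something along these lines may work. It is only a sketch: wide is an illustrative name, and the data is the df from the first answer.

```r
# Example data copied from the first answer
df <- data.frame(
  sample = c(1L, 1L, 2L, 2L, 3L),
  bike = c("gazelle", "batavus", "cortina", "cube", "giant")
)

tab <- table(df)                     # contingency table: samples x bikes
wide <- as.data.frame.matrix(tab)    # one column per bike, samples as row names

# Cap counts at 1 so each cell is a presence flag
wide[] <- lapply(wide, function(x) as.integer(x > 0))

# Move the row names into a regular column (stored as character)
wide <- cbind(sample = rownames(wide), wide)

wide
```

The bike columns come out in alphabetical order because table() sorts the levels.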
CodePudding user response:
Create a value column beforehand:
library(tidyr)
library(dplyr)
dat %>%
group_by(bike) %>%
mutate(value = n()) %>%
pivot_wider(names_from = bike, values_fill = 0)
# A tibble: 3 × 6
sample gazelle batavus cortina Cube Giant
<int> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 1 0 0 0
2 2 0 0 1 1 0
3 3 0 0 0 0 1
CodePudding user response:
If you leave out the .group argument in the summarize() statement, your code works.
library(tidyverse)
df1 %>%
  group_by(sample, bike) %>%
  summarize(count = n()) %>%
  pivot_wider(names_from = "bike", values_from = "count", values_fill = 0)
`summarise()` has grouped output by 'sample'. You can override using the `.groups` argument.
# A tibble: 3 × 6
# Groups: sample [3]
sample batavus gazelle Cube cortina Giant
<dbl> <int> <int> <int> <int> <int>
1 1 1 1 0 0 0
2 2 0 0 1 1 0
3 3 0 0 0 0 1
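For comparison, here is a sketch of the pipeline from the question with the argument spelled .groups and the closing parenthesis of summarize() in place. The data frame is rebuilt from the table in the question, and wide is an illustrative name.

```r
library(dplyr)
library(tidyr)

# Data rebuilt from the question
df <- data.frame(
  sample = c(1L, 1L, 2L, 2L, 3L),
  bike = c("gazelle", "batavus", "cortina", "Cube", "Giant")
)

wide <- df %>%
  group_by(sample, bike) %>%
  summarize(count = n(), .groups = "drop") %>%   # .groups, not .group
  pivot_wider(names_from = bike, values_from = count, values_fill = 0L)

wide
```

With .groups = "drop" the result is ungrouped, so the message about grouped output does not appear.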