I have a dataset where I am trying to count the values across rows, and then create different columns for the counts for each possible variable. I have a large dataset like this (thousands of rows):
Date | Lat | Long | Station | Tow | Sex |
---|---|---|---|---|---|
1990 | y1 | x1 | 1 | 1 | 1 |
1990 | y1 | x1 | 1 | 1 | 1 |
1990 | y1 | x1 | 1 | 1 | 2 |
1990 | y1 | x1 | 1 | 1 | 2 |
1990 | y1 | x1 | 1 | 1 | 2 |
1990 | y2 | x2 | 1 | 2 | 1 |
1990 | y2 | x2 | 1 | 2 | 1 |
1990 | y2 | x2 | 1 | 2 | 1 |
1990 | y2 | x2 | 1 | 2 | 1 |
1990 | y2 | x2 | 1 | 2 | 2 |
And I want to create a table like this in R
Date | Lat | Long | Station | Tow | CountSex1 | CountSex2 |
---|---|---|---|---|---|---|
1990 | y1 | x1 | 1 | 1 | 2 | 3 |
1990 | y1 | x1 | 1 | 2 | 4 | 1 |
I tried using group_by and summarize in tidyverse, but wasn't getting what I needed. I'm still really new to tidyverse and I cannot figure out what functions/combination of functions to use to get quite what I'm looking for.
What I tried:
countsex <- df %>%
group_by(Date, Lat, Long, Station, Tow, Sex) %>%
summarise(count_sex=n())
And I got:
Date | Lat | Long | Station | Tow | Sex | CountSex |
---|---|---|---|---|---|---|
1990 | y1 | x1 | 1 | 1 | 1 | 2 |
1990 | y1 | x1 | 1 | 1 | 2 | 3 |
1990 | y2 | x2 | 1 | 2 | 1 | 4 |
1990 | y2 | x2 | 1 | 2 | 2 | 1 |
which isn't what I'm looking for, and I couldn't figure out how to transform this to get what I needed.
CodePudding user response:
We can use pivot_wider
library(dplyr)
library(tidyr)
df %>%
pivot_wider(names_from = Sex, values_from = Sex,
values_fn = length, names_prefix = 'CountSex')
-output
# A tibble: 2 × 7
Date Lat Long Station Tow CountSex1 CountSex2
<int> <chr> <chr> <int> <int> <int> <int>
1 1990 y1 x1 1 1 2 3
2 1990 y2 x2 1 2 4 1
data
df <- structure(list(Date = c(1990L, 1990L, 1990L, 1990L, 1990L, 1990L,
1990L, 1990L, 1990L, 1990L), Lat = c("y1", "y1", "y1", "y1",
"y1", "y2", "y2", "y2", "y2", "y2"), Long = c("x1", "x1", "x1",
"x1", "x1", "x2", "x2", "x2", "x2", "x2"), Station = c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), Tow = c(1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L), Sex = c(1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L,
1L, 2L)), class = "data.frame", row.names = c(NA, -10L))
CodePudding user response:
Slightly different approach also using pivot_wider
: Note Sex
in count
:
library(dplyr)
library(tidyr)
df %>%
count(Date, Lat, Long, Station, Tow,Sex = paste0("CountSex", Sex)) %>%
pivot_wider(names_from = Sex, values_from = n)
Date Lat Long Station Tow CountSex1 CountSex2
<int> <chr> <chr> <int> <int> <int> <int>
1 1990 y1 x1 1 1 2 3
2 1990 y2 x2 1 2 4 1