How to create columns with row counts in R?

Time:05-25

I have a dataset where I am trying to count the values across rows, and then create different columns for the counts for each possible variable. I have a large dataset like this (thousands of rows):

Date Lat Long Station Tow Sex
1990 y1 x1 1 1 1
1990 y1 x1 1 1 1
1990 y1 x1 1 1 2
1990 y1 x1 1 1 2
1990 y1 x1 1 1 2
1990 y2 x2 1 2 1
1990 y2 x2 1 2 1
1990 y2 x2 1 2 1
1990 y2 x2 1 2 1
1990 y2 x2 1 2 2

And I want to create a table like this in R

Date Lat Long Station Tow CountSex1 CountSex2
1990 y1 x1 1 1 2 3
1990 y2 x2 1 2 4 1

I tried using group_by and summarize in tidyverse, but wasn't getting what I needed. I'm still really new to tidyverse and I cannot figure out what functions/combination of functions to use to get quite what I'm looking for.

What I tried:

countsex <- df %>% 
  group_by(Date, Lat, Long, Station, Tow, Sex) %>% 
  summarise(count_sex=n())

And I got:

Date Lat Long Station Tow Sex count_sex
1990 y1 x1 1 1 1 2
1990 y1 x1 1 1 2 3
1990 y2 x2 1 2 1 4
1990 y2 x2 1 2 2 1

which isn't what I'm looking for, and I couldn't figure out how to transform this to get what I needed.

CodePudding user response:

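We can use pivot_wider with values_fn = length, which counts the rows for each value of Sex within every combination of the remaining columns: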

library(dplyr)
library(tidyr)
df %>%
   pivot_wider(names_from = Sex, values_from = Sex, 
       values_fn = length, names_prefix = 'CountSex')

-output

# A tibble: 2 × 7
   Date Lat   Long  Station   Tow CountSex1 CountSex2
  <int> <chr> <chr>   <int> <int>     <int>     <int>
1  1990 y1    x1          1     1         2         3
2  1990 y2    x2          1     2         4         1

data

df <- structure(list(Date = c(1990L, 1990L, 1990L, 1990L, 1990L, 1990L, 
1990L, 1990L, 1990L, 1990L), Lat = c("y1", "y1", "y1", "y1", 
"y1", "y2", "y2", "y2", "y2", "y2"), Long = c("x1", "x1", "x1", 
"x1", "x1", "x2", "x2", "x2", "x2", "x2"), Station = c(1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), Tow = c(1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L), Sex = c(1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 
1L, 2L)), class = "data.frame", row.names = c(NA, -10L))
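
If some tow has no rows at all for one value of Sex, the pivot_wider call above leaves NA in that cell. A minimal sketch, assuming a zero is wanted instead, passes values_fill = 0L. The data frame df2 below is made up for illustration and is not part of the original question.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: Tow 2 has no rows with Sex == 2
df2 <- data.frame(
  Date = 1990L,
  Lat = c("y1", "y1", "y1", "y2", "y2"),
  Long = c("x1", "x1", "x1", "x2", "x2"),
  Station = 1L,
  Tow = c(1L, 1L, 1L, 2L, 2L),
  Sex = c(1L, 2L, 2L, 1L, 1L)
)

# Same call as above, with missing Sex/Tow combinations filled with 0
res <- df2 %>%
  pivot_wider(names_from = Sex, values_from = Sex,
              values_fn = length, names_prefix = "CountSex",
              values_fill = 0L)

res
```

Without values_fill, the CountSex2 cell for Tow 2 would be NA rather than 0.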

CodePudding user response:

A slightly different approach, also using pivot_wider. Note that Sex is relabelled inside count(), so the new column names already carry the CountSex prefix:

library(dplyr)
library(tidyr)

df %>% 
  count(Date, Lat, Long, Station, Tow, Sex = paste0("CountSex", Sex)) %>%
  pivot_wider(names_from = Sex, values_from = n)

   Date Lat   Long  Station   Tow CountSex1 CountSex2
  <int> <chr> <chr>   <int> <int>     <int>     <int>
1  1990 y1    x1          1     1         2         3
2  1990 y2    x2          1     2         4         1
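
The group_by/summarise attempt from the question was already most of the way there: its long result only needs one pivot_wider step. A sketch of that route, assuming the same data as posted above (rebuilt compactly here so the block runs on its own); the name wide is just illustrative.

```r
library(dplyr)
library(tidyr)

# Same ten rows as the posted data, written compactly
df <- data.frame(
  Date = 1990L,
  Lat = rep(c("y1", "y2"), each = 5),
  Long = rep(c("x1", "x2"), each = 5),
  Station = 1L,
  Tow = rep(1:2, each = 5),
  Sex = c(1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L)
)

# The attempt from the question: one row per Tow/Sex combination
countsex <- df %>%
  group_by(Date, Lat, Long, Station, Tow, Sex) %>%
  summarise(count_sex = n(), .groups = "drop")

# Spread the counts into one column per value of Sex
wide <- countsex %>%
  pivot_wider(names_from = Sex, values_from = count_sex,
              names_prefix = "CountSex")

wide
```

Here .groups = "drop" only removes the leftover grouping after summarise; the counts themselves are unchanged.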