I am having trouble making a frequency table where the data is in multiple columns.My dataset is structured like this:
ID shape shape2 shape3 shape4 shape5
12 triangle rhombus circle square rectangle
68 hexagon NA NA NA NA
32 rectangle octagon square NA NA
18 square triangle NA NA NA
27. circle. NA. NA. NA. NA
77. square. trapezoid. triangle. NA. NA
I want to get a frequency table of all of the different shapes, like this:
shape. freq
triangle. 3
square. 4
circle. 2
hexagon. 1
octagon. 1
trapezoid. 1
rhombus. 1
rectangle 2
Then I really am only interested in 5 shapes, so I would like it to look like this:
shape. freq
triangle. 3
square. 4
circle. 2
rectangle 2
Thank you!
CodePudding user response:
Say you have your data in dat, then:
library(data.table)
data.table(table(gsub("\\.", "", unlist(dat[,-1]))))[N>=2]
Note that I removed the .
from some of the names. data.table
isn't really necessary, I just find the syntax [N>=2]
to be cleaner and more convenient than the base-r alternatives.
V1 N
1: circle 2
2: NA 4
3: rectangle 2
4: square 4
5: triangle 3
CodePudding user response:
Does this work:
library(dplyr)
library(tidyr)
df %>% pivot_longer(-ID, values_to = 'shape') %>% count(shape, name = 'freq') %>% filter(shape %in% c('triangle','square','circle','rectangle'))
# A tibble: 4 x 2
shape freq
<chr> <int>
1 circle 2
2 rectangle 2
3 square 4
4 triangle 3
CodePudding user response:
Another solution, based also on data.table
:
library(data.table)
dt <- data.table::data.table(
ID = c(12L, 68L, 32L, 18L, 27L, 77L),
shape = c("triangle","hexagon",
"rectangle","square","circle","square"),
shape2 = c("rhombus", NA, "octagon", "triangle", NA, "trapezoid"),
shape3 = c("circle", NA, "square", NA, NA, "triangle"),
shape4 = c("square", NA, NA, NA, NA, NA),
shape5 = c("rectangle", NA, NA, NA, NA, NA)
)
melt(dt, id="ID", 2:6, na.rm = T, value.name = "shape"
)[,.(freq = .N), by = "shape"][freq > 1]
#> shape freq
#> 1: triangle 3
#> 2: rectangle 2
#> 3: square 4
#> 4: circle 2