I have a data.table with over ten thousand of rows. I want to count in one column how many times a variable appears, but I want to use non-exact match. The data looks like this:
dt1 <- data.table (place = c("a north", "a south", "b south", "a north", "c west", "b north", "c south", "a west", "b west"))
place
1: a north
2: a south
3: b south
4: a north
5: c west
6: b north
7: c south
8: a west
9 b west
I just want to count how many times "a", "b" and "c" appears independent from the words that follows. I would like the result to look like this:
a b c
1: 4 3 2
I tried summarise, charmath and pmatch, but they didn't work. Could anyone help?
CodePudding user response:
All depends on how varied the location could be and other scenario shapes.
You could separate the column into 2 and then group and count
dt1
separate(dt1, place, into = c('letter', 'direction')) %>%
group_by(letter) %>%
count() %>%
pivot_wider(names_from = letter, values_from = n)
CodePudding user response:
You can create a new column that contain only the string that you want with mutate()
and substr()
and then count the number of occurrences with count()
like this.
library("data.table")
library("dplyr")
dt1 <- data.table(place = c("a north", "a south", "b south", "a north", "c west", "b north", "c south", "a west", "b west"))
dt1 |>
mutate(first_letter = substr(place,1,1)) |>
count(first_letter)
Output:
first_letter n
1: a 4
2: b 3
3: c 2
If you want a different match than that, you may need to use regex and case_when
in your mutate
.
CodePudding user response:
You can try a full data.table
solution:
dt1[,'.'(var = sub(" .*", "",place))
][,'.'(cnt = .N), by = var
][,data.table::transpose(.SD, make.names= 'var')]
a b c
1: 4 3 2
CodePudding user response:
One simple full data.table
solution:
library(data.table)
dt1[,lapply(.SD, substr,1,1)][,.N, by = place]
#> place N
#> 1: a 4
#> 2: b 3
#> 3: c 2
Created on 2021-10-11 by the reprex package (v2.0.1)