Home > Mobile >  How to summarise values in a column with non-exact match in R?
How to summarise values in a column with non-exact match in R?

Time:10-11

I have a data.table with over ten thousand of rows. I want to count in one column how many times a variable appears, but I want to use non-exact match. The data looks like this:

dt1 <- data.table (place = c("a north", "a south", "b south", "a north", "c west", "b north", "c south", "a west", "b west"))

     place
1: a north
2: a south
3: b south
4: a north
5: c west
6: b north
7: c south
8: a west
9  b west

I just want to count how many times "a", "b" and "c" appears independent from the words that follows. I would like the result to look like this:

   a b c
1: 4 3 2

I tried summarise, charmath and pmatch, but they didn't work. Could anyone help?

CodePudding user response:

All depends on how varied the location could be and other scenario shapes.

You could separate the column into 2 and then group and count

dt1
separate(dt1, place, into = c('letter', 'direction')) %>%
  group_by(letter) %>%
  count() %>%
  pivot_wider(names_from = letter, values_from = n) 

CodePudding user response:

You can create a new column that contain only the string that you want with mutate() and substr() and then count the number of occurrences with count() like this.

library("data.table")
library("dplyr")

dt1 <- data.table(place = c("a north", "a south", "b south", "a north", "c west", "b north", "c south", "a west", "b west"))

dt1 |>
  mutate(first_letter = substr(place,1,1)) |>
  count(first_letter)

Output:

   first_letter n
1:            a 4
2:            b 3
3:            c 2

If you want a different match than that, you may need to use regex and case_when in your mutate.

CodePudding user response:

You can try a full data.table solution:

 dt1[,'.'(var = sub(" .*", "",place))
   ][,'.'(cnt = .N), by = var
   ][,data.table::transpose(.SD, make.names= 'var')]

   a b c
1: 4 3 2

CodePudding user response:

One simple full data.table solution:

library(data.table)

dt1[,lapply(.SD, substr,1,1)][,.N, by = place]
#>    place N
#> 1:     a 4
#> 2:     b 3
#> 3:     c 2

Created on 2021-10-11 by the reprex package (v2.0.1)

  • Related