R: Count number of times B follows A using dplyr

I have a data.frame of monthly radon averages measured over a few months. I have labeled each value as either "below" or "above" a threshold, and I would like to count how often each transition between consecutive months occurs: "below to above", "above to below", "above to above" or "below to below".

df <- data.frame(value = c(130, 200, 240, 230, 130),
                 level = c("below", "above","above","above", "below"))

A bit of digging into a Matlab answer on here suggests that we could use the Matrix package:

require(Matrix)
sparseMatrix(i=c(2,2,2,1), j=c(2,2,2))

This produces the following result, which I can't yet interpret:

[1,] | |
[2,] | .

Any thoughts about a tidyverse method?

CodePudding user response：

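You could use table from base R. The first argument (the rows) is the later value of each consecutive pair and the second (the columns) is the earlier one: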

table(df$level[-1], df$level[-nrow(df)])

        above below
  above     2     1
  below     1     0
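
If the direction of each transition needs to be explicit, the same idea can be written with named dimensions and then flattened to one row per transition. This is only a sketch of one way to do it; the names `from`, `to`, `transitions` and `counts` are illustrative and not part of the original answer.

```r
df <- data.frame(value = c(130, 200, 240, 230, 130),
                 level = c("below", "above", "above", "above", "below"))

n <- nrow(df)

# Pair each month's level (from) with the following month's level (to).
transitions <- table(from = df$level[-n], to = df$level[-1])

# Flatten to one row per transition; combinations that never occur keep n = 0.
counts <- as.data.frame(transitions, responseName = "n")
counts
#>    from    to n
#> 1 above above 2
#> 2 below above 1
#> 3 above below 1
#> 4 below below 0
```

Because `table` cross-tabulates every combination of the two vectors, the "below to below" transition shows up with a count of 0 instead of being dropped.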

CodePudding user response：

Sure, just group by each level pasted together with the next one and count the rows in each group:

library(dplyr)

df <- data.frame(value = c(130, 200, 240, 230, 130),
                 level = c("below", "above","above","above", "below"))
df %>% 
  group_by(grp = paste(level, lead(level))) %>% 
  summarise(n = n()) %>% 
  # drop the observation that does not have a "next" value
  filter(!grepl(pattern = "NA", x = grp))
#> # A tibble: 3 × 2
#>   grp             n
#>   <chr>       <int>
#> 1 above above     2
#> 2 above below     1
#> 3 below above     1
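
Note that the result above has no row for "below below", because that transition never occurs in the data. If all four transitions should be reported, one option is to turn both sides of the pair into factors and ask `count()` to keep empty groups. This is a sketch rather than the answerer's method; `lv`, `from`, `to` and `counts` are names chosen here for illustration.

```r
library(dplyr)

df <- data.frame(value = c(130, 200, 240, 230, 130),
                 level = c("below", "above", "above", "above", "below"))

# Fix the set of levels so that unseen combinations can still be counted.
lv <- c("below", "above")

counts <- df %>%
  mutate(from = factor(level, levels = lv),
         to   = factor(lead(level), levels = lv)) %>%
  # The last observation has no "next" value, so drop it.
  filter(!is.na(to)) %>%
  # .drop = FALSE keeps factor-level combinations with zero rows.
  count(from, to, .drop = FALSE)

counts
```

The result should contain one row per combination of `from` and `to`, with `n = 0` for "below" followed by "below".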

CodePudding user response：

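Not run, so there may be a typo, but you get the idea: compare each value with the previous one in a single pass through the vector. Note that this counts increases and decreases in value rather than transitions between levels. The first observation has no previous value, so its comparison is NA and falls through to 0; I'll leave any other NA handling to you.

library(dplyr)
df %>%
  summarize(increase = sum(case_when(value > lag(value) ~ 1, TRUE ~ 0)),
            # decrease must test "<", not ">"
            decrease = sum(case_when(value < lag(value) ~ 1, TRUE ~ 0)),
            # equality is "==", a single "=" would not compare
            constant = sum(case_when(value == lag(value) ~ 1, TRUE ~ 0))
           )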

CodePudding user response：

A slightly different version:

library(dplyr)
library(stringr)
df %>% 
  group_by(level = str_c(level, lead(level), sep = " ")) %>% 
  count(level) %>% 
  na.omit()

  level           n
  <chr>       <int>
1 above above     2
2 above below     1
3 below above     1

CodePudding user response：

Another possible solution, based on tidyverse:

library(tidyverse)

df <- data.frame(value = c(130, 200, 240, 230, 130),
                 level = c("below", "above", "above", "above", "below"))

df %>% 
  mutate(changes = str_c(lag(level), level, sep = "_")) %>% 
  count(changes) %>% drop_na(changes)

#>       changes n
#> 1 above_above 2
#> 2 above_below 1
#> 3 below_above 1

Yet another solution, based on data.table:

library(data.table)

dt <- data.table(value = c(130, 200, 240, 230, 130),
                 level = c("below", "above", "above", "above", "below"))

dt[, changes := paste(shift(level), level, sep = "_")
][2:.N][,.(n = .N), keyby = .(changes)]

#>        changes n
#> 1: above_above 2
#> 2: above_below 1
#> 3: below_above 1