Home > OS >  Calculate changes in columns of daily tidy data
Calculate changes in columns of daily tidy data

Time:12-27

Each day a company creates a value for category_1 and category_2. A new company may enter the survey midway as company E appears on Dec 25. Here are three days of data. So, two intervals: Dec 24-25 and Dec 25-26.

Question

For each category how many increase/decreases/no change were there over the 3 days? For example, in cat1 A goes from a 2 to 1, B goes from a 3 to a 4, etc.

By hand I get:

cat1 - Up: 2, Down: 5, No change: 2

cat2 - Up: 6, Down: 2, No change: 1

How do I calculate the number of up/downs/no changes in an R Script?

library("tidyverse")

d1 <- as.Date("2022-12-24")
d2 <- as.Date("2022-12-25")
d3 <- as.Date("2022-12-26")

df <- tibble(
  company = c(LETTERS[1:4], LETTERS[1:5], LETTERS[1:5]),
  cat1 = c(2, 3, 4, 5, 1, 4, 5, 3, 2, 1, 4, 4, 2, 1),
  cat2 = c(6, 7, 8, 9, 5, 5, 9, 10, 11, 6, 5, 10, 12, 13),
  date = c(rep(d1, 4), rep(d2, 5), rep(d2, 5))
)
df

CodePudding user response:

One approach using dplyr, assuming arranged data. Note: I changed the typo in date 3 to d3.

library(dplyr)

df %>% 
  group_by(company) %>% 
  mutate(cat1_change = cat1 - lag(cat1), cat2_change = cat2 - lag(cat2)) %>% 
  ungroup() %>% 
  summarize(type = c("up", "down", "no-change"), 
            across(ends_with("change"), ~ 
              c(sum(.x > 0, na.rm=T), sum(.x < 0, na.rm=T), sum(.x == 0, na.rm=T))))
# A tibble: 3 × 3
  type      cat1_change cat2_change
  <chr>           <int>       <int>
1 up                  2           6
2 down                5           2
3 no-change           2           1

Data

df <- structure(list(company = c("A", "B", "C", "D", "A", "B", "C", 
"D", "E", "A", "B", "C", "D", "E"), cat1 = c(2, 3, 4, 5, 1, 4, 
5, 3, 2, 1, 4, 4, 2, 1), cat2 = c(6, 7, 8, 9, 5, 5, 9, 10, 11, 
6, 5, 10, 12, 13), date = structure(c(19350, 19350, 19350, 19350, 
19351, 19351, 19351, 19351, 19351, 19352, 19352, 19352, 19352, 
19352), class = "Date")), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -14L))

CodePudding user response:

An option with data.table - grouped by company, loop over the 'cat' column, get the diff of adjacent elements, convert to sign, and rename with factor labels, melt to long format and reshape back to 'wide' format with dcast

library(data.table)
dcast(melt(setDT(df)[, lapply(.SD, \(x) factor(sign(diff(x)), 
  levels = c(-1, 0, 1), labels = c("down", "no-change", "up"))),
   company, .SDcols = patterns("^cat")], id.var = "company", 
  value.name = "type"), type ~ paste0(variable, "_change"), length)

-output

        type cat1_change cat2_change
1:      down           5           2
2: no-change           2           1
3:        up           2           6
  •  Tags:  
  • r
  • Related