create new data frame based on variables conditions of other data frame-CodePudding

I am trying to create a new data frame using R from a larger data frame. This is a short version of my large data frame:

 df <- data.frame(time = c(0,1,5,10,12,13,20,22,25,30,32,35,39),
             t_0_1 = c(20,20,20,120,300,350,400,600,700,100,20,20,20),
             t_0_2 = c(20,20,20,20,120,300,350,400,600,700,100,20,20),
             t_2_1 = c(20,20,20,20,20,120,300,350,400,600,700,100,20),
             t_2_2 = c(20,20,20,20,120,300,350,400,600,700,100,20,20))

The new data frame should have the first variable values as the number in the end of the large data frame variables name (1 and 2). The other variables name should be the number in the middle of the large data frame variables (0 and 2) and for their values I am trying to filter the values greater than 300 for each variable and calculate the time difference. For example for variable "t_0_1", the time that the values are greater than 300 is 13 to 25 seconds. So the value in the new data frame should be 12.

The new data frame should look like this:

df_new <- data.frame(height= c(1,2),
                     "0" = c(12,10),
                     "2" = c(10,10))

Any help where I should start or how I can do that is very welcome. Thank you!!

CodePudding user response：

Here is a tidyverse solution

library(tidyverse)
df %>%
    pivot_longer(-time) %>%
    separate(name, c(NA, "col", "height"), sep = "_") %>%
    pivot_wider(names_from = "col", names_prefix = "X") %>%
    group_by(height) %>%
    summarise(
        across(starts_with("X"), ~ sum(diff(time[.x > 300]))), 
        .groups = "drop")
## A tibble: 2 x 3
#  height    X0    X2
#  <chr>  <dbl> <dbl>
#1 1         12    10
#2 2         10    10

Explanation: The idea is to reshape from wide to long, separate the column names into a (future) column name "col" and a "height". Reshape from long to wide by taking column names from "col" (prefixing with "X") and the summarising according to your requirements (i.e. keep only those entries where the value is > 300, and sum the difference in time).

CodePudding user response：

You could calculate the time difference for each column with summarise(across(...)), and then transform the data to long.

library(tidyverse)

df %>%
  summarise(across(-time, ~ sum(diff(time[.x > 300])))) %>%
  pivot_longer(everything(), names_to = c(".value", "height"), names_pattern = "t_(. )_(. )")

# # A tibble: 2 × 3
#   height   `0`   `2`
#   <chr>  <dbl> <dbl>
# 1 1         12    10
# 2 2         10    10