I have a dataframe like this:
dateColumnDF <- c("2022-04-12 00:02:57", "2022-04-12 00:02:58", "2022-04-12 00:02:59", "2022-04-12 00:03:00", "2022-04-12 00:03:02")
ValueColumnDf <- c("50","465","788","99","25")
Var1Df <- c("0", "0", "0","0","0")
Var2Df <- c("0", "0", "1","1","0")
Var3Df <- c("0","1","0","1","0")
df <- data.frame(dateColumnDF, ValueColumnDf,Var1Df,Var2Df,Var3Df)
colnames(df) <- c("timestamp","Value","Var1","Var2","Var3")
And I want to obtain a dataframe that reflects the way the Varx values have been changing, from one value to another (mainly from 0 to 1 and back), like this:
firstColumn <- c("Var2", "Var2", "Var3", "Var3", "Var3","Var3")
secondColumn <- c("1", "0", "1","0","1","0")
thirdColumn <- c("2022-04-12 00:02:59", "2022-04-12 00:03:02", "2022-04-12 00:02:58","
2022-04-12 00:02:59","2022-04-12 00:03:00","2022-04-12 00:03:02")
fourthColum <- c("788","25","465","788","99","25")
df2 <- data.frame(firstColumn,secondColumn,thirdColumn,fourthColum)
colnames(df2) <- c("Var","flagChangedTo","timestamp","Value")
I have found that to see the changes in each row I need to to do (using dplyr):
which(df$value!= dplyr::lag(df$value))
And that I would need to put this inside a loop of the columns of interest, something like this:
for(i in 3:ncol(df)) {
x <- which(df[,i]!= dplyr::lag(df[,i]))
}
After I obtain the places where it changes how can I generate the desired matrix?
CodePudding user response:
Using tidyr::pivot_longer
(you were already using dplyr
), you can turn the table into long form.
library(tidyr)
pivot_longer(df, starts_with("Var"), names_to = "Var", values_to = "flagChangedTo")
This gives
timestamp Value Var flagChangedTo
<fct> <fct> <chr> <fct>
1 2022-04-12 00:02:57 50 Var1 0
2 2022-04-12 00:02:57 50 Var2 0
3 2022-04-12 00:02:57 50 Var3 0
4 2022-04-12 00:02:58 465 Var1 0
5 2022-04-12 00:02:58 465 Var2 0
...
After that, we can group by Var
and use filter
to keep only the rows in which the flagChangedTo
value of the previous row is not equal to the current row, using dplyr::lag
(like you already correctly suggested).
Put together, and using the magrittr pipe (%>%
):
library(tidyr)
df %>%
pivot_longer(starts_with("Var"), names_to = "Var", values_to = "flagChangedTo") %>%
group_by(Var) %>%
arrange(timestamp) %>%
filter(flagChangedTo != lag(flagChangedTo)) %>%
ungroup() %>%
arrange(Var, timestamp)
Which gives
timestamp Value Var flagChangedTo
<fct> <fct> <chr> <fct>
1 2022-04-12 00:02:59 788 Var2 1
2 2022-04-12 00:03:02 25 Var2 0
3 2022-04-12 00:02:58 465 Var3 1
4 2022-04-12 00:02:59 788 Var3 0
5 2022-04-12 00:03:00 99 Var3 1
6 2022-04-12 00:03:02 25 Var3 0
CodePudding user response:
Similar to @Bas response:
df %>% pivot_longer(cols=Var1:Var3) %>%
arrange(name, timestamp) %>%
group_by(name) %>%
filter(value!=lag(value)) %>%
select(Var=name, flagChangedTo=value,timestamp,Value)
Output:
Var flagChangedTo timestamp Value
<chr> <chr> <chr> <chr>
1 Var2 1 2022-04-12 00:02:59 788
2 Var2 0 2022-04-12 00:03:02 25
3 Var3 1 2022-04-12 00:02:58 465
4 Var3 0 2022-04-12 00:02:59 788
5 Var3 1 2022-04-12 00:03:00 99
6 Var3 0 2022-04-12 00:03:02 25