Home > Mobile >  Obtain column values based on changes in another column
Obtain column values based on changes in another column

Time:04-29

I have a dataframe like this:

dateColumnDF <- c("2022-04-12 00:02:57", "2022-04-12 00:02:58", "2022-04-12 00:02:59", "2022-04-12 00:03:00", "2022-04-12 00:03:02")
ValueColumnDf <- c("50","465","788","99","25")
Var1Df <- c("0", "0", "0","0","0")
Var2Df <- c("0", "0", "1","1","0")
Var3Df <- c("0","1","0","1","0")
df <- data.frame(dateColumnDF, ValueColumnDf,Var1Df,Var2Df,Var3Df)
colnames(df) <- c("timestamp","Value","Var1","Var2","Var3")

And I want to obtain a dataframe that reflects the way the Varx values have been changing, from one value to another (mainly from 0 to 1 and back), like this:

firstColumn <- c("Var2", "Var2", "Var3", "Var3", "Var3","Var3")
secondColumn <- c("1", "0", "1","0","1","0")
thirdColumn <- c("2022-04-12 00:02:59", "2022-04-12 00:03:02", "2022-04-12 00:02:58","  
2022-04-12 00:02:59","2022-04-12 00:03:00","2022-04-12 00:03:02")
fourthColum <- c("788","25","465","788","99","25")
df2 <- data.frame(firstColumn,secondColumn,thirdColumn,fourthColum)
colnames(df2) <- c("Var","flagChangedTo","timestamp","Value")

I have found that to see the changes in each row I need to to do (using dplyr):

which(df$value!= dplyr::lag(df$value))

And that I would need to put this inside a loop of the columns of interest, something like this:

for(i in 3:ncol(df)) {
  x <- which(df[,i]!= dplyr::lag(df[,i]))
}

After I obtain the places where it changes how can I generate the desired matrix?

CodePudding user response:

Using tidyr::pivot_longer (you were already using dplyr), you can turn the table into long form.

library(tidyr)
pivot_longer(df, starts_with("Var"), names_to = "Var", values_to = "flagChangedTo")

This gives

   timestamp           Value Var   flagChangedTo
   <fct>               <fct> <chr> <fct>        
 1 2022-04-12 00:02:57 50    Var1  0            
 2 2022-04-12 00:02:57 50    Var2  0            
 3 2022-04-12 00:02:57 50    Var3  0            
 4 2022-04-12 00:02:58 465   Var1  0            
 5 2022-04-12 00:02:58 465   Var2  0            
...

After that, we can group by Var and use filter to keep only the rows in which the flagChangedTo value of the previous row is not equal to the current row, using dplyr::lag (like you already correctly suggested).

Put together, and using the magrittr pipe (%>%):

library(tidyr)
df %>% 
  pivot_longer(starts_with("Var"), names_to = "Var", values_to = "flagChangedTo") %>% 
  group_by(Var) %>% 
  arrange(timestamp) %>% 
  filter(flagChangedTo != lag(flagChangedTo)) %>% 
  ungroup() %>% 
  arrange(Var, timestamp)

Which gives

  timestamp           Value Var   flagChangedTo
  <fct>               <fct> <chr> <fct>        
1 2022-04-12 00:02:59 788   Var2  1            
2 2022-04-12 00:03:02 25    Var2  0            
3 2022-04-12 00:02:58 465   Var3  1            
4 2022-04-12 00:02:59 788   Var3  0            
5 2022-04-12 00:03:00 99    Var3  1            
6 2022-04-12 00:03:02 25    Var3  0            

CodePudding user response:

Similar to @Bas response:

df %>% pivot_longer(cols=Var1:Var3) %>% 
    arrange(name, timestamp) %>% 
    group_by(name) %>% 
    filter(value!=lag(value)) %>% 
    select(Var=name, flagChangedTo=value,timestamp,Value)

Output:

  Var   flagChangedTo timestamp           Value
  <chr> <chr>         <chr>               <chr>
1 Var2  1             2022-04-12 00:02:59 788  
2 Var2  0             2022-04-12 00:03:02 25   
3 Var3  1             2022-04-12 00:02:58 465  
4 Var3  0             2022-04-12 00:02:59 788  
5 Var3  1             2022-04-12 00:03:00 99   
6 Var3  0             2022-04-12 00:03:02 25  
  • Related