r - How to track Changes in Rows of dataframe with characters?

Time:07-12

Following up on my last question, I am now looking for a way to track changes within a data frame of character values.

Suppose I have the following dataframe df:

df <- data.frame(
  ID = c(123100, 123200, 123300, 123400, 123500),
  "2014" = c("Germany", "Germany", "Germany", "Italy", "Austria"),
  "2015" = c("Germany", "Germany", "Germany", "Italy", "Austria"),
  "2016" = c("Italy", "Germany", "Germany", "Italy", "Germany"),
  "2017" = c("Italy", "Germany", "Germany", "Italy", "Germany"),
  "2018" = c("Italy", "Austria", "Germany", "Italy", "Germany")
)

Now I want to find out for which ID the data changed, and in which year. For example, ID 123100 changed from Germany to Italy in 2016. I would like to add new columns for change (1 = change, 0 or NA = no change), year of change, old value, and new value. The challenge is that the real dataset contains thousands of distinct values rather than just three countries, so I need a solution that does not require listing the possible values in advance.

In the end it should look like this:

df_final <- data.frame(
  ID = c(123100, 123200, 123300, 123400, 123500),
  "2014" = c("Germany", "Germany", "Germany", "Italy", "Austria"),
  "2015" = c("Germany", "Germany", "Germany", "Italy", "Austria"),
  "2016" = c("Italy", "Germany", "Germany", "Italy", "Germany"),
  "2017" = c("Italy", "Germany", "Germany", "Italy", "Germany"),
  "2018" = c("Italy", "Austria", "Germany", "Italy", "Germany"),
  "change" = c(1, 1, 0, 0, 1),
  "year" = c(2016, 2018, 0, 0, 2016),
  "before" = c("Germany", "Germany", 0, 0, "Austria"),
  "after" = c("Italy", "Austria", 0, 0, "Germany")
)

I couldn't find a satisfactory solution here, so I hope you can help me.

CodePudding user response:

Not elegant, but you can use rle to get the lengths and values of the runs in a vector. I used plyr::ldply to run rle on each row.

library(plyr)
output <- ldply(seq_len(nrow(df)), function(x){
  columns <- c("X2014", "X2015", "X2016", "X2017", "X2018")
  # Flatten the row to a plain character vector so rle() gets an atomic vector
  rle_output <- rle(as.character(unlist(df[x, columns])))
  # A single run means the value never changed
  if(length(rle_output$lengths) == 1) return(data.frame(change=0))
  else{
    change = 1
    # The first change is in the column right after the first run ends
    year = columns[rle_output$lengths[1] + 1]
    before = rle_output$values[1]
    after = rle_output$values[2]
    return(data.frame(change, year, before, after))
  }})

cbind(df, output)

      ID   X2014   X2015   X2016   X2017   X2018 change  year  before   after
1 123100 Germany Germany   Italy   Italy   Italy      1 X2016 Germany   Italy
2 123200 Germany Germany Germany Germany Austria      1 X2018 Germany Austria
3 123300 Germany Germany Germany Germany Germany      0  <NA>    <NA>    <NA>
4 123400   Italy   Italy   Italy   Italy   Italy      0  <NA>    <NA>    <NA>
5 123500 Austria Austria Germany Germany Germany      1 X2016 Austria Germany
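
For comparison, here is a minimal base R sketch of the same idea that avoids the row-by-row loop and does not hard-code the column names. The helper name first_change and the "^X[0-9]{4}$" pattern used to pick the year columns are assumptions made for illustration; it also assumes the year columns are already in time order.

```r
# Example data from the question; the names are written with the X prefix
# that data.frame() adds to "2014", "2015", ...
df <- data.frame(
  ID = c(123100, 123200, 123300, 123400, 123500),
  X2014 = c("Germany", "Germany", "Germany", "Italy", "Austria"),
  X2015 = c("Germany", "Germany", "Germany", "Italy", "Austria"),
  X2016 = c("Italy", "Germany", "Germany", "Italy", "Germany"),
  X2017 = c("Italy", "Germany", "Germany", "Italy", "Germany"),
  X2018 = c("Italy", "Austria", "Germany", "Italy", "Germany"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: describe the first change in each row of `data`
# across the columns named in `cols` (assumed to be in time order).
first_change <- function(data, cols) {
  m <- as.matrix(data[cols])   # character matrix, one row per ID
  n <- ncol(m)
  # TRUE where a cell differs from the cell to its left
  differs <- m[, -1, drop = FALSE] != m[, -n, drop = FALSE]
  # Position within `cols` of the first changed column, NA if no change
  pos <- apply(differs, 1, function(r) which(r)[1]) + 1
  idx <- seq_len(nrow(m))
  data.frame(
    change = as.integer(!is.na(pos)),
    year   = as.integer(sub("^X", "", cols[pos])),
    before = m[cbind(idx, pos - 1)],   # NA index rows yield NA
    after  = m[cbind(idx, pos)],
    stringsAsFactors = FALSE
  )
}

year_cols <- grep("^X[0-9]{4}$", names(df), value = TRUE)
result <- cbind(df, first_change(df, year_cols))
```

Because the comparison is done on the values themselves, the set of possible values never has to be known in advance. Rows without a change get NA in year, before and after rather than 0.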

CodePudding user response:

Try this

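library(dplyr)

df |>
  rowwise() |>
  mutate(
    # 0 if every later year equals X2014, otherwise 1
    change = case_when(all(c_across(X2015:X2018) == X2014) ~ 0, TRUE ~ 1),
    # name of the first year column whose value differs from X2014
    year = colnames(df)[-1][which(c_across(X2014) != c_across(X2014:X2018))[1]]
  ) |>
  ungroup() |>
  mutate(
    # before/after are the first and last year, so a row that changes
    # more than once reports only its final value in `after`
    before = ifelse(change == 1, X2014, NA),
    after = ifelse(change == 1, X2018, NA)
  )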
  • output
# A tibble: 5 × 10
      ID X2014   X2015   X2016   X2017   X2018   change year  before  after  
   <dbl> <chr>   <chr>   <chr>   <chr>   <chr>    <dbl> <chr> <chr>   <chr>  
1 123100 Germany Germany Italy   Italy   Italy        1 X2016 Germany Italy  
2 123200 Germany Germany Germany Germany Austria      1 X2018 Germany Austria
3 123300 Germany Germany Germany Germany Germany      0 NA    NA      NA     
4 123400 Italy   Italy   Italy   Italy   Italy        0 NA    NA      NA     
5 123500 Austria Austria Germany Germany Germany      1 X2016 Austria Germany
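
Note that the code above takes before from X2014 and after from X2018, which describes the first change only when a row changes at most once. If rows in the real data can change several times, one possible alternative is to list every change in long format, one row per change. The following is a base R sketch under that assumption; the names year_cols, hits and all_changes are made up for illustration.

```r
# Example data from the question, with the X-prefixed column names
df <- data.frame(
  ID = c(123100, 123200, 123300, 123400, 123500),
  X2014 = c("Germany", "Germany", "Germany", "Italy", "Austria"),
  X2015 = c("Germany", "Germany", "Germany", "Italy", "Austria"),
  X2016 = c("Italy", "Germany", "Germany", "Italy", "Germany"),
  X2017 = c("Italy", "Germany", "Germany", "Italy", "Germany"),
  X2018 = c("Italy", "Austria", "Germany", "Italy", "Germany"),
  stringsAsFactors = FALSE
)

year_cols <- grep("^X[0-9]{4}$", names(df), value = TRUE)
m <- as.matrix(df[year_cols])

# TRUE where a year's value differs from the previous year's value
differs <- m[, -1, drop = FALSE] != m[, -ncol(m), drop = FALSE]

# One (row, col) pair per change; col j refers to year_cols[j + 1]
hits <- which(differs, arr.ind = TRUE)
hits <- hits[order(hits[, "row"], hits[, "col"]), , drop = FALSE]

all_changes <- data.frame(
  ID     = df$ID[hits[, "row"]],
  year   = as.integer(sub("^X", "", year_cols[hits[, "col"] + 1])),
  before = m[cbind(hits[, "row"], hits[, "col"])],
  after  = m[cbind(hits[, "row"], hits[, "col"] + 1)],
  stringsAsFactors = FALSE
)
```

For the example data this gives three rows, one each for IDs 123100, 123200 and 123500. IDs that never change simply do not appear, and an ID that changes twice would appear twice.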