Following up on my last question, I am now looking for a way to track changes within a data frame of character values.
Suppose I have the following data frame df:
df=data.frame(ID=c(123100,123200,123300,123400,123500),"2014"=c("Germany","Germany","Germany","Italy","Austria"),"2015"=c("Germany","Germany","Germany","Italy","Austria"),"2016"=c("Italy","Germany","Germany","Italy","Germany"), "2017"=c("Italy","Germany","Germany","Italy","Germany"), "2018"=c("Italy","Austria","Germany","Italy","Germany") )
Now I want to find out for which ID the data changed, and in which year. For example, in 2016 ID 123100 changed from Germany to Italy. I would like to add new columns for change (1 = change, 0 or NA = no change), year of change, old value and new value. The challenge is that the real dataset contains thousands of distinct values instead of just three countries, so I need a solution that does not require listing the distinct values beforehand.
In the end it should look like this:
df_final=data.frame(ID=c(123100,123200,123300,123400,123500),"2014"=c("Germany","Germany","Germany","Italy","Austria"),"2015"=c("Germany","Germany","Germany","Italy","Austria"),"2016"=c("Italy","Germany","Germany","Italy","Germany"), "2017"=c("Italy","Germany","Germany","Italy","Germany"), "2018"=c("Italy","Austria","Germany","Italy","Germany"), "change"=c(1,1,0,0,1),
"year"=c(2016, 2018, 0, 0, 2016), "before"=c("Germany","Germany",0,0,"Austria"), "after"=c("Italy", "Austria", 0, 0, "Germany"))
I couldn't find a satisfactory solution on here, so I hope you can help me.
CodePudding user response:
Not elegant, but you can use rle to compute the lengths and values of runs of equal values in a vector. I used plyr::ldply to run rle for each row. Note that this only reports the first change per ID.
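To illustrate what rle returns for a single row, here is a minimal sketch using the 2014 to 2018 values of ID 123100 from the question. The variable names x, r and first_change are only for illustration.

```r
# Values of ID 123100 across 2014-2018, copied from the question's df
x <- c("Germany", "Germany", "Italy", "Italy", "Italy")

r <- rle(x)
r$lengths  # length of each run of equal values
r$values   # the value of each run

# The first change happens right after the first run ends,
# so its position in x is the length of the first run plus one
first_change <- r$lengths[1] + 1
```

Here the first run has length 2, so the first change is at position 3, which corresponds to 2016.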
library(plyr)
columns <- c("X2014", "X2015", "X2016", "X2017", "X2018")
output <- ldply(seq_len(nrow(df)), function(x) {
  # flatten the row to a character vector before calling rle()
  rle_output <- rle(as.character(unlist(df[x, columns])))
  # a single run means the value never changed
  if (length(rle_output$lengths) == 1) return(data.frame(change = 0))
  change <- 1
  # the first change is in the column right after the first run
  year <- columns[rle_output$lengths[1] + 1]
  before <- rle_output$values[1]
  after <- rle_output$values[2]
  data.frame(change, year, before, after)
})
cbind(df, output)
ID X2014 X2015 X2016 X2017 X2018 change year before after
1 123100 Germany Germany Italy Italy Italy 1 X2016 Germany Italy
2 123200 Germany Germany Germany Germany Austria 1 X2018 Germany Austria
3 123300 Germany Germany Germany Germany Germany 0 <NA> <NA> <NA>
4 123400 Italy Italy Italy Italy Italy 0 <NA> <NA> <NA>
5 123500 Austria Austria Germany Germany Germany 1 X2016 Austria Germany
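If you would rather avoid the plyr dependency, the same idea can probably be done in base R by comparing each year column with the previous one. This is only a sketch: it assumes the year columns are named X2014 to X2018 as above and contain no missing values, and the helper names (year_cols, m, changed, first) are made up for the example. It reports the first change per ID and returns the year as a number.

```r
df <- data.frame(
  ID = c(123100, 123200, 123300, 123400, 123500),
  "2014" = c("Germany", "Germany", "Germany", "Italy", "Austria"),
  "2015" = c("Germany", "Germany", "Germany", "Italy", "Austria"),
  "2016" = c("Italy", "Germany", "Germany", "Italy", "Germany"),
  "2017" = c("Italy", "Germany", "Germany", "Italy", "Germany"),
  "2018" = c("Italy", "Austria", "Germany", "Italy", "Germany"),
  stringsAsFactors = FALSE
)

# Year columns (data.frame() prefixes numeric names with "X")
year_cols <- grep("^X[0-9]{4}$", names(df), value = TRUE)
m <- as.matrix(df[year_cols])  # character matrix, one row per ID

# TRUE where a year's value differs from the previous year's value
changed <- m[, -1, drop = FALSE] != m[, -ncol(m), drop = FALSE]

# Column index (within `changed`) of the first change per row, NA if none
first <- apply(changed, 1, function(z) which(z)[1])

rows <- seq_len(nrow(m))
df$change <- as.integer(!is.na(first))
df$year   <- as.integer(sub("^X", "", year_cols[first + 1]))
df$before <- m[cbind(rows, first)]      # value in the year before the change
df$after  <- m[cbind(rows, first + 1)]  # value in the year of the change
```

Rows without any change get NA in year, before and after rather than 0, which keeps those columns from mixing types.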
CodePudding user response:
Try this
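library(dplyr)
# before/after are the 2014 and 2018 values, which is correct if an ID changes at most once
df |> rowwise() |>
  mutate(change = case_when(all(c_across(X2015:X2018) == X2014) ~ 0, TRUE ~ 1),
         # first year column whose value differs from the 2014 value (NA if none)
         year = colnames(df)[-1][which(c_across(X2014) != c_across(X2014:X2018))[1]]) |>
  ungroup() |>
  mutate(before = ifelse(change == 1, X2014, NA), after = ifelse(change == 1, X2018, NA))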
- output
# A tibble: 5 × 10
ID X2014 X2015 X2016 X2017 X2018 change year before after
<dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <chr> <chr> <chr>
1 123100 Germany Germany Italy Italy Italy 1 X2016 Germany Italy
2 123200 Germany Germany Germany Germany Austria 1 X2018 Germany Austria
3 123300 Germany Germany Germany Germany Germany 0 NA NA NA
4 123400 Italy Italy Italy Italy Italy 0 NA NA NA
5 123500 Austria Austria Germany Germany Germany 1 X2016 Austria Germany
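If an ID can change more than once in the real data, one option is to list every change as its own row instead of adding columns. The following is a base R sketch, not part of the answer above: it assumes year columns named X2014 to X2018 with no missing values, and the names year_cols, m, changed, idx and all_changes are hypothetical.

```r
df <- data.frame(
  ID = c(123100, 123200, 123300, 123400, 123500),
  "2014" = c("Germany", "Germany", "Germany", "Italy", "Austria"),
  "2015" = c("Germany", "Germany", "Germany", "Italy", "Austria"),
  "2016" = c("Italy", "Germany", "Germany", "Italy", "Germany"),
  "2017" = c("Italy", "Germany", "Germany", "Italy", "Germany"),
  "2018" = c("Italy", "Austria", "Germany", "Italy", "Germany"),
  stringsAsFactors = FALSE
)

year_cols <- grep("^X[0-9]{4}$", names(df), value = TRUE)
m <- as.matrix(df[year_cols])

# TRUE where a year's value differs from the previous year's value
changed <- m[, -1, drop = FALSE] != m[, -ncol(m), drop = FALSE]

# One (row, col) pair per detected change
idx <- which(changed, arr.ind = TRUE)

all_changes <- data.frame(
  ID     = df$ID[idx[, "row"]],
  year   = as.integer(sub("^X", "", year_cols[idx[, "col"] + 1])),
  before = m[cbind(idx[, "row"], idx[, "col"])],
  after  = m[cbind(idx[, "row"], idx[, "col"] + 1)],
  stringsAsFactors = FALSE
)

# Sort so that each ID's changes appear in chronological order
all_changes <- all_changes[order(all_changes$ID, all_changes$year), ]
```

With the example data each ID changes at most once, so all_changes has three rows (IDs 123100, 123200 and 123500). An ID that changes several times would simply contribute several rows.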