Home > OS >  How to loop through each row looking for a change of string from "11" to any other number?
How to loop through each row looking for a change of string from "11" to any other number?

Time:09-07

here is an example of my data:

ID <- c(1,2,3,6) 
SOC_2017 <- c(22,11,11,10)
SOC_2018 <- c(11,11,21,10)
SOC_2019 <- c(11,11,21,20)
SOC_2020 <- c(21,11,21,20)
my_data<- data.frame(ID,SOC_2017,SOC_2018,SOC_2019,SOC_2020)

I need to loop through each row and look for the last time when the code 11 appears in each row. Then, I need to look for what is the code that appears after 11 and record it.

For instance, the person with ID 1 has SOC_2019 recorded as 11 and I want to record that the SOC code changed in 2020 (i.e., SOC_2020 is 21). I want to record this as showed in the table below:

ID SOC_2018 SOC_2020
1 NA 21
3 21 NA

This shows me the two instances where the SOC code went from 11 to something else (to code 21 in both these instances) and it records the year when the code changed.

I thought the IF function could probably be of help, but I am not sure on how to apply it here.

Any thoughts/tips appreciated!

Many thanks :)

CodePudding user response:

There might be an easier solution, but you could use

library(dplyr)
library(tidyr)

my_data %>% 
  pivot_longer(-ID) %>% 
  group_by(ID) %>% 
  mutate(tst = cumsum(lag(value, default = FALSE) == 11)) %>% 
  filter(tst == max(tst), tst != 0, value != 11) %>% 
  slice(1) %>% 
  ungroup() %>% 
  select(-tst) %>% 
  pivot_wider()

This returns

# A tibble: 2 x 3
     ID SOC_2020 SOC_2018
  <dbl>    <dbl>    <dbl>
1     1       21       NA
2     3       NA       21

CodePudding user response:

Slight variation; I get different output but perhaps I'm misunderstanding how you want to treat ID 3, which switches from 11 to 21 between '17 and '18.

my_data %>%
  pivot_longer(-ID) %>%
  group_by(ID) %>%
  mutate(flag = value == 11 & lead(value != 11, default = 0),
         flag = flag | lag(flag, default = FALSE)) %>%
  filter(flag) %>%
  ungroup() %>%
  select(-flag) %>%
  arrange(name) %>%
  pivot_wider() 

     ID SOC_2017 SOC_2018 SOC_2019 SOC_2020
  <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
1     3       11       21       NA       NA
2     1       NA       NA       11       21
  • Related