Home > front end >  Creating loop to compare rows and create new variable
Creating loop to compare rows and create new variable

Time:10-23

I have a long file that has observations along each row. Here are the variables:

  • id = Each unique ID represents a person, and each person has three observations.
  • type = Each observation is coded as a type represented by a code (207, 208, 212, 359)
  • date = Each observation occurred on a date
  • order_num = This represents the sequence in which the observations occurred.

I want to use a loop to create a new variable that places the type when order_num == 1 in each observation within each individual.

Here is what I have:

# A tibble: 9 x 4
     id  type date       order_num
  <dbl> <dbl> <chr>          <dbl>
1     1   212 2020-12-15         1
2     1   207 2021-01-21         2
3     1   208 2021-02-21         3
4     2   207 2020-12-31         1
5     2   208 2021-01-30         2
6     2   212 2021-02-28         3
7     3   208 2021-04-01         1
8     3   212 2021-05-01         2
9     3   359 2021-06-01         3

Here is what I want:

     id  type date       order_num first
  <dbl> <dbl> <chr>          <dbl> <dbl>
1     1   212 2020-12-15         1   212
2     1   207 2021-01-21         2   212
3     1   208 2021-02-21         3   212
4     2   207 2020-12-31         1   207
5     2   208 2021-01-30         2   207
6     2   212 2021-02-28         3   207
7     3   208 2021-04-01         1   208
8     3   212 2021-05-01         2   208
9     3   359 2021-06-01         3   208

As you can see for id 1, the first type that they received in the sequence was 212. The "first" variable now has type 212 in all three of the observations for id 1.

Here are the data that I'm starting with:

  ~id,~type,~date,       ~order_num,
  #---|----|------------|-----------|
  1,   212, "2020-12-15", 1, 
  1,   207, "2021-01-21", 2, 
  1,   208, "2021-02-21", 3, 
  2,   207, "2020-12-31", 1,
  2,   208, "2021-01-30", 2,
  2,   212, "2021-02-28", 3,
  3,   208, "2021-04-01", 1,
  3,   212, "2021-05-01", 2,
  3,   359, "2021-06-01", 3)

Unfortunately, when I run the loop that I developed (see below), I cannot get the result that I'm seeking (see below).

MY CODE:

  if(df$id[i] == df$id[i-1]){
    df$first <- df$type[i-1]
  } else if (df$id[i] != df$id[i-1]) {
    df$first <- df$type[i]
  } else df$first <- 99
}

GENERATED BY MY CODE:

# A tibble: 9 x 5
     id  type date       order_num first
  <dbl> <dbl> <chr>          <dbl> <dbl>
1     1   212 2020-12-15         1   212
2     1   207 2021-01-21         2   212
3     1   208 2021-02-21         3   212
4     2   207 2020-12-31         1   212
5     2   208 2021-01-30         2   212
6     2   212 2021-02-28         3   212
7     3   208 2021-04-01         1   212
8     3   212 2021-05-01         2   212
9     3   359 2021-06-01         3   212

I know multiple things wrong because the same type is being assigned to every observation.

Does anyone have any suggestions for how I could get the result I'm looking for?

Thanks.

CodePudding user response:

Here is the solution on a bit smaller dataset:

library(dplyr)                                                                                                                                                                                                                                                                                                                                                                                                                                                            "collector")), `02:30` = structure(list(), class = c("collector_double", 
df <- data_frame(id = c(1,1,2,2,3,3),
                 type = c(212,207,207,208,208,305),
                 order_num = c(1,2,1,2,1,2))

df <- df %>% group_by(id) %>% mutate(first= type[order_num==1])

CodePudding user response:

The reason you end up with all the same numbers in the first column is because you forgot to index the row numbers. Here is the fix for your code:

for(i in 2:nrow(df)){ #start at index 2 because we are comparing backwards 1 each time
  if(df$id[i] == df$id[i-1]){
    df$first[i] <- df$type[i-1] #this assumes order_num is always in order; be careful
  } else if (df$id[i] != df$id[i-1]) {
    df$first[i] <- df$type[i]
  } else df$first[i] <- 99
}

This solution may work for you. Others have suggested using dplyr as an alternative. To avoid unnecessary external packages, I would suggest the following solution using base R:

id_list <- unique(df$id)
for(i in 1:length(id_list)){
  df[df$id == id_list[i], "first"] <- df[df$id == id_list[i] & df$order_num == 1, "type"]
}

Edit: I just thought of an even simpler solution using only base R:

for(i in 1:nrow(df)){
  df[i, "first"] <- df[df$id == df$id[i] & df$order_num == 1, "type"]
}
  • Related