Issue with R if else loop: Conditions only partly executed

I have the following data frame:

Row    Repro Number2
1      1     EWC
2     NA     LWY
3      7     EWS
4     NA     LWC
5     NA     EWC
6     NA     LWC
7      3     EWY
8     NA    LW2Y
9     NA Unknown
10    NA     LWC
11     1     EWC
12    NA     LWY
13    NA     EWY
14    NA     LWY
15    NA Unknown
16    NA     LWC

On this data frame, I am using the following loop:

for (i in 1:nrow(df3)) {
  if (df3$Number2[i + 1] == "Unknown" & is.na(df3$Repro[i])) {
    df3$Number2[i] = "Unknown"
  } else {
    df3$Number2[i] == df3$Number2[i]
  }
}

The loop runs, but it stops with an error at the end and the data frame does not end up looking the way I want.

The code does part of what I intend (replacing a value in the Number2 column with "Unknown" if the value in the next row is "Unknown" and the row's own Repro value is NA), but it only does so for the "Unknown" values that were in the data frame to begin with. I want it to also take the newly added "Unknown" values into account and apply the same condition to them.

Here is the error code:

Error in if (df3$Number2[i + 1] == "Unknown" & is.na(df3$Repro[i])) { : 
  missing value where TRUE/FALSE needed

And here is the data frame after running the loop. I have added another column, "Number2.Correct", showing what I want the Number2 column to look like. The issue is with rows 12 and 13: these should be "Unknown", not "LWY" and "EWY", respectively.

   Repro Number2  Number2.Correct
1      1     EWC  EWC
2     NA     LWY  LWY
3      7     EWS  EWS
4     NA     LWC  LWC
5     NA     EWC  EWC
6     NA     LWC  LWC
7      3     EWY  EWY
8     NA Unknown  Unknown
9     NA Unknown  Unknown
10    NA     LWC  LWC
11     1     EWC  EWC
12    NA     LWY  Unknown
13    NA     EWY  Unknown
14    NA Unknown  Unknown
15    NA Unknown  Unknown
16    NA     LWC  LWC

In the end, I have two questions:

How do I change my code to give me the result I want?
Why is the error code appearing and is it partly responsible for the issue?

CodePudding user response：

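The code fails because on the last iteration i + 1 equals nrow(df3) + 1, which is out of range: df3$Number2[i + 1] is then NA, and if() cannot evaluate an NA condition. The for loop therefore needs to run over 1:(nrow(df3) - 1). Because the error only occurs on that final iteration, it is not what leaves rows 12 and 13 unchanged.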
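
The mechanism behind the error can be reproduced on a short vector. This is only a minimal sketch: the names x, past_end, cond and result are illustrative and not from the question. It shows that indexing one past the end returns NA, that the comparison stays NA, and that if() refuses to branch on it.

```r
# Hypothetical three-element vector standing in for df3$Number2
x <- c("EWC", "LWY", "Unknown")

# Indexing one past the end does not fail; it returns NA
past_end <- x[length(x) + 1]

# Comparing NA with a string gives NA, and NA & TRUE is still NA
cond <- past_end == "Unknown" & TRUE

# if() raises an error when its condition is NA; capture the message instead of stopping
result <- tryCatch(
  if (cond) "branch taken" else "branch skipped",
  error = function(e) conditionMessage(e)
)
print(result)
```

Here result holds the captured error message rather than either branch label, which mirrors what happens on the last row of df3.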

To update Number2 iteratively, one easy (although not elegant) way is to wrap the for loop in a while loop and stop once a full pass leaves Number2 unchanged.

library(dplyr)

while (TRUE) {
  df3$Number2_new <- df3$Number2
  for (i in 1:(nrow(df3) - 1)) {
    if (df3$Number2_new[i + 1] == "Unknown" & is.na(df3$Repro[i])) {
      df3$Number2_new[i] <- "Unknown"
    }
  }

  # A pass that changes nothing means we are done
  converged <- all(df3$Number2 == df3$Number2_new)

  # Copy the updated values back and drop the helper column
  df3 <- df3 %>%
    mutate(Number2 = Number2_new) %>%
    select(-Number2_new)

  if (converged) break
}


df3

   Row Repro Number2
1    1     1     EWC
2    2    NA     LWY
3    3     7     EWS
4    4    NA     LWC
5    5    NA     EWC
6    6    NA     LWC
7    7     3     EWY
8    8    NA Unknown
9    9    NA Unknown
10  10    NA     LWC
11  11     1     EWC
12  12    NA Unknown
13  13    NA Unknown
14  14    NA Unknown
15  15    NA Unknown
16  16    NA     LWC

CodePudding user response：

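# Walk from the last row to the first so newly set "Unknown" values propagate upward
for (i in rev(seq_len(nrow(df3)))) {
  # Check the bound first; && short-circuits, so row i + 1 is only read when it exists
  if (i + 1 <= nrow(df3) && df3$Number2[i + 1] == "Unknown" && is.na(df3$Repro[i])) {
    df3$Number2[i] <- "Unknown"
  }
}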

df3
#>    Row Repro Number2
#> 1    1     1     EWC
#> 2    2    NA     LWY
#> 3    3     7     EWS
#> 4    4    NA     LWC
#> 5    5    NA     EWC
#> 6    6    NA     LWC
#> 7    7     3     EWY
#> 8    8    NA Unknown
#> 9    9    NA Unknown
#> 10  10    NA     LWC
#> 11  11     1     EWC
#> 12  12    NA Unknown
#> 13  13    NA Unknown
#> 14  14    NA Unknown
#> 15  15    NA Unknown
#> 16  16    NA     LWC

Created on 2023-01-09 with reprex v2.0.2

You had two issues:

i + 1 is out of range for the final row in your data; I added a bounds check on i + 1 to the condition.
The desired output you posted suggests that you want to look for "Unknown" from bottom to top, not top to bottom. You can reverse the order with rev().
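
The two points above can be combined into a single backward pass. The following is a sketch only: propagate_unknown is a hypothetical helper name, and the data frame is rebuilt here from the values in the question so that the example is self-contained.

```r
# Sample data copied from the question (Repro and Number2 columns only)
df3 <- data.frame(
  Repro = c(1, NA, 7, NA, NA, NA, 3, NA, NA, NA, 1, NA, NA, NA, NA, NA),
  Number2 = c("EWC", "LWY", "EWS", "LWC", "EWC", "LWC", "EWY", "LW2Y",
              "Unknown", "LWC", "EWC", "LWY", "EWY", "LWY", "Unknown", "LWC"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one pass from the second-to-last row up to the first,
# so each row sees the already-updated value of the row below it
propagate_unknown <- function(number2, repro) {
  n <- length(number2)
  if (n < 2) return(number2)
  for (i in (n - 1):1) {
    if (number2[i + 1] == "Unknown" && is.na(repro[i])) {
      number2[i] <- "Unknown"
    }
  }
  number2
}

df3$Number2 <- propagate_unknown(df3$Number2, df3$Repro)
print(df3)
```

Starting at row n - 1 means row i + 1 always exists, so no separate bounds check is needed, and going upward lets rows 14, 13 and 12 pick up "Unknown" in turn within the same pass.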

CodePudding user response：

i + 1 goes out of range once i reaches the nrow() of the data. Alternatively, we may use a group-by approach with the tidyverse (plus rleid() from data.table).

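library(dplyr)
library(tidyr)
library(data.table)  # for rleid()

df3 %>%
  # number each "Unknown" row; every other row starts as NA
  mutate(grp = replace(replace(Number2, Number2 != "Unknown", NA),
    Number2 == "Unknown", seq_len(sum(Number2 == "Unknown")))) %>%
  # assign each row to the nearest "Unknown" below it (or above, for trailing rows)
  fill(grp, .direction = "updown") %>%
  # split further into runs of consecutive NA / non-NA Repro values
  group_by(grp, grp2 = rleid(is.na(Repro))) %>%
  # within a run of NA Repro values, rows above the "Unknown" become "Unknown"
  mutate(Number2 = case_when(is.na(Repro) &
    row_number() < match("Unknown", Number2) ~ "Unknown",
    TRUE ~ Number2)) %>%
  ungroup() %>%
  select(-grp, -grp2)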

-output

# A tibble: 16 × 3
     Row Repro Number2
   <int> <int> <chr>  
 1     1     1 EWC    
 2     2    NA LWY    
 3     3     7 EWS    
 4     4    NA LWC    
 5     5    NA EWC    
 6     6    NA LWC    
 7     7     3 EWY    
 8     8    NA Unknown
 9     9    NA Unknown
10    10    NA LWC    
11    11     1 EWC    
12    12    NA Unknown
13    13    NA Unknown
14    14    NA Unknown
15    15    NA Unknown
16    16    NA LWC
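
The grouping step relies on run-length ids of is.na(Repro). As a rough illustration of what those ids look like for this data, here is a base R sketch; run_id is a hypothetical stand-in written for this example, not the data.table function itself.

```r
# Repro column copied from the question's data
repro <- c(1, NA, 7, NA, NA, NA, 3, NA, NA, NA, 1, NA, NA, NA, NA, NA)

# Hypothetical helper: number each run of consecutive equal values 1, 2, 3, ...
run_id <- function(x) {
  runs <- rle(x)
  rep(seq_along(runs$lengths), runs$lengths)
}

ids <- run_id(is.na(repro))
print(ids)
# [1] 1 2 3 4 4 4 5 6 6 6 7 8 8 8 8 8
```

Rows 12 to 16 share run id 8, which is why rows 12, 13 and 14 end up in the same group as the "Unknown" in row 15 and are replaced, while row 11 (Repro = 1) sits in its own run and is left alone.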

data

df3 <- structure(list(Row = 1:16, Repro = c(1L, NA, 7L, NA, NA, NA, 
3L, NA, NA, NA, 1L, NA, NA, NA, NA, NA), Number2 = c("EWC", "LWY", 
"EWS", "LWC", "EWC", "LWC", "EWY", "LW2Y", "Unknown", "LWC", 
"EWC", "LWY", "EWY", "LWY", "Unknown", "LWC")),
 class = "data.frame", row.names = c(NA, 
-16L))