Checking index and handling missing values with R

Time:11-24

I am trying to build vectors by checking the values of the data frame. I think I am running into issues checking for the NA condition. What I am trying to accomplish:

If element i of vectorA is not NA and element i of vectorB is also not NA, store those values in vectors xp and yp. If element i of vectorA is NA but element i of vectorB has a value (or vice versa), store the value in xu or yu instead. When the loop is done I should have four vectors: xp and yp hold the complete pairs, xu holds values where index i in vectorA was not empty but index i in vectorB was empty, and yu holds values where index i in vectorA was empty but index i in vectorB was not. Essentially, xp and yp are complete paired data, while xu and yu are incomplete paired data.

The code below gives the following error message: missing value where TRUE/FALSE needed.

xp = numeric()
yp = numeric()
xu = numeric()
yu = numeric()

m = length(df$Q15)
for( i in 1:m)

{
  
  if(df$Q15[i]!= NA & df$QA[i]!= NA) 
  
xp1[i]=df$Q15[i]
yp1[i]=df$QA[i]

}
  else{
  
If(df$Q15[i] != NA & df$QA[i] == NA) xu[i]=df$Q15[i]
If(df$Q15i] == NA & df$QA[i] != NA) yu[i]=df$QA[i]

}


Error in if (df$Q15[i] != NA & df$QA[i] != NA) xp1[i] = df$Q15[i] : 
  missing value where TRUE/FALSE needed

CodePudding user response:

Comparing a value with NA using == or != returns NA rather than TRUE or FALSE, and arithmetic with NA does the same. if() cannot evaluate an NA condition, which is why it complains. To test for NA values, use is.na():

123 * NA
#> [1] NA
NA == NA
#> [1] NA
NA != NA
#> [1] NA
NA == TRUE
#> [1] NA
NA == FALSE
#> [1] NA

is.na(NA)
#> [1] TRUE
!is.na(NA)
#> [1] FALSE

is.na(FALSE)
#> [1] FALSE
!is.na(FALSE)
#> [1] TRUE

Created on 2022-11-23 with reprex v2.0.2
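
As a rough sketch of how this applies to the condition in the question: the helper below (classify_pair is a hypothetical name, not part of the original code) guards every branch with is.na(), so if() always receives TRUE or FALSE. It assumes x and y are single values, as they are inside the loop.

```r
# Hypothetical helper for illustration: classify one pair of values.
# Assumes x and y each have length 1, like df$Q15[i] and df$QA[i].
classify_pair <- function(x, y) {
  if (!is.na(x) && !is.na(y)) {
    "paired"          # both values present
  } else if (!is.na(x)) {
    "x only"          # y is NA
  } else if (!is.na(y)) {
    "y only"          # x is NA
  } else {
    "both missing"
  }
}

classify_pair(1, 2)
classify_pair(NA, 2)

# isTRUE() is another possible guard: it yields FALSE when the condition is NA
isTRUE(NA == 1)
```

The fourth branch covers the case where both values are NA, which the question does not mention but which can occur in real data.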

CodePudding user response:

Here is one possible example:

df <- data.frame(QA = sample(c(0L,1L,NA_integer_), size = 15, replace = TRUE, prob = c(0.4,0.4,0.2)),
                 Q15= sample(c(0L,1L,NA_integer_), size = 15, replace = TRUE, prob = c(0.2,0.4,0.4)))

xp <- numeric()
yp <- numeric()
xu <- numeric()
yu <- numeric()

# don't do this
# m = length(df$Q15)

for (i in seq_along(df$QA)) {

  ### use is.na() instead of == NA
  ### (&& is the scalar form of & and is the usual choice inside if())
  if (!is.na(df$Q15[[i]]) && !is.na(df$QA[[i]])) {
    ### inserted missing brackets
    xp <- c(xp, df$Q15[[i]])
    yp <- c(yp, df$QA[[i]])
  }

  if (!is.na(df$Q15[[i]]) && is.na(df$QA[[i]])) xu <- c(xu, df$Q15[[i]])

  if (!is.na(df$QA[[i]]) && is.na(df$Q15[[i]])) yu <- c(yu, df$QA[[i]])

}

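It includes some sample data and follows the procedure you described: xp and yp collect the complete pairs, while xu and yu collect the values whose partner is NA.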
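
If the loop itself is not required, the same four vectors could probably be built with logical indexing instead. This is only a sketch with made-up sample data; has_x and has_y are hypothetical helper names, and the column names follow the question.

```r
# Made-up sample data; column names follow the question
df <- data.frame(Q15 = c(1, NA, 3, NA, 5),
                 QA  = c(10, 20, NA, NA, 50))

# Logical masks: TRUE where a value is present
has_x <- !is.na(df$Q15)
has_y <- !is.na(df$QA)

# Here the vectorized & is the right operator, since the masks have many elements
xp <- df$Q15[has_x & has_y]    # complete pairs, Q15 side
yp <- df$QA[has_x & has_y]     # complete pairs, QA side
xu <- df$Q15[has_x & !has_y]   # Q15 present, QA missing
yu <- df$QA[!has_x & has_y]    # QA present, Q15 missing
```

Rows where both columns are NA end up in none of the four vectors, which matches what the loop does.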
