Home > Blockchain >  Using NA in an If/Then statement
Using NA in an If/Then statement

Time:06-10

I want to create a new column called Egg_Number. If the row (every row is a specific nest) has no NA values then there are three eggs in the nest. If the row has an NA value for length_3, then there are two eggs. If the row has an NA value in length_3 AND length_2, then there is only 1 egg in the nest. I'm trying to figure out how to add this column and my only idea was using an if/else statement.

Something like so:

if (NIUS2021.ALL$length_3 = NA) { 
NIUS2021.ALL$Egg_Number = 2 }
else if (NIUS2021.ALL$length_3 = NA AND NIUS2021.ALL$length_2 = NA) {
NIUS2021.ALL$Egg_Number = 1 }
else {NIUS2021$.ALLEgg_Number = 3}

Here is my data set I'm using

> dput(head(NIUS2021.ALL))
structure(list(Niu = structure(1:6, .Label = c("1", "2", "3", 
"4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", 
"16", "17", "18", "19", "21", "22", "23", "25", "26", "27", "28", 
"29", "30", "31", "32", "33", "34", "35", "36", "37", "38", "39", 
"40", "41", "42", "43", "44", "45", "46", "47", "48", "49", "50", 
"51", "52", "53", "54", "55", "56", "57", "58", "59", "60", "61", 
"62", "63", "64", "65", "66", "67", "68", "69", "70", "71", "72", 
"73", "74", "75", "76", "906", "910", "915", "916", "917", "919", 
"920", "922", "924", "927", "928", "930", "931", "950", "951", 
"952", "953", "954", "955", "956", "957", "958", "959", "963"
), class = "factor"), totalV = c(183.1311069223, 189.09433326461, 
196.39045370996, 181.282560178575, 115.67490871467, 172.752941083985
), averageV = c(61.0437023074333, 63.0314444215367, 65.4634845699867, 
60.427520059525, 57.837454357335, 57.5843136946617), volume_1 = c(62.57592810342, 
67.75331569111, 71.51507045914, 62.28569026831, 58.639530945905, 
59.458291114465), volume_2 = c(60.45794961088, 64.6494663195, 
62.626512390435, 62.029110556805, 57.035377768765, 58.42423975
), volume_3 = c(60.097229208, 56.691551254, 62.248870860385, 
56.96775935346, NA, 54.87041021952), length_1 = c(67.07, 66.86, 
66.44, 64.94, 68.77, 63.41), length_2 = c(62.78, 63.27, 65.59, 
63.13, 62.81, 65.15), length_3 = c(68.28, 62.75, 63.41, 67.29, 
NA, 61.53), width_1 = c(43.86, 45.71, 47.11, 44.47, 41.93, 43.97
), width_2 = c(44.56, 45.9, 44.37, 45.01, 43.27, 43), width_3 = c(42.6, 
43.16, 44.99, 41.78, NA, 42.88)), class = c("grouped_df", "tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -6L), groups = structure(list(
    Niu = structure(1:6, .Label = c("1", "2", "3", "4", "5", 
    "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", 
    "17", "18", "19", "21", "22", "23", "25", "26", "27", "28", 
    "29", "30", "31", "32", "33", "34", "35", "36", "37", "38", 
    "39", "40", "41", "42", "43", "44", "45", "46", "47", "48", 
    "49", "50", "51", "52", "53", "54", "55", "56", "57", "58", 
    "59", "60", "61", "62", "63", "64", "65", "66", "67", "68", 
    "69", "70", "71", "72", "73", "74", "75", "76", "906", "910", 
    "915", "916", "917", "919", "920", "922", "924", "927", "928", 
    "930", "931", "950", "951", "952", "953", "954", "955", "956", 
    "957", "958", "959", "963"), class = "factor"), .rows = structure(list(
        1L, 2L, 3L, 4L, 5L, 6L), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -6L), .drop = TRUE))

Thank you for the help!

CodePudding user response:

You may try

library(dplyr)

NIUS2021 %>%
  mutate(Egg_number = case_when(
    !is.na(length_1 * length_2 * length_3) ~ 3,
    is.na(length_3) & is.na(length_2) ~ 1,
    is.na(length_3) ~ 2,
    T ~ NA_real_
  ))

some of data

  Niu   length_1 length_2 length_3 Egg_number     key
  <fct>    <dbl>    <dbl>    <dbl>      <dbl>   <dbl>
1 1         67.1     62.8     68.3          3 282409.
2 2         66.9     63.3     62.8          3 282833.
3 3         66.4     65.6     63.4          3 289532.
4 4         64.9     63.1     67.3          3 266232.
5 5         68.8     62.8     NA            2 297048.
6 6         63.4     65.2     61.5          3 261957.

CodePudding user response:

You have stumbled over the wonders of three valued logics, which introduces a third possible logical value, unknown (NA). It corresponds to NA values of variables through Codd's "Null Substitution Principle":

A logical expression has the value NA, if there is both a possible assignment to unknown variable values that evaluates as TRUE, and one that evaluates as FALSE.

From this principle, Codd derives the rule that the logical value NA is obtained in comparisons, whenever at least on of the operands has an NA value. This is implemented in R:

> x <- NA
> x < 3
[1] NA
> x == 3
[1] NA

As you cannot compare NA values with "==" there is the special function is.na() for testing NA:

> is.na(x)
[1] TRUE

Note that the Null Substitution Principle can also be generalized to arithmetic expressions that contain NA values:

> sum(1,2,NA,3)
[1] NA

The details of three valued logics have been introduced in section 2.3 of the following article:

E.F. Codd: "Extending the database relational model to capture more meaning." ACM Transactions on Database Systems 4.4, pp. 397-434, 1979

  •  Tags:  
  • r na
  • Related