Replacing only NA values in a column based on strings in others in R

Time:05-05

I'm trying to replace NA values in a column of a dataframe based on strings in R. My data is called flora and looks like this:

species             mating system    pollination  cleistogamy
Blysmus compressus  generally self                entirely cleistogamous
Dactylis glomerata  NA               wind         cleistogamy not recorded
Daucus carota       generally cross  NA           NA
Agrostis curtisii   NA               wind         cleistogamy not recorded
Hornungia petraea   generally self   insect       cleistogamy not recorded

I want to replace NA values in the mating system column when the pollination column == "wind" and the cleistogamy column =="cleistogamy not recorded", without affecting existing values in the mating system column.

I've tried multiple approaches but keep coming up with errors. Applying the following code

is.na(flora$msystem) [is.na(flora$pollination)=="wind"& is.na(flora$cleistogamy)=="cleistogamy not recorded"]<- "generally cross"

I get an error saying the replacement has more rows than the data. Nothing seems to change in the data when I try either

flora%>%
  mutate(msystem = if_else(is.na(msystem) & 
                                is.na(pollination)=="wind" &
                                is.na(cleistogamy) == "cleistogamy not recorded", "generally cross", msystem))

or

flora %>% 
  mutate(msystem = case_when(is.na(msystem) & pollination=="wind" & cleistogamy == "cleistogamy not recorded" ~ "generally cross",
                                TRUE ~ msystem))

I'm quite stumped, any advice would be extremely helpful!

CodePudding user response:

Here is one option

library(dplyr)
flora %>% 
  mutate(msystem = case_when(is.na(msystem) & 
              pollination %in% "wind" &
              cleistogamy %in% "cleistogamy not recorded" ~ "generally cross",
                                TRUE ~ msystem))

-output

             species         msystem pollination              cleistogamy
1 Blysmus compressus  generally self        <NA>   entirely cleistogamous
2 Dactylis glomerata generally cross        wind cleistogamy not recorded
3      Daucus carota generally cross        <NA>                     <NA>
4  Agrostis curtisii generally cross        wind cleistogamy not recorded
5  Hornungia petraea  generally self      insect cleistogamy not recorded

data

flora <- structure(list(species = c("Blysmus compressus", "Dactylis glomerata", 
"Daucus carota", "Agrostis curtisii", "Hornungia petraea"), msystem = c("generally self", 
NA, "generally cross", NA, "generally self"), pollination = c(NA, 
"wind", NA, "wind", "insect"), cleistogamy = c("entirely cleistogamous", 
"cleistogamy not recorded", NA, "cleistogamy not recorded", "cleistogamy not recorded"
)), class = "data.frame", row.names = c(NA, -5L))

CodePudding user response:

In general, I think there is a misunderstanding of what is.na(.) does. It returns a logical vector saying, element by element, whether its argument is NA. It does not filter or restrict assignment.
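To make that concrete, here is a minimal sketch on a throwaway vector (x and missing are hypothetical names, not part of your data). is.na() only reports which positions are missing; the resulting logical vector is then used as an index on the original vector when assigning.

```r
# Hypothetical example vector, not taken from flora
x <- c("a", NA, "c", NA)

# is.na() reports which elements are missing; it does not modify x
missing <- is.na(x)
print(missing)

# Use the logical vector as an index on x itself to fill only the NA slots
x[missing] <- "filled"
print(x)
```

The assignment target is x, indexed by the logical vector, rather than is.na(x).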

A few things:

  • Your base code fails because the expression on the left side of your assignment selects nothing:

    is.na(flora$msystem)[is.na(flora$pollination)=="wind"& is.na(flora$cleistogamy)=="cleistogamy not recorded"]
    # logical(0)
    

    We can find out why by breaking it down.

    is.na(flora$msystem)
    # [1] FALSE  TRUE FALSE  TRUE FALSE
    

    The above is fine.

    is.na(flora$pollination)=="wind"
    # [1] FALSE FALSE FALSE FALSE FALSE
    

    The above is a logical error. is.na(.) returns a logical vector, and a logical value never equals the string literal "wind". If you remove is.na, a different issue appears: comparing NA with == yields NA rather than FALSE, namely

    flora$pollination == "wind"
    # [1] FALSE  TRUE    NA  TRUE FALSE
    

    One common way around this is to use %in% instead,

    flora$pollination %in% "wind"
    # [1] FALSE  TRUE FALSE  TRUE FALSE
    

    The same applies to your last condition, is.na(flora$cleistogamy) == "cleistogamy not recorded".

  • Further, the base code would not do what you want even with the conditions fixed, because the left-hand side of the assignment is is.na(flora$msystem) rather than the column. You need to assign into flora$msystem itself.

Perhaps:

flora$msystem <- ifelse(
  is.na(flora$msystem) &
    flora$cleistogamy %in% "cleistogamy not recorded" &
    flora$pollination %in% "wind",
  "generally cross", flora$msystem)
flora
#              species         msystem pollination              cleistogamy
# 1 Blysmus compressus  generally self        <NA>   entirely cleistogamous
# 2 Dactylis glomerata generally cross        wind cleistogamy not recorded
# 3      Daucus carota generally cross        <NA>                     <NA>
# 4  Agrostis curtisii generally cross        wind cleistogamy not recorded
# 5  Hornungia petraea  generally self      insect cleistogamy not recorded
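An equivalent base R route, if you prefer assigning into a subset rather than rebuilding the whole column, might look like the sketch below. It assumes the data posted in the other answer (rebuilt here so the block runs on its own), and fill is a hypothetical name for the row selector.

```r
# Same values as the data posted in the other answer, rebuilt for a standalone run
flora <- data.frame(
  species = c("Blysmus compressus", "Dactylis glomerata", "Daucus carota",
              "Agrostis curtisii", "Hornungia petraea"),
  msystem = c("generally self", NA, "generally cross", NA, "generally self"),
  pollination = c(NA, "wind", NA, "wind", "insect"),
  cleistogamy = c("entirely cleistogamous", "cleistogamy not recorded", NA,
                  "cleistogamy not recorded", "cleistogamy not recorded"),
  stringsAsFactors = FALSE
)

# `fill` is a hypothetical name; %in% returns FALSE (not NA) for missing values
fill <- is.na(flora$msystem) &
  flora$pollination %in% "wind" &
  flora$cleistogamy %in% "cleistogamy not recorded"

# Assign only into the selected rows; every other value is left untouched
flora$msystem[fill] <- "generally cross"
print(flora$msystem)
```

Because %in% never returns NA, the index contains only TRUE and FALSE, which avoids the error R raises when NA appears in a subscripted assignment.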

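BTW, your second dplyr sample (the case_when one) is correct as written, and the if_else one works once the is.na() wrappers around pollination and cleistogamy are removed, but I wonder if you're not reassigning the result back to flora. For instance, contrast this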

flora %>%
  mutate(msystem = if_else(is.na(msystem) & cleistogamy == "cleistogamy not recorded" & pollination == "wind", "generally cross", msystem))
#              species         msystem pollination              cleistogamy
# 1 Blysmus compressus  generally self        <NA>   entirely cleistogamous
# 2 Dactylis glomerata generally cross        wind cleistogamy not recorded
# 3      Daucus carota generally cross        <NA>                     <NA>
# 4  Agrostis curtisii generally cross        wind cleistogamy not recorded
# 5  Hornungia petraea  generally self      insect cleistogamy not recorded
flora
#              species         msystem pollination              cleistogamy
# 1 Blysmus compressus  generally self        <NA>   entirely cleistogamous
# 2 Dactylis glomerata            <NA>        wind cleistogamy not recorded
# 3      Daucus carota generally cross        <NA>                     <NA>
# 4  Agrostis curtisii            <NA>        wind cleistogamy not recorded
# 5  Hornungia petraea  generally self      insect cleistogamy not recorded

(unchanged flora contents) with

flora <- flora %>%
  mutate(msystem = if_else(is.na(msystem) & cleistogamy == "cleistogamy not recorded" & pollination == "wind", "generally cross", msystem))
flora
#              species         msystem pollination              cleistogamy
# 1 Blysmus compressus  generally self        <NA>   entirely cleistogamous
# 2 Dactylis glomerata generally cross        wind cleistogamy not recorded
# 3      Daucus carota generally cross        <NA>                     <NA>
# 4  Agrostis curtisii generally cross        wind cleistogamy not recorded
# 5  Hornungia petraea  generally self      insect cleistogamy not recorded
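The same copy-on-return behaviour can be seen without dplyr. The following is only a sketch using base transform() on a hypothetical two-row data frame (df is not part of your data): the call returns a modified copy, and the original changes only when the result is assigned back.

```r
# Hypothetical two-row data frame, used only to show copy semantics
df <- data.frame(x = c(NA, "b"), stringsAsFactors = FALSE)

# The modified copy is returned (and printed), but df itself is not changed
transform(df, x = ifelse(is.na(x), "a", x))
sum(is.na(df$x))  # still one NA

# Assigning the result back is what makes the change stick
df <- transform(df, x = ifelse(is.na(x), "a", x))
sum(is.na(df$x))  # no NA left
```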