How to set values to NA based on a condition for a whole data frame

Time:11-03

I want to find empty cells ("") and replace them with NA across a whole data frame.

Sample data:

# The .internal.selfref = <pointer: ...> argument printed by dput() was removed; it cannot be parsed as R code
data <- structure(list(id = structure(8.44425875736171e-318, class = "integer64"), 
    project_id = 11L, experiment_id = 85L, 
    gene = "", si = -0.381, pi = "", 
    on1 = "CC", 
    on2 = "GG", 
    on3 = "aa", 
    created_at = structure(1618862091.85075, class = c("POSIXct", 
    "POSIXt"), tzone = "UTC")), row.names = c(NA, -1L), class = c("data.table", 
"data.frame"))

I have a solution that works for one particular column, but I don't know how to apply it to the whole data frame:

# One column only: empty strings become real NA values (not the string 'NA')
data$gene <- ifelse(data$gene == "", NA, data$gene)
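A common pitfall when doing this per column: is.na() does not treat an empty string as missing, and the string "NA" is not a missing value either. A short base R illustration on a hypothetical vector (not the question's data):

```r
x <- c("", "CC", NA)

is.na(x)   # FALSE FALSE TRUE: only the real NA counts as missing
x == ""    # TRUE FALSE NA: the empty string is found by comparison instead
is.na("NA")  # FALSE: the string "NA" is an ordinary value

# Per-vector fix: compare against "" and assign a real (character) NA
x[which(x == "")] <- NA_character_
is.na(x)   # TRUE FALSE TRUE
```

Using which() drops the NA produced by comparing an existing NA with "", so only the truly empty positions are assigned.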

Answer:

You could use lapply with gsub to replace each empty cell with NA like this:

df <- structure(list(id = structure(8.44425875736171e-318, class = "integer64"), 
                     project_id = 11L, experiment_id = 85L, 
                     gene = "", si = -0.381, pi = "", 
                     on1 = "CC", 
                     on2 = "GG", 
                     on3 = "aa", 
                     created_at = structure(1618862091.85075, class = c("POSIXct", 
                                                                        "POSIXt"), tzone = "UTC")), row.names = c(NA, -1L), class = c("data.table", 
                                                                                                                                      "data.frame"))

df
#>              id project_id experiment_id gene     si pi on1 on2 on3
#> 1 8.444259e-318         11            85      -0.381     CC  GG  aa
#>            created_at
#> 1 2021-04-19 19:54:51
df[] <- lapply(df, function(x) gsub("^$", NA, x))
df
#>                      id project_id experiment_id gene     si   pi on1 on2 on3
#> 1 8.44425875736171e-318         11            85 <NA> -0.381 <NA>  CC  GG  aa
#>            created_at
#> 1 2021-04-19 19:54:51

Created on 2022-11-02 with reprex v2.0.2
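Because gsub() always returns a character vector, applying it with lapply() to every column also converts numeric (and other non-character) columns to character. A small base R check, using a hypothetical two-column data frame rather than the question's data:

```r
dat <- data.frame(gene = c("", "TP53"), si = c(-0.381, 0.5),
                  stringsAsFactors = FALSE)

class(dat$si)  # "numeric" before the replacement

# Same idea as the answer above: gsub() on every column
dat[] <- lapply(dat, function(x) gsub("^$", NA, x))

class(dat$si)  # "character": gsub() coerced the numeric column
```

If column types matter for the later analysis, restricting the replacement to character columns avoids this.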

Answer:

You can also use dplyr's mutate() with across(), limited to character columns:

library(dplyr)
library(tidyr)

df <- structure(list(id = structure(8.44425875736171e-318, class = "integer64"), 
                     project_id = 11L, experiment_id = 85L, 
                     gene = "", si = -0.381, pi = "", 
                     on1 = "CC", 
                     on2 = "GG", 
                     on3 = "aa", 
                     created_at = structure(1618862091.85075, class = c("POSIXct", 
                                                                        "POSIXt"), tzone = "UTC")), row.names = c(NA, -1L), class = c("data.table", 
                                                                                                                                      "data.frame"))

df %>% 
  mutate(dplyr::across(where(is.character), ~ gsub("^$", NA, .x)))
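The pattern "^$" matches only strings that are exactly empty, so a cell containing only spaces is left alone. If whitespace-only cells should also count as empty (an assumption about the data, not something the question states), trimming before the comparison is one option. A base R sketch on a hypothetical vector:

```r
x <- c("", "   ", "GG")

# "^$" matches only strings with nothing between start and end;
# with replacement = NA, matching elements become NA
gsub("^$", NA, x)   # NA "   " "GG"

# Assumption: whitespace-only cells should also be treated as empty
x[which(trimws(x) == "")] <- NA_character_
x                   # NA NA "GG"
```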


Note that I also tried replace_na(), but it only replaces values that are actually NA, not empty strings:

df %>% 
  mutate(dplyr::across(where(is.character), ~ replace_na(.x, "NA")))
  • "" (an empty string) is not treated as missing, so it is left unchanged
  • only real NA values are treated as missing and replaced

Keep that in mind while you are performing your analysis.
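The replace_na(.x, "NA") call above goes in the opposite direction: it turns real missing values into the literal string "NA", which R then no longer counts as missing. A base R imitation on a hypothetical vector (a sketch of the behaviour, not tidyr's implementation):

```r
x <- c("", NA, "aa")

# Base R imitation of replace_na(x, "NA"): fill real NAs with the string "NA"
y <- x
y[is.na(y)] <- "NA"

sum(is.na(x))  # 1
sum(is.na(y))  # 0: the string "NA" is not a missing value
y              # "" "NA" "aa": the empty string is still there
```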

Answer:

Using na_if() from dplyr inside a data.table call:

library(data.table)
library(dplyr)
df[, lapply(.SD, \(x) if(is.character(x)) na_if(x, "") else x)]

Output:

        id project_id experiment_id   gene     si     pi    on1    on2    on3          created_at
     <i64>      <int>         <int> <char>  <num> <char> <char> <char> <char>              <POSc>
1: 1709137         11            85   <NA> -0.381   <NA>     CC     GG     aa 2021-04-19 19:54:51
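The same "only touch character columns" pattern also works without dplyr or data.table. The helper below, na_if_base(), is hypothetical: it mirrors what na_if() does for a single vector (values equal to y become NA), and the data frame is a small made-up example:

```r
# Hypothetical base R stand-in for dplyr::na_if() on one vector
na_if_base <- function(x, y) {
  x[which(x == y)] <- NA   # NA is coerced to the vector's type
  x
}

dat <- data.frame(
  gene = c("", "BRCA1"),
  si   = c(-0.381, 1.2),
  on1  = c("CC", ""),
  stringsAsFactors = FALSE
)

# Same shape as the data.table answer: non-character columns pass through unchanged
dat[] <- lapply(dat, function(x) if (is.character(x)) na_if_base(x, "") else x)

str(dat)
```

Assigning into dat[] keeps the result a data frame, and the numeric column si keeps its type because it is returned as-is.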