Home > front end >  Replacing NA with 0 in columns that contain a substring in the column name
Replacing NA with 0 in columns that contain a substring in the column name

Time:01-27

Let's say I had a data frame and I wanted to replace NA's with 0's only in columns that contained a specific string in the column name.

Let's call the dataframe df, the columns are numeric type, and the string "keyword".

i.e., a column name would be, "Column1keyword", etc.

How can I do this?

I've tried this and it didn't work:

 df %>%
    mutate(across(where(~ colnames.is.numeric(.x) && 'keyword' %in% colnames.x), replace_na, 0))
 

CodePudding user response:

Update

If it is to select columns having 'keyword' as substring in the column names, use contains to select across those columns

library(dplyr)
library(tidyr)
df1 <- df1 %>%
         mutate(across(contains('keyword'), replace_na, 0))

-output

df1
# A tibble: 5 × 4
   col1 col2_keyword col3   col4
  <int> <chr>        <chr> <dbl>
1     1 a            a         1
2     2 b            b         3
3     3 0            c        NA
4     4 c            d         5
5     5 d            <NA>      6

Assuming that the OP mentioned to replace NA only in columns that have a specific element 'keyword', use where with a logical expression to select the columns that have the 'keyword', loop across those columns and use replace_na to replace the NA to 0

df <- df %>%
    mutate(across(where(~ is.character(.x) && 'keyword' %in% .x), replace_na, 0))

-output

df
# A tibble: 5 × 4
   col1 col2    col3   col4
  <int> <chr>   <chr> <dbl>
1     1 a       a         1
2     2 b       b         3
3     3 keyword c        NA
4     4 0       d         5
5     5 c       <NA>      6

data

df <- tibble(col1 = 1:5, col2 = c("a", "b", "keyword", NA, 'c'), 
 col3 = c('a', 'b', 'c', 'd', NA), col4 = c(1, 3, NA, 5, 6))
df1 <- tibble(col1 = 1:5, col2_keyword = c("a", "b", NA, 'c', 'd'),
    col3 =c('a', 'b', 'c', 'd', NA), col4 = c(1, 3, NA, 5, 6))
  •  Tags:  
  • Related