Apologies in advance if this question is very basic; I am still new to R. I have looked on Stack Overflow for similar questions, but I still can't resolve the problem I am facing.
I am working with a large dataset X. What I am trying to do is pretty simple: I want to replace all NAs in selected (non-consecutive) columns with "no".
I first created a variable holding the names of all the columns I want to modify. For instance, to modify the NAs in the columns named "m", "l" and "h", I wrote the following:
modify <- c("m","l","h")
for (i in 1:length(modify))
column <- modify[i]
X$column <- as.character(X$column) #X is my dataframe
X$column %>% replace_na("no")
This loop returned output only for the "m" column, which is the first entry in my modify variable. However, even after the loop produced that output, when I checked X$m, nothing had changed in my original dataset.
I also tried writing a function very similar to the loop. No error message was generated, but it didn't work, as I do not know what the return value should be.
Why can't the loop be applied to my entire dataset when the individual steps in the loop work?
Thank you so so much for your help!
CodePudding user response:
This might help. It is based on one of the answers here, with the slight difference that it uses all_of():
library(tidyverse)
df <- tibble(x = c(1, 2, NA), y = c("a", NA, "b"))
df
#> # A tibble: 3 × 2
#>       x y
#>   <dbl> <chr>
#> 1     1 a
#> 2     2 <NA>
#> 3    NA b
modify <- c("x","y")
df %>%
  mutate(
    across(all_of(modify), ~replace_na(.x, 0))
  )
#> # A tibble: 3 × 2
#>       x y
#>   <dbl> <chr>
#> 1     1 a
#> 2     2 0
#> 3     0 b
Created on 2021-09-22 by the reprex package (v2.0.1)
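The example above fills with 0, whereas the question asks for "no" in columns of mixed type. A minimal sketch of that variant follows, assuming dplyr and tidyr are installed. The extra column z and the name df_filled are made up for illustration. Each selected column is cast to character before filling, since my understanding is that more recent tidyr releases refuse to change a column's type inside replace_na().

```r
library(dplyr)
library(tidyr)

# Illustrative data; z is an extra column that should stay untouched.
df <- tibble::tibble(x = c(1, 2, NA), y = c("a", NA, "b"), z = c(1, NA, 5))
modify <- c("x", "y")

# Cast each selected column to character first, then fill its NAs with "no".
# The result is assigned to a new name; df itself is left unchanged.
df_filled <- df %>%
  mutate(across(all_of(modify), ~ replace_na(as.character(.x), "no")))

df_filled
```

The assignment is the part that is easy to miss: mutate() returns a new data frame, so without storing the result the original still contains its NAs.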
CodePudding user response:
Here's a base R approach modifying data from @scrameri.
df <- data.frame(x = c(1, 2, NA), y = c("a", NA, "b"), c = c(1, NA, 5))
modify <- c('x', 'y')
df[modify][is.na(df[modify])] <- 'no'
df
#    x  y  c
# 1  1  a  1
# 2  2 no NA
# 3 no  b  5
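One side effect worth knowing about: writing "no" into a numeric column turns the whole column into character. A small check of this, as a sketch that assumes R 4.0 or later (where data.frame() no longer converts strings to factors by default):

```r
df <- data.frame(x = c(1, 2, NA), y = c("a", NA, "b"), c = c(1, NA, 5))
modify <- c("x", "y")

# Same replacement as above, restricted to the columns named in `modify`.
df[modify][is.na(df[modify])] <- "no"

# Inspect the class of every column after the replacement:
# x was numeric and is now character; c was not selected and stays numeric.
sapply(df, class)
```

If the numeric values are still needed for calculations, it may be safer to keep a copy of the original column before filling.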
CodePudding user response:
I'm going to fix your code with as few changes as possible, so you can learn.
There are two big problems. First, the for loop needs curly braces {} around the lines you want to loop over; without them, only the first statement after for (...) is repeated. Second, if you want to reference variables in a data frame dynamically, you can't use the $ operator, because X$column looks for a column literally named "column". You have to use double brackets [[]].
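To make the second point concrete, here is a small sketch of the difference between the two operators. The data is invented for illustration.

```r
X <- data.frame(m = c(1, 2, NA), l = c("a", NA, "b"))
column <- "m"

# `$` takes the name literally: this looks for a column called "column".
dollar_result <- X$column

# `[[` evaluates its argument: this looks up the column whose name is
# stored in the variable `column`, i.e. "m".
bracket_result <- X[[column]]

is.null(dollar_result)          # TRUE, there is no column named "column"
identical(bracket_result, X$m)  # TRUE
```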
library(tidyr)
X <- data.frame(m = c(1, 2, NA), l = c("a", NA, "b"), h = c(1, NA, 5))
modify <- c("m","l","h")
for (i in seq_along(modify)) {
  column <- modify[i]
  X[[column]] <- as.character(X[[column]]) #X is my dataframe
  X[[column]] <- X[[column]] %>% replace_na("no")
}
X
#    m  l  h
# 1  1  a  1
# 2  2 no no
# 3 no  b  5
You can do what you were trying to do much more efficiently, as shown in the other answers. But I wanted to show you how to do it your way, to correct your understanding of for loops and the subset operators. These are basic things that everyone should understand when first learning R.
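Since the question also mentions a function that ran without error but changed nothing, here is a rough sketch of how that could look. The name fill_na_no and its arguments are my own invention, not something from the question. The key points are that the function has to return the modified data frame, and the caller has to assign that return value back.

```r
library(tidyr)

# Hypothetical helper: the name and arguments are made up for illustration.
fill_na_no <- function(data, cols, fill = "no") {
  for (col in cols) {
    # Convert to character first so that numeric columns can hold "no".
    data[[col]] <- replace_na(as.character(data[[col]]), fill)
  }
  data  # return the modified copy; the caller's data frame is not changed in place
}

X <- data.frame(m = c(1, 2, NA), l = c("a", NA, "b"), h = c(1, NA, 5))

# Assign the result back, otherwise X keeps its NAs.
X <- fill_na_no(X, c("m", "l", "h"))
```

R passes a copy of X into the function, so changes made inside it are lost unless they are returned and stored.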
You might want to go through a beginner's tutorial to solidify your understanding. I used tutorialspoint when I was first learning and found it useful.