Replace NAs of a column with the single value of the same column

Time:06-04

I'm still new to R and am struggling with something that is probably very simple. I have the following data frame:

df = data.frame (trial_number  = c("41", "61", "141", "161"),
                 participant_id = c("sub-x", "sub-x","sub-x", "sub-x"),
                 B3 = c("1809.154","NA","NA", "NA"),
                 B4 = c("NA","1442.476","NA", "NA"),
                 B6 = c("NA","NA","1174.818", "NA"),
                 B7 = c("NA","NA","NA", "909.5714"))

For each column, I would like the NAs to be replaced by the single value present in that column, i.e. I would like to obtain the following:

df = data.frame (trial_number  = c("41", "61", "141", "161"),
                 participant_id = c("sub-x", "sub-x","sub-x", "sub-x"),
                 B3 = c("1809.154","1809.154","1809.154", "1809.154"),
                 B4 = c("1442.476","1442.476","1442.476", "1442.476"),
                 B6 = c("1174.818","1174.818","1174.818", "1174.818"),
                 B7 = c("909.5714","909.5714","909.5714", "909.5714"))

How can I do that?

Thanks a lot in advance for your response!

CodePudding user response:

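This is a typical use case for tidyr::fill(). Note that fill() only fills real missing values, so the text "NA" in the sample data has to be converted to a real NA first.

library(tidyr)

df[df == "NA"] <- NA   # turn the text "NA" into real missing values
fill(df, B3:B7, .direction = "updown")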

CodePudding user response:

You can use a for loop:

for(k in 3:6) df[,k] <- df[,k][which(df[,k]!= "NA")]

  trial_number participant_id       B3       B4       B6       B7
1           41          sub-x 1809.154 1442.476 1174.818 909.5714
2           61          sub-x 1809.154 1442.476 1174.818 909.5714
3          141          sub-x 1809.154 1442.476 1174.818 909.5714
4          161          sub-x 1809.154 1442.476 1174.818 909.5714

What the loop does:

  1. which(df[,k] != "NA") finds the position of the value that is not equal to "NA" in the k-th column. For example, in the 5th column (B6) the value is in the third row, so which(df[,5] != "NA") returns 3.
  2. df[,k][which(df[,k] != "NA")] returns the value that is not "NA".
  3. df[,k] <- assigns that value to all elements of the k-th column.
  4. for(k in 3:6) sets k to 3, 4, 5 and 6 in turn and applies steps 1 to 3 to the 3rd through 6th columns.
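
The steps above can be tried on a single column first. The sketch below does that and then wraps the same idea in a small helper; `fill_single_value` and its `na_string` argument are hypothetical names introduced here, not part of the original answer. The helper assumes each column holds exactly one value that is not the text "NA" and stops otherwise, instead of silently recycling several values.

```r
df <- data.frame(
  trial_number   = c("41", "61", "141", "161"),
  participant_id = c("sub-x", "sub-x", "sub-x", "sub-x"),
  B3 = c("1809.154", "NA", "NA", "NA"),
  B4 = c("NA", "1442.476", "NA", "NA"),
  B6 = c("NA", "NA", "1174.818", "NA"),
  B7 = c("NA", "NA", "NA", "909.5714")
)

# Steps 1 and 2 on one column (B6 is the 5th column)
pos <- which(df[, 5] != "NA")   # 3: row of the only real value
val <- df[, 5][pos]             # "1174.818", still stored as text

# Hypothetical helper: same logic as the loop body, plus a sanity check
fill_single_value <- function(x, na_string = "NA") {
  value <- x[which(x != na_string)]
  stopifnot(length(value) == 1)   # assumption: exactly one real value per column
  rep(value, length(x))
}

# Steps 3 and 4: apply to columns 3 to 6 and write the result back
df[3:6] <- lapply(df[3:6], fill_single_value)
```

Using lapply() over df[3:6] is only a stylistic alternative to the for loop; both visit the same four columns.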

CodePudding user response:

Here is a dplyr-only approach using across() with an if_else() statement:

library(dplyr)

# min(.) picks the real value only because digit strings sort before the text "NA"
df %>%
  mutate(across(B3:B7, ~if_else(. == "NA", min(.), .)))
  trial_number participant_id       B3       B4       B6       B7
1           41          sub-x 1809.154 1442.476 1174.818 909.5714
2           61          sub-x 1809.154 1442.476 1174.818 909.5714
3          141          sub-x 1809.154 1442.476 1174.818 909.5714
4          161          sub-x 1809.154 1442.476 1174.818 909.5714
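
The comparison with "NA" works because the sample columns are text. If numeric columns with real missing values are preferred, a variant along the following lines may help. It is a minimal sketch, assuming each column holds exactly one non-missing value; `df_num` is a name chosen here for illustration.

```r
library(dplyr)

df <- data.frame(
  trial_number   = c("41", "61", "141", "161"),
  participant_id = c("sub-x", "sub-x", "sub-x", "sub-x"),
  B3 = c("1809.154", "NA", "NA", "NA"),
  B4 = c("NA", "1442.476", "NA", "NA"),
  B6 = c("NA", "NA", "1174.818", "NA"),
  B7 = c("NA", "NA", "NA", "909.5714")
)

df_num <- df %>%
  # convert text to numbers; the text "NA" becomes a real NA
  mutate(across(B3:B7, as.numeric)) %>%
  # replace each NA with the first (here: only) non-missing value of its column
  mutate(across(B3:B7, ~ if_else(is.na(.x), .x[!is.na(.x)][1], .x)))
```

With real NA values, is.na() replaces the string comparison, and the result can be used directly in numeric computations.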