Home > database >  Replace missing value of all the following columns based on the value of preceding column
Replace missing value of all the following columns based on the value of preceding column

Time:08-06

I have a dataframe like below:

  SampleId  Col1  Col2  Col3  Col4
1  st1       k     p     
2  st2       k
3  st3       k     p     g
4  st4       k     p     g      s

I want the empty rows in the columns to be filled based on the available values from preceding columns, so something like below:

  SampleId  Col1  Col2  Col3  Col4
1  st1       k     p     p     p
2  st2       k     k     k     k
3  st3       k     p     g     g
4  st4       k     p     g     s

What would be a dplyr way to do this?

CodePudding user response:

You could transform the data to long, replace "" with NA, and fill in missing values with the previous value with fill(). Finally, transform the data back to wide.

library(dplyr)
library(tidyr)

df %>%
  pivot_longer(starts_with('Col')) %>%
  mutate(value = na_if(value, "")) %>%
  fill(value) %>%
  pivot_wider()

# # A tibble: 4 × 5
#   SampleId Col1  Col2  Col3  Col4
#   <chr>    <chr> <chr> <chr> <chr>
# 1 st1      k     p     p     p
# 2 st2      k     k     k     k
# 3 st3      k     p     g     g
# 4 st4      k     p     g     s

Data
df <- structure(list(SampleId = c("st1", "st2", "st3", "st4"),
Col1 = c("k", "k", "k", "k"), Col2 = c("p", "", "p", "p"), Col3 = c("", "", "g", "g"),
Col4 = c("", "", "", "s")), class = "data.frame", row.names = c("1", "2", "3", "4"))
  • Related