I have a dataframe like below:
SampleId Col1 Col2 Col3 Col4
1 st1 k p
2 st2 k
3 st3 k p g
4 st4 k p g s
I want the empty rows in the columns to be filled based on the available values from preceding columns, so something like below:
SampleId Col1 Col2 Col3 Col4
1 st1 k p p p
2 st2 k k k k
3 st3 k p g g
4 st4 k p g s
What would be a dplyr
way to do this?
CodePudding user response:
You could transform the data to long, replace ""
with NA
, and fill in missing values with the previous value with fill()
. Finally, transform the data back to wide.
library(dplyr)
library(tidyr)
df %>%
pivot_longer(starts_with('Col')) %>%
mutate(value = na_if(value, "")) %>%
fill(value) %>%
pivot_wider()
# # A tibble: 4 × 5
# SampleId Col1 Col2 Col3 Col4
# <chr> <chr> <chr> <chr> <chr>
# 1 st1 k p p p
# 2 st2 k k k k
# 3 st3 k p g g
# 4 st4 k p g s
Data
df <- structure(list(SampleId = c("st1", "st2", "st3", "st4"),
Col1 = c("k", "k", "k", "k"), Col2 = c("p", "", "p", "p"), Col3 = c("", "", "g", "g"),
Col4 = c("", "", "", "s")), class = "data.frame", row.names = c("1", "2", "3", "4"))