Home > other >  I have no idea to express my problem in converting a dataframe to another, please read below to more
I have no idea to express my problem in converting a dataframe to another, please read below to more

Time:12-31

I want to convert a dataframe (left) to another (right), but I have no idea to do this task. Briefly, the nearest value in A was retained and merged with B. Please take your time to help me! Thank you so much!

The original dataframe and converted dataframe (highlighted background)

Here is the original data.

df1 = data.frame(
  ID = c(1,1,1,1,1,1,1,1,2,2,2,2,2),
  TIME = c(-2, -1,  0, 11, 13, 24, 28, 36, -3,  0, 12, 13, 24),
  A = c(30, 50, NA, 80, NA, NA, 30, NA, 20, NA, NA, 80, NA),
  B = c(NA,  NA, 2.0,  NA, 2.5, 2.5,  NA, 1.0,  NA, 1.0, 1.0,  NA, 2.5)
)

CodePudding user response:


library(tidyverse)

df <- df1 %>% 
  fill(A, .direction = c("down")) %>% # fill missing data in A
  drop_na() %>%
  rename(B = A, A = B) %>% # swap the names of columns A and B
  select(ID, TIME, A, B)  # select the order of columns

CodePudding user response:

Here is a solution. It keeps the rows with non-missing values in B after filling the values in A with the previous value. Then it's a matter of rearranging and renaming the columns.

library(dplyr)
library(tidyr)

df1 %>%
  group_by(ID) %>%
  fill(A) %>%
  ungroup() %>%
  na.omit() %>%
  rename(B = A, A = B) %>%
  relocate(B, .after = A)
## A tibble: 7 x 4
#     ID  TIME     A     B
#  <dbl> <dbl> <dbl> <dbl>
#1     1     0   2      50
#2     1    13   2.5    80
#3     1    24   2.5    80
#4     1    36   1      30
#5     2     0   1      20
#6     2    12   1      20
#7     2    24   2.5    80

Edit

TarJae's comment makes the code above shorter:

df1 %>%
  fill(A) %>% 
  select(ID, TIME, A=B, B=A) %>%
  drop_na()

CodePudding user response:

Here is a base R method how we could do it:

# swap A and B
df1[ , c(3,4)]  <- df1[ , c(4,3)]
# fill na with prvious value from above
df1$B <- na.omit(df1$B)[cumsum(!is.na(df1$B))]
# remove NA
df1[complete.cases(df1$A),]
   ID TIME   A  B
3   1    0 2.0 50
5   1   13 2.5 80
6   1   24 2.5 80
8   1   36 1.0 30
10  2    0 1.0 20
11  2   12 1.0 20
13  2   24 2.5 80
  • Related