I've tried to find the answer to this simple question without anyluck.
Let's say I have:
A <- c(1101,1102,1103)
and a dataframe that looks like this:
t m com x
1 1 1101 10
1 1 1102 15
1 1 1103 20
1 2 NA NA
1 3 1101 20
1 3 1102 25
1 3 1103 30
1 4 NA NA
1 5 1101 25
1 5 1102 30
1 5 1103 35
and since what I want to do is a linear interpolation of X with na.approx (actual dataframe and object are way bigger), I need the dataframe to look like this:
t m com x
1 1 1101 10
1 1 1102 15
1 1 1103 20
1 2 1101 NA
1 2 1102 NA
1 2 1103 NA
1 3 1101 20
1 3 1102 25
1 3 1103 30
1 4 1101 NA
1 4 1102 NA
1 4 1103 NA
1 5 1101 25
1 5 1102 30
1 5 1103 35
I haven't tried any code for this because I don't know where to start.
Any help and/or r material that you consider usefull would be great.
Thanks,
CodePudding user response:
The function you need is tidyr::complete()
.
- Since you would like to "expand" column
com
within group ofm
, you need togroup_by(m)
first, - then use the vector
A
tocomplete
thecom
column. - In this case, the
t
column will be filled withNA
by default, we canfill()
up columnt
using the value in the previous row (since you said you have a large dataset, this should work better than setting thefill
parameter incomplete()
). - Then drop any
NA
in thecom
column (these are the originalNA
in your data set). - Finally, reorder the columns back to their original position.
library(tidyverse)
A <- c(1101,1102,1103)
df %>%
group_by(m) %>%
complete(com = A) %>%
fill(t, .direction = "up") %>%
drop_na(com) %>%
select(t, m, com, x)
# A tibble: 15 x 4
# Groups: m [5]
t m com x
<int> <int> <dbl> <int>
1 1 1 1101 10
2 1 1 1102 15
3 1 1 1103 20
4 1 2 1101 NA
5 1 2 1102 NA
6 1 2 1103 NA
7 1 3 1101 20
8 1 3 1102 25
9 1 3 1103 30
10 1 4 1101 NA
11 1 4 1102 NA
12 1 4 1103 NA
13 1 5 1101 25
14 1 5 1102 30
15 1 5 1103 35