Fill column within certain ranges with available value

Time:01-27

I have the following dataframe df (dput below):

> df
   group seq id
1      A  -5 NA
2      A  -4 NA
3      A  -3 NA
4      A  -2  1
5      A  -1  1
6      A   0 NA
7      A   1 NA
8      A   2 NA
9      A   3 NA
10     A   4 NA
11     A   5 NA
12     A  -5 NA
13     A  -4 NA
14     A  -3 NA
15     A  -2 NA
16     A  -1 NA
17     A   0 NA
18     A   1  5
19     A   2 NA
20     A   3 NA
21     A   4 NA
22     A   5 NA
23     B  -5 NA
24     B  -4  8
25     B  -3  8
26     B  -2  8
27     B  -1 NA
28     B   0 NA
29     B   1 NA
30     B   2 NA
31     B   3 NA
32     B   4 NA
33     B   5 NA
34     B  -5 NA
35     B  -4 NA
36     B  -3 NA
37     B  -2 NA
38     B  -1 NA
39     B   0 NA
40     B   1 NA
41     B   2 NA
42     B   3  4
43     B   4 NA
44     B   5 NA

I would like to fill the NAs in the id column with the value that already exists within each run of the seq column. Each run spans -5 to 5. So in the first run every NA should be filled with 1 for seq from -5 to 5, in the next run with 5, and so on. Here is the desired output:

   group seq id
1      A  -5  1
2      A  -4  1
3      A  -3  1
4      A  -2  1
5      A  -1  1
6      A   0  1
7      A   1  1
8      A   2  1
9      A   3  1
10     A   4  1
11     A   5  1
12     A  -5  5
13     A  -4  5
14     A  -3  5
15     A  -2  5
16     A  -1  5
17     A   0  5
18     A   1  5
19     A   2  5
20     A   3  5
21     A   4  5
22     A   5  5
23     B  -5  8
24     B  -4  8
25     B  -3  8
26     B  -2  8
27     B  -1  8
28     B   0  8
29     B   1  8
30     B   2  8
31     B   3  8
32     B   4  8
33     B   5  8
34     B  -5  4
35     B  -4  4
36     B  -3  4
37     B  -2  4
38     B  -1  4
39     B   0  4
40     B   1  4
41     B   2  4
42     B   3  4
43     B   4  4
44     B   5  4

As you can see, the first run is filled with 1 and the second with 5 across the whole seq range. Does anyone know how to fill in the ids with the available value within each sequence range?


dput of df:

df<-structure(list(group = c("A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", 
"A", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", 
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B"), seq = c(-5, 
-4, -3, -2, -1, 0, 1, 2, 3, 4, 5, -5, -4, -3, -2, -1, 0, 1, 2, 
3, 4, 5, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, -5, -4, -3, -2, 
-1, 0, 1, 2, 3, 4, 5), id = c(NA, NA, NA, 1, 1, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, 5, NA, NA, NA, NA, NA, 8, 8, 
8, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
4, NA, NA)), class = "data.frame", row.names = c(NA, -44L))

CodePudding user response:

library(dplyr)

# Unique (group, id) pairs: keep rows without NA and drop the seq column (column 2)
df2_left <- df[complete.cases(df), -2] %>% unique()

# Unique (group, seq) pairs
df2_right <- df %>%
  select(group, seq) %>%
  unique()

# Pair every id with every seq value of its group
merge(df2_left, df2_right, by = "group")

   group id seq
1      A  1  -5
2      A  1  -4
3      A  1  -3
4      A  1  -2
5      A  1  -1
6      A  1   0
7      A  1   1
8      A  1   2
9      A  1   3
10     A  1   4
11     A  1   5
12     A  5  -5
13     A  5  -4
14     A  5  -3
15     A  5  -2
16     A  5  -1
17     A  5   0
18     A  5   1
19     A  5   2
20     A  5   3
21     A  5   4
22     A  5   5
23     B  8  -5
24     B  8  -4
25     B  8  -3
26     B  8  -2
27     B  8  -1
28     B  8   0
29     B  8   1
30     B  8   2
31     B  8   3
32     B  8   4
33     B  8   5
34     B  4  -5
35     B  4  -4
36     B  4  -3
37     B  4  -2
38     B  4  -1
39     B  4   0
40     B  4   1
41     B  4   2
42     B  4   3
43     B  4   4
44     B  4   5
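
The merge above rebuilds the table rather than filling it, so the columns come out as group, id, seq and the result depends on every id in a group sharing the same seq values. As an alternative, here is a minimal base R sketch that fills id in place. It assumes a new run starts whenever seq drops (for example from 5 back to -5) and that each run contains at most one distinct non-NA id. The names block and filled are hypothetical helpers, not part of the question.

```r
# Rebuild the example data from the question (same values as the dput)
df <- data.frame(
  group = rep(c("A", "B"), each = 22),
  seq   = rep(-5:5, times = 4),
  id    = NA_real_
)
df$id[c(4, 5)] <- 1
df$id[18]      <- 5
df$id[24:26]   <- 8
df$id[42]      <- 4

# Assumption: a new run begins wherever seq decreases relative to the previous row
block <- cumsum(c(TRUE, diff(df$seq) < 0))

# Within each run, spread the first non-NA id over all rows of that run
# (a run with no id at all stays NA)
filled <- df
filled$id <- ave(df$id, block, FUN = function(x) x[!is.na(x)][1])

filled
```

This keeps the original row order and the group, seq, id column order. If runs could restart without seq decreasing, the run index would have to be derived some other way. The same idea could probably also be written with dplyr::group_by() on the run index followed by tidyr::fill(id, .direction = "downup").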