I have a dataset which has a date column and a data column.
The number of rows for each date may not be the same; some dates have only 10 rows instead of 24. The dataset looks like this:
| Date | hour | value |
|---|---|---|
| 10-06-2000 | 1 | 4 |
| | 2 | 5 |
| | 3 | 7 |
| | 4 | 7 |
| | 5 | 8 |
| | 6 | 1 |
| | 7 | 7 |
| | 8 | 2 |
| | 9 | 3 |
| | 10 | 4 |
| | 11 | 5 |
| | 12 | 7 |
| | 13 | 8 |
| | 14 | 9 |
| | 15 | 10 |
| | 16 | 12 |
| | 17 | 1 |
| | 18 | 4 |
| | 19 | 7 |
| | 20 | 9 |
| | 21 | 10 |
| | 22 | 7 |
| | 23 | 8 |
| | 24 | 9 |
| 11-06-2000 | 9 | 1 |
| | 10 | 4 |
| | 11 | 5 |
| | 12 | 7 |
| | 13 | 8 |
| | 14 | 9 |
| | 15 | 10 |
| | 16 | 12 |
| | 17 | 1 |
| | 18 | 4 |
| | 19 | 7 |
| | 20 | 9 |
| | 21 | 10 |
| | 22 | 7 |
| | 23 | 8 |
| | 24 | 9 |
I want to split the dataset into multiple data frames by date. However, in the Date column, the cells between two consecutive dates are empty. When I tried the split function in base R, it returned only the first row of each date:
$`2000-06-11`
V1 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16
264 2000-06-11 2 7 8 3 2 3 4 7 4 5 8
$`2000-06-12`
V1 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16
278 2000-06-12 2 2 9 6 2 3 1 0 1 4 4
Sorry for asking such a simple question. I tried to handle this with a for loop, but the dataset is so large that it runs very slowly.
CodePudding user response:
If you know for certain that the data are sorted in the right order, you could use tidyr::fill, which copies each date down into the empty (NA) rows below it:
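library(tidyr)

# Example data: Date is present only on the first row of each day
df <- data.frame(
  Date = c("10-06-2000", rep(NA, 5), "11-06-2000", rep(NA, 12)),
  hour = c(4:9, 1:13),
  value = 1:19
)

# Carry each date down into the NA rows below it, then split by date
df_filled <- fill(df, Date, .direction = "down")
split(df_filled, df_filled$Date)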
$`10-06-2000`
Date hour value
1 10-06-2000 4 1
2 10-06-2000 5 2
3 10-06-2000 6 3
4 10-06-2000 7 4
5 10-06-2000 8 5
6 10-06-2000 9 6
$`11-06-2000`
Date hour value
7 11-06-2000 1 7
8 11-06-2000 2 8
9 11-06-2000 3 9
10 11-06-2000 4 10
11 11-06-2000 5 11
12 11-06-2000 6 12
13 11-06-2000 7 13
14 11-06-2000 8 14
15 11-06-2000 9 15
16 11-06-2000 10 16
17 11-06-2000 11 17
18 11-06-2000 12 18
19 11-06-2000 13 19
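If tidyr is not available, or if the blank cells were read in as empty strings rather than NA (fill() only fills NA values), the same fill-down idea can be sketched in base R. This is only a minimal sketch and not part of the answer above: the small data frame is made up for illustration, the names is_start, group_id and pieces are hypothetical, and it assumes the rows are in order and the first row carries a date.

```r
# Made-up example: blanks appear both as "" and as NA
df <- data.frame(
  Date = c("10-06-2000", "", NA, "11-06-2000", "", "", ""),
  hour = c(1, 2, 3, 9, 10, 11, 12),
  value = c(4, 5, 7, 1, 4, 5, 7),
  stringsAsFactors = FALSE
)

# TRUE on the rows that actually carry a date
is_start <- !is.na(df$Date) & df$Date != ""

# Running count of dates seen so far: 1 1 1 2 2 2 2
# (assumes the first row has a date, so the count never starts at 0)
group_id <- cumsum(is_start)

# Look up the date belonging to each row's group
df$Date <- df$Date[is_start][group_id]

# One data frame per date
pieces <- split(df, df$Date)
names(pieces)
# "10-06-2000" "11-06-2000"
```

Because this is fully vectorised, it avoids the row-by-row for loop that was slow on the large dataset.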
CodePudding user response:
You could also use group_split() in combination with fill():
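library(tidyr)
library(dplyr)

df <- data.frame(
  Date = c("10-06-2000", rep(NA, 5), "11-06-2000", rep(NA, 12)),
  hour = c(4:9, 1:13),
  value = 1:19
)

# Fill the dates down, then group by date
df_grouped <- df |>
  fill(Date, .direction = "down") |>
  group_by(Date)

# group_split() returns an unnamed list; name it from group_keys(),
# which lists the groups in the same order as the split pieces
# (unique(df$Date) is in order of appearance, which may differ)
df_list <- df_grouped |>
  group_split() |>
  purrr::set_names(group_keys(df_grouped)$Date)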
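One thing to watch with either approach: while Date is stored as "dd-mm-yyyy" text, the groups are ordered alphabetically rather than chronologically. A minimal sketch, assuming the dates have already been filled down and follow the day-month-year format shown in the question (the data and the names by_text and by_date are made up for illustration):

```r
# Made-up, already filled data: 10 July comes before 9 August
df_filled <- data.frame(
  Date = c("10-07-2000", "10-07-2000", "10-07-2000", "09-08-2000", "09-08-2000"),
  hour = c(1, 2, 3, 1, 2),
  value = c(4, 5, 7, 1, 4),
  stringsAsFactors = FALSE
)

# Splitting on the text column sorts the groups as strings
by_text <- split(df_filled, df_filled$Date)
names(by_text)
# "09-08-2000" "10-07-2000"

# Converting to Date first gives chronological order
df_filled$Date <- as.Date(df_filled$Date, format = "%d-%m-%Y")
by_date <- split(df_filled, df_filled$Date)
names(by_date)
# "2000-07-10" "2000-08-09"
```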