I have the next dataframe:
House = rep(c('A','B','C'), each=10)
Date = as.Date(c('2014-10-13','2014-10-20','2014-10-27','2014-11-03','2014-11-10','2014-10-30','2014-11-06','2014-11-13','2014-11-20','2014-11-27',
'2019-11-27','2019-12-04','2019-12-11','2019-12-18','2019-12-25','2020-01-01','2020-01-08','2020-01-15','2020-01-22','2020-01-29',
'2017-07-13','2017-07-20','2017-04-21','2017-04-28','2017-05-05','2017-05-12','2017-05-19','2017-05-26','2017-06-02','2017-06-09'))
Week = rep(c(1,2,3,4,5,6,7,8,9,10), times=3)
Value = c(0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1,0.01,0.02,0.03,0.04,0.05,0.06,0.07,0.08,0.09,0.1,0.001,0.002,0.003,0.004,0.005,0.006,0.007,0.008,0.009,0.01)
mock = data.frame(House,Date,Week,Value)
I want to split this dataframe by Year
in specific, BUT, I want to keep together the group B
. Otherwise it gets split as follows:
$`2019`
A data.frame: 5 × 4 House Date Week Value
<chr> <date> <dbl> <dbl>
11 B 2019-11-27 1 0.01
12 B 2019-12-04 2 0.02
13 B 2019-12-11 3 0.03
14 B 2019-12-18 4 0.04
15 B 2019-12-25 5 0.05
$`2020`
A data.frame: 5 × 4 House Date Week Value
<chr> <date> <dbl> <dbl>
16 B 2020-01-01 6 0.06
17 B 2020-01-08 7 0.07
18 B 2020-01-15 8 0.08
19 B 2020-01-22 9 0.09
20 B 2020-01-29 10 0.10
Desired output:
$`2019-2020`
A data.frame: 5 × 4 House Date Week Value
<chr> <date> <dbl> <dbl>
11 B 2019-11-27 1 0.01
12 B 2019-12-04 2 0.02
13 B 2019-12-11 3 0.03
14 B 2019-12-18 4 0.04
15 B 2019-12-25 5 0.05
16 B 2020-01-01 6 0.06
17 B 2020-01-08 7 0.07
18 B 2020-01-15 8 0.08
19 B 2020-01-22 9 0.09
20 B 2020-01-29 10 0.10
I tried a similar approach using dplyr::group_split()
, but obtaining a similar output. On the other hand, the cleanest list with split()
is given by the next, but I DON'T need it by House
:
house.year = split(mock, c(House, format(mock$Date, "%Y")))
house.year = house.year[sapply(house.year, function(x) dim(x)[1]) > 0]
NOTE: The name of the desired list is how I imagine it, not how I expect it, but I would like that content included in this list of lists.
EDIT: edited the title to point that I have more than 3 categories, like asked in the beginning.
CodePudding user response:
house.year <- split(mock, list(mock$House, format(mock$Date, "%Y")), drop = TRUE)
lst1 <- lapply(split(house.year, sub("\\.\\d $", "",
names(house.year))),
\(x) {tmp <- do.call(rbind, x)
row.names(tmp) <- NULL; tmp})
names(lst1) <- sapply(lst1, \(x) paste(unique(format(x$Date, "%Y")),
collapse = "-"))
-output
> lst1
$`2014`
House Date Week Value
1 A 2014-10-13 1 0.1
2 A 2014-10-20 2 0.2
3 A 2014-10-27 3 0.3
4 A 2014-11-03 4 0.4
5 A 2014-11-10 5 0.5
6 A 2014-10-30 6 0.6
7 A 2014-11-06 7 0.7
8 A 2014-11-13 8 0.8
9 A 2014-11-20 9 0.9
10 A 2014-11-27 10 1.0
$`2019-2020`
House Date Week Value
1 B 2019-11-27 1 0.01
2 B 2019-12-04 2 0.02
3 B 2019-12-11 3 0.03
4 B 2019-12-18 4 0.04
5 B 2019-12-25 5 0.05
6 B 2020-01-01 6 0.06
7 B 2020-01-08 7 0.07
8 B 2020-01-15 8 0.08
9 B 2020-01-22 9 0.09
10 B 2020-01-29 10 0.10
$`2017`
House Date Week Value
1 C 2017-07-13 1 0.001
2 C 2017-07-20 2 0.002
3 C 2017-04-21 3 0.003
4 C 2017-04-28 4 0.004
5 C 2017-05-05 5 0.005
6 C 2017-05-12 6 0.006
7 C 2017-05-19 7 0.007
8 C 2017-05-26 8 0.008
9 C 2017-06-02 9 0.009
10 C 2017-06-09 10 0.010
CodePudding user response:
Probably you can try code below
mock %>%
group_by(House) %>%
mutate(year = paste0(unique(format(Date, "%Y")), collapse = "-")) %>%
ungroup() %>%
split(~year) %>%
lapply(`[`, -(length(mock) 1))
which gives
$`2014`
# A tibble: 10 × 4
House Date Week Value
<chr> <date> <dbl> <dbl>
1 A 2014-10-13 1 0.1
2 A 2014-10-20 2 0.2
3 A 2014-10-27 3 0.3
4 A 2014-11-03 4 0.4
5 A 2014-11-10 5 0.5
6 A 2014-10-30 6 0.6
7 A 2014-11-06 7 0.7
8 A 2014-11-13 8 0.8
9 A 2014-11-20 9 0.9
10 A 2014-11-27 10 1
$`2017`
# A tibble: 10 × 4
House Date Week Value
<chr> <date> <dbl> <dbl>
1 C 2017-07-13 1 0.001
2 C 2017-07-20 2 0.002
3 C 2017-04-21 3 0.003
4 C 2017-04-28 4 0.004
5 C 2017-05-05 5 0.005
6 C 2017-05-12 6 0.006
7 C 2017-05-19 7 0.007
8 C 2017-05-26 8 0.008
9 C 2017-06-02 9 0.009
10 C 2017-06-09 10 0.01
$`2019-2020`
# A tibble: 10 × 4
House Date Week Value
<chr> <date> <dbl> <dbl>
1 B 2019-11-27 1 0.01
2 B 2019-12-04 2 0.02
3 B 2019-12-11 3 0.03
4 B 2019-12-18 4 0.04
5 B 2019-12-25 5 0.05
6 B 2020-01-01 6 0.06
7 B 2020-01-08 7 0.07
8 B 2020-01-15 8 0.08
9 B 2020-01-22 9 0.09
10 B 2020-01-29 10 0.1
CodePudding user response:
My real dataset is a bit more complex with more House
s in it. Then after akrun solution I would get a list with repeated Year
s as name, like here for 2017
:
$`2014`
House Date Week Value
1 A 2014-10-13 1 0.1
2 A 2014-10-20 2 0.2
...
$`2019-2020`
House Date Week Value
1 B 2019-11-27 1 0.01
2 B 2019-12-04 2 0.02
...
$`2017`
House Date Week Value
1 C 2017-07-13 1 0.001
2 C 2017-07-20 2 0.002
...
$`2017`
House Date Week Value
1 D 2017-07-13 1 0.001
2 D 2017-07-20 2 0.002
...
akrun's solution can be improved by adding another lapply
, like in this solution by Mohamed, to rbind
all the rows under the same Year
, which is what I wanted in the end for subsequent analyses:
lst1 = lapply(split(lst1, names(lst1)) , \(x) do.call(rbind , x))
output:
$`2014`
House Date Week Value
1 A 2014-10-13 1 0.1
2 A 2014-10-20 2 0.2
...
$`2019-2020`
House Date Week Value
1 B 2019-11-27 1 0.01
2 B 2019-12-04 2 0.02
...
$`2017`
House Date Week Value
1 C 2017-07-13 1 0.001
2 C 2017-07-20 2 0.002
1 D 2017-07-13 1 0.001
2 D 2017-07-20 2 0.002
...