I am stuck in a seemingly easy task. Imagine the following data.table
:
dt1 <- data.table(ID = as.factor(c("202E", "202E", "202E")),
timestamp = as.POSIXct(c("2017-05-02 00:00:00",
"2017-05-02 00:15:00",
"2017-05-02 00:30:00")),
acceleration_raw = c("-0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.727 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.727 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.141 -0.703 0.656 0.164 -0.703 0.656 0.141 -0.703 0.656 0.141 -0.703 0.656 0.141 -0.703 0.656 0.141",
"-0.703 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.680 0.680 0.117 -0.703 0.680 0.117 -0.680 0.680 0.117 -0.703 0.680 0.117 -0.680 0.680 0.117 -0.703 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117",
"-0.750 0.586 0.117 -0.773 0.586 0.117 -0.773 0.609 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.750 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.117 -0.773 0.586 0.141 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.141 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117"))
Created on 2022-11-17 with reprex v2.0.2
The idea is that I want to separate the acceleration_raw
column into 3 different ones: acc_x
, acc_y
and acc_z
. Each row of acceleration_raw
is a string of characters, eventually leading to 120 numeric observations. I want to separate acceleration_raw
and then take every the value from the first row and forth with a step of 3 and put it to acc_x
, every value from the second row and forth and put it to acc_y
, and finally every value from the third row and on and put it to acc_z
.
I tried to first separate acceleration_raw
with separate_rows
from dplyr
:
library('tidyverse')
library('data.table')
dt1 <- dt1 %>%
separate_rows(acceleration_raw, sep = " ", convert = F)
Created on 2022-11-17 with reprex v2.0.2
And after that:
library('tidyverse')
library('data.table')
dt1 <- dt1 %>%
separate_rows(acceleration_raw, sep = " ", convert = F) %>%
mutate(acc_x = seq(acceleration_raw, from = 1, to = length(dt1), by = 3),
acc_y = seq(acceleration_raw, from = 2, to = length(dt1), by = 3),
acc_z = seq(acceleration_raw, from = 3, to = length(dt1), by = 3))
#> Warning in seq.default(acceleration_raw, from = 1, to = length(dt1), by = 3):
#> first element used of 'length.out' argument
#> Error in `mutate()`:
#> ! Problem while computing `acc_x = seq(acceleration_raw, from = 1, to =
#> length(dt1), by = 3)`.
#> Caused by error in `ceiling()`:
#> ! non-numeric argument to mathematical function
Created on 2022-11-17 with reprex v2.0.2
Any suggestions on how to proceed?
CodePudding user response:
You could use pivot_wider
and unnest
:
library(tidyverse)
dt1 %>%
separate_rows(acceleration_raw, sep = " ", convert = F) %>%
mutate(id = rep(c("acc_x", "acc_y", "acc_z"), times = nrow(.) / 3)) %>%
pivot_wider(names_from = id, values_from = acceleration_raw, values_fn = list) %>%
unnest(cols = c("acc_x", "acc_y", "acc_z"))
This returns
# A tibble: 120 × 5
ID timestamp acc_x acc_y acc_z
<fct> <dttm> <chr> <chr> <chr>
1 202E 2017-05-02 00:00:00 -0.703 0.656 0.164
2 202E 2017-05-02 00:00:00 -0.703 0.656 0.164
3 202E 2017-05-02 00:00:00 -0.703 0.656 0.164
4 202E 2017-05-02 00:00:00 -0.703 0.656 0.164
5 202E 2017-05-02 00:00:00 -0.703 0.656 0.164
6 202E 2017-05-02 00:00:00 -0.703 0.656 0.164
7 202E 2017-05-02 00:00:00 -0.703 0.656 0.164
8 202E 2017-05-02 00:00:00 -0.703 0.656 0.164
9 202E 2017-05-02 00:00:00 -0.703 0.656 0.164
10 202E 2017-05-02 00:00:00 -0.703 0.656 0.164
# … with 110 more rows
CodePudding user response:
I was not able pivot_wider without NA's, so the solution is not optimal:
library(tidyverse)
dt1 %>%
as_tibble() %>%
separate_rows(acceleration_raw, sep = " ") %>%
group_by(group = as.integer(gl(n(), n()/3, n()))) %>%
mutate(id = row_number()) %>%
mutate(group = case_when(group == 1 ~ "acc_x",
group == 2 ~ "acc_y",
group == 3 ~ "acc_z")) %>%
pivot_wider(names_from = group, values_from = acceleration_raw) %>%
mutate(acc_y = lead(acc_y,n()/3),
acc_z = lead(acc_z,n()/3*2)) %>%
na.omit()
# A tibble: 120 x 6
ID timestamp id acc_x acc_y acc_z
<fct> <dttm> <int> <chr> <chr> <chr>
1 202E 2017-05-02 00:00:00 1 -0.703 -0.703 -0.750
2 202E 2017-05-02 00:00:00 2 0.656 0.680 0.586
3 202E 2017-05-02 00:00:00 3 0.164 0.117 0.117
4 202E 2017-05-02 00:00:00 4 -0.703 -0.680 -0.773
5 202E 2017-05-02 00:00:00 5 0.656 0.680 0.586
6 202E 2017-05-02 00:00:00 6 0.164 0.117 0.117
7 202E 2017-05-02 00:00:00 7 -0.703 -0.680 -0.773
8 202E 2017-05-02 00:00:00 8 0.656 0.680 0.609
9 202E 2017-05-02 00:00:00 9 0.164 0.117 0.117
10 202E 2017-05-02 00:00:00 10 -0.703 -0.680 -0.773
# ... with 110 more rows
# i Use `print(n = ...)` to see more rows