I am trying to transpose data in R. The data retrieved was a JSON file and here is an example of the dataframe I am using:
date parameter value
2020-07-01T23:50:00Z wind_dir 236.0
2020-07-01T23:40:00Z wind_dir 236.0
2020-07-01T23:40:00Z wind_speed 1.9
2020-07-01T23:30:00Z wind_dir 239.0
2020-07-01T23:10:00Z wind_dir 184.0
2020-07-01T23:00:00Z wind_dir 178.0
2020-07-01T22:50:00Z wind_speed 1.1
2020-07-01T22:50:00Z wind_dir 197.0
2020-07-01T22:40:00Z wind_speed 1.8
2020-07-01T22:30:00Z wind_speed 1.4
2020-07-01T22:20:00Z wind_dir 172.0
2020-07-01T22:20:00Z wind_speed 1.4
2020-07-01T22:00:00Z wind_dir 170.0
- I need to change the date, so it does not include T and Z.
- I want to transpose the rows and separate the parameters into two columns: wind_speed and wind_dir
The final dataset should look like:
date wind_dir wind_speed
2020-07-01 23:50:00 236.0 NA
2020-07-01 23:40:00 236.0 1.9
2020-07-01 23:30:00 239.0 NA
2020-07-01 23:10:00 184.0 NA
2020-07-01 23:00:00 178.0 NA
2020-07-01 22:50:00 197.0 1.1
2020-07-01 22:40:00 NA 1.8
2020-07-01 22:30:00 NA 1.4
2020-07-01 22:20:00 172.0 1.4
2020-07-01 22:00:00 170.0 NA
- I also would like to have the date starting with an increasing time stamp.
I appreciate your help!
CodePudding user response:
There are functions called pivot_wider
and pivot_longer
from the tidyr
package you can use. T
and Z
are part of ISO 8601 so you can easily transform them into a object of class datetime:
library(tidyverse)
data <- tribble(
~date, ~parameter, ~value,
"2020-07-01T23:50:00Z ", "wind_dir", 236.0,
"2020-07-01T23:40:00Z " , "wind_dir", 236.0,
"2020-07-01T23:40:00Z" , "wind_speed", 1.9
)
data
#> # A tibble: 3 x 3
#> date parameter value
#> <chr> <chr> <dbl>
#> 1 "2020-07-01T23:50:00Z " wind_dir 236
#> 2 "2020-07-01T23:40:00Z " wind_dir 236
#> 3 "2020-07-01T23:40:00Z" wind_speed 1.9
data %>%
mutate(date = date %>% parse_datetime()) %>%
pivot_wider(names_from = parameter, values_from = value)
#> # A tibble: 2 x 3
#> date wind_dir wind_speed
#> <dttm> <dbl> <dbl>
#> 1 2020-07-01 23:50:00 236 NA
#> 2 2020-07-01 23:40:00 236 1.9
Created on 2021-10-18 by the reprex package (v2.0.1)
CodePudding user response:
The T
and Z
removals can be done using base R gsub()
. The rest can be done using pivot_wider()
from the tidyr package:
raw <- read.table(text = "date parameter value
2020-07-01T23:50:00Z wind_dir 236.0
2020-07-01T23:40:00Z wind_dir 236.0
2020-07-01T23:40:00Z wind_speed 1.9
2020-07-01T23:30:00Z wind_dir 239.0
2020-07-01T23:10:00Z wind_dir 184.0
2020-07-01T23:00:00Z wind_dir 178.0
2020-07-01T22:50:00Z wind_speed 1.1
2020-07-01T22:50:00Z wind_dir 197.0
2020-07-01T22:40:00Z wind_speed 1.8
2020-07-01T22:30:00Z wind_speed 1.4
2020-07-01T22:20:00Z wind_dir 172.0
2020-07-01T22:20:00Z wind_speed 1.4
2020-07-01T22:00:00Z wind_dir 170.0", header = TRUE)
raw$date <- trimws(gsub("[TZ]", " ", raw$date))
library(tidyr)
packageVersion("tidyr")
#> [1] '1.1.2'
raw <- pivot_wider(raw,
names_from = "parameter",
values_from = "value")
raw
#> # A tibble: 10 x 3
#> date wind_dir wind_speed
#> <chr> <dbl> <dbl>
#> 1 2020-07-01 23:50:00 236 NA
#> 2 2020-07-01 23:40:00 236 1.9
#> 3 2020-07-01 23:30:00 239 NA
#> 4 2020-07-01 23:10:00 184 NA
#> 5 2020-07-01 23:00:00 178 NA
#> 6 2020-07-01 22:50:00 197 1.1
#> 7 2020-07-01 22:40:00 NA 1.8
#> 8 2020-07-01 22:30:00 NA 1.4
#> 9 2020-07-01 22:20:00 172 1.4
#> 10 2020-07-01 22:00:00 170 NA
Created on 2021-10-18 by the reprex package (v2.0.0)
CodePudding user response:
You may try
library(dplyr)
df %>%
mutate(date =gsub("T|Z", " ", date)) %>%
pivot_wider(names_from = parameter, values_from = value)
date wind_dir wind_speed
<chr> <dbl> <dbl>
1 "2020-07-01 23:50:00 " 236 NA
2 "2020-07-01 23:40:00 " 236 1.9
3 "2020-07-01 23:30:00 " 239 NA
4 "2020-07-01 23:10:00 " 184 NA
5 "2020-07-01 23:00:00 " 178 NA
6 "2020-07-01 22:50:00 " 197 1.1
7 "2020-07-01 22:40:00 " NA 1.8
8 "2020-07-01 22:30:00 " NA 1.4
9 "2020-07-01 22:20:00 " 172 1.4
10 "2020-07-01 22:00:00 " 170 NA