Home > OS >  How to transpose or pivot columns of a date frame in R
How to transpose or pivot columns of a date frame in R

Time:10-19

I am trying to transpose data in R. The data retrieved was a JSON file and here is an example of the dataframe I am using:

date                    parameter   value
2020-07-01T23:50:00Z    wind_dir    236.0
2020-07-01T23:40:00Z    wind_dir    236.0
2020-07-01T23:40:00Z    wind_speed  1.9
2020-07-01T23:30:00Z    wind_dir    239.0
2020-07-01T23:10:00Z    wind_dir    184.0
2020-07-01T23:00:00Z    wind_dir    178.0
2020-07-01T22:50:00Z    wind_speed  1.1
2020-07-01T22:50:00Z    wind_dir    197.0
2020-07-01T22:40:00Z    wind_speed  1.8
2020-07-01T22:30:00Z    wind_speed  1.4
2020-07-01T22:20:00Z    wind_dir    172.0
2020-07-01T22:20:00Z    wind_speed  1.4
2020-07-01T22:00:00Z    wind_dir    170.0 
  1. I need to change the date, so it does not include T and Z.
  2. I want to transpose the rows and separate the parameters into two columns: wind_speed and wind_dir

The final dataset should look like:

date                 wind_dir wind_speed         
2020-07-01 23:50:00  236.0     NA
2020-07-01 23:40:00  236.0     1.9          
2020-07-01 23:30:00  239.0     NA
2020-07-01 23:10:00  184.0     NA
2020-07-01 23:00:00  178.0     NA
2020-07-01 22:50:00  197.0     1.1      
2020-07-01 22:40:00  NA        1.8
2020-07-01 22:30:00  NA        1.4
2020-07-01 22:20:00  172.0     1.4
2020-07-01 22:00:00  170.0     NA 
  • I also would like to have the date starting with an increasing time stamp.

I appreciate your help!

CodePudding user response:

There are functions called pivot_wider and pivot_longer from the tidyr package you can use. T and Z are part of ISO 8601 so you can easily transform them into a object of class datetime:

library(tidyverse)

data <- tribble(
  ~date, ~parameter, ~value,
  
  "2020-07-01T23:50:00Z ", "wind_dir",    236.0,
  "2020-07-01T23:40:00Z "  , "wind_dir",    236.0,
  "2020-07-01T23:40:00Z"  ,  "wind_speed",  1.9
)
data
#> # A tibble: 3 x 3
#>   date                    parameter  value
#>   <chr>                   <chr>      <dbl>
#> 1 "2020-07-01T23:50:00Z " wind_dir   236  
#> 2 "2020-07-01T23:40:00Z " wind_dir   236  
#> 3 "2020-07-01T23:40:00Z"  wind_speed   1.9

data %>%
  mutate(date = date %>% parse_datetime()) %>%
  pivot_wider(names_from = parameter, values_from = value)
#> # A tibble: 2 x 3
#>   date                wind_dir wind_speed
#>   <dttm>                 <dbl>      <dbl>
#> 1 2020-07-01 23:50:00      236       NA  
#> 2 2020-07-01 23:40:00      236        1.9

Created on 2021-10-18 by the reprex package (v2.0.1)

CodePudding user response:

The T and Z removals can be done using base R gsub(). The rest can be done using pivot_wider() from the tidyr package:

raw <- read.table(text = "date                    parameter   value
2020-07-01T23:50:00Z    wind_dir    236.0
2020-07-01T23:40:00Z    wind_dir    236.0
2020-07-01T23:40:00Z    wind_speed  1.9
2020-07-01T23:30:00Z    wind_dir    239.0
2020-07-01T23:10:00Z    wind_dir    184.0
2020-07-01T23:00:00Z    wind_dir    178.0
2020-07-01T22:50:00Z    wind_speed  1.1
2020-07-01T22:50:00Z    wind_dir    197.0
2020-07-01T22:40:00Z    wind_speed  1.8
2020-07-01T22:30:00Z    wind_speed  1.4
2020-07-01T22:20:00Z    wind_dir    172.0
2020-07-01T22:20:00Z    wind_speed  1.4
2020-07-01T22:00:00Z    wind_dir    170.0", header = TRUE)

raw$date <- trimws(gsub("[TZ]", " ", raw$date))


library(tidyr)
packageVersion("tidyr")
#> [1] '1.1.2'

raw <- pivot_wider(raw, 
                   names_from = "parameter", 
                   values_from = "value")

raw
#> # A tibble: 10 x 3
#>    date                wind_dir wind_speed
#>    <chr>                  <dbl>      <dbl>
#>  1 2020-07-01 23:50:00      236       NA  
#>  2 2020-07-01 23:40:00      236        1.9
#>  3 2020-07-01 23:30:00      239       NA  
#>  4 2020-07-01 23:10:00      184       NA  
#>  5 2020-07-01 23:00:00      178       NA  
#>  6 2020-07-01 22:50:00      197        1.1
#>  7 2020-07-01 22:40:00       NA        1.8
#>  8 2020-07-01 22:30:00       NA        1.4
#>  9 2020-07-01 22:20:00      172        1.4
#> 10 2020-07-01 22:00:00      170       NA

Created on 2021-10-18 by the reprex package (v2.0.0)

CodePudding user response:

You may try

library(dplyr)

df %>%
      mutate(date  =gsub("T|Z", " ", date)) %>%
      pivot_wider(names_from = parameter, values_from = value)

   date                   wind_dir wind_speed
   <chr>                     <dbl>      <dbl>
 1 "2020-07-01 23:50:00 "      236       NA  
 2 "2020-07-01 23:40:00 "      236        1.9
 3 "2020-07-01 23:30:00 "      239       NA  
 4 "2020-07-01 23:10:00 "      184       NA  
 5 "2020-07-01 23:00:00 "      178       NA  
 6 "2020-07-01 22:50:00 "      197        1.1
 7 "2020-07-01 22:40:00 "       NA        1.8
 8 "2020-07-01 22:30:00 "       NA        1.4
 9 "2020-07-01 22:20:00 "      172        1.4
10 "2020-07-01 22:00:00 "      170       NA 
  • Related