Transpose a data frame and add column names as variables in R-CodePudding

I have a time series data frame that is wide and has individual stock data as the column names. I would like to turn this data frame into long format without taking away the ability to see what ticker the data belongs to.

Here is the data below.

df = structure(list(Date = structure(c(1607922000, 1608008400), class = c("POSIXct", 
"POSIXt"), tzone = ""), AAPL.Close = c(0.32982465, 0.34001608
), MSFT.Close = c(0.26307234, 0.27235893), GS.Close = c(0.30742572, 
0.29825025), QQQ.Close = c(0.25350002, 0.24456267)), row.names = 1:2, class = "data.frame")


  Date        AAPL.Close MSFT.Close GS.Close QQQ.Close
1 2020-12-14  0.3298246  0.2630723 0.3074257 0.2535000
2 2020-12-15  0.3400161  0.2723589 0.2982502 0.2445627

I would like the new data frame to look like this.

Date          Data     Ticker
2020-12-14   .3298     AAPL 
2020-12-15   .3400     AAPL
2020-12-14   .260      MSFT
2020-12-15   .27       MSFT
.
.

Thank you for your help

CodePudding user response：

We could use pivot_longer

library(tidyr)
library(dplyr)
df %>% 
   mutate(Date = as.Date(Date)) %>%
   pivot_longer(cols = -Date, names_to = c("Ticker", ".value"), 
       names_sep = "\\.") %>% 
   rename(Data = Close)

CodePudding user response：

library(data.table)
library(magrittr)

setDT(df)

melt(data = df, id.vars = "Date") %>% 
  .[, variable := gsub(pattern = "(. )\\.(. )", replacement = "\\1", x = variable)] %>% 
  .[]


                  Date variable     value
1: 2020-12-14 08:00:00     AAPL 0.3298246
2: 2020-12-15 08:00:00     AAPL 0.3400161
3: 2020-12-14 08:00:00     MSFT 0.2630723
4: 2020-12-15 08:00:00     MSFT 0.2723589
5: 2020-12-14 08:00:00       GS 0.3074257
6: 2020-12-15 08:00:00       GS 0.2982502
7: 2020-12-14 08:00:00      QQQ 0.2535000
8: 2020-12-15 08:00:00      QQQ 0.2445627

CodePudding user response：

And as many times in the last days I provide an data.table attempt additionally to the great akrun solutions :-)

library(data.table)
library(stringr)
setDT(df)

df = melt(df, id.vars=c("Date"), variable.name = "Data", value.name="Ticker")
df[, Data:=str_replace(Data, ".Close", "")]

Output:

> df
                  Date Data    Ticker
1: 2020-12-14 05:00:00 AAPL 0.3298246
2: 2020-12-15 05:00:00 AAPL 0.3400161
3: 2020-12-14 05:00:00 MSFT 0.2630723
4: 2020-12-15 05:00:00 MSFT 0.2723589
5: 2020-12-14 05:00:00   GS 0.3074257
6: 2020-12-15 05:00:00   GS 0.2982502
7: 2020-12-14 05:00:00  QQQ 0.2535000
8: 2020-12-15 05:00:00  QQQ 0.2445627

Update, if you just want to group Dates instead of Datetimes:

df = melt(df[, Date:=as.Date(Date)], id.vars=c("Date"), variable.name = "Data", value.name="Ticker")