How to create new variables based on stacked values in columns in R-CodePudding

I have a data set that looks like this

ID   a b c  d   source file

1    3 4 7  23  feb2010.txt
2    2 1 2  47  feb2010.txt
1    3 4 7  26  march2010.txt
2    2 1 2  33  march2010.txt
1    3 4 7  28  april2010.txt
2    2 1 2  32  april2010.txt

I'd like the column names to read

ID  a b c Feb10 March10 April10
1   3 4 7   23    26       28
2   2 1 2   47    33       32

My actual data set has more than just 2 unique ids. It has thousands of unique ids. Any help as to how to change this would be very much appreciated as most of the code I have tried hasn't worked yet.

Thank you!

CodePudding user response：

You can use pivot_wider() from the tidyverse.

library(dplyr)
library(tidyr)
library(lubridate)

df %>% 
  mutate(source_file = tools::file_path_sans_ext(source_file),
         source_file = format(my(source_file), format = "%b%y")) %>% 
  pivot_wider(names_from = "source_file", values_from = "d")

Which gives you the following:

# A tibble: 2 x 7
     ID     a     b     c Feb10 Mar10 Apr10
  <int> <int> <int> <int> <int> <int> <int>
1     1     3     4     7    23    26    28
2     2     2     1     2    47    33    32

Data:

df <- read.table(textConnection("ID   a b c  d   source_file
1    3 4 7  23  feb2010.txt
2    2 1 2  47  feb2010.txt
1    3 4 7  26  march2010.txt
2    2 1 2  33  march2010.txt
1    3 4 7  28  april2010.txt
2    2 1 2  32  april2010.txt"), header = TRUE)

CodePudding user response：

Using dcast from data.table

library(data.table)
dcast(setDT(df1), ID   a   b   c ~ trimws(source_file,
     whitespace = "\\.txt"), value.var = "d")
Key: <ID, a, b, c>
      ID     a     b     c april2010 feb2010 march2010
   <int> <int> <int> <int>     <int>   <int>     <int>
1:     1     3     4     7        28      23        26
2:     2     2     1     2        32      47        33