Home > Blockchain >  How to transform dataset from long to wide when there are 3 identifiers?
How to transform dataset from long to wide when there are 3 identifiers?

Time:10-17

I have a dataframe with three identifiers and multiple variables. As an example, say I have a variable that identifies a country, another that identifies a person, a third that identifies time, and two more variables var1 and var2. Like the data structure below:

set.seed(123)
df1 <- data.frame(country = LETTERS[1:2])
df2 <- data.frame(person = letters[1:3])
df3 <- data.frame(time = 1:3)

df <- merge(df1, df2)
df <- merge(df, df3)

df <- df %>% mutate(var1 = runif(18),
                    var2 = runif(18))

df <- df %>% arrange(country, person, time)

which is going to end up with a data frame like:

   country person time       var1       var2
1        A      a    1 0.28757752 0.32792072
2        A      a    2 0.52810549 0.65570580
3        A      a    3 0.67757064 0.96302423
4        A      b    1 0.40897692 0.88953932
5        A      b    2 0.55143501 0.54406602
6        A      b    3 0.10292468 0.69070528
7        A      c    1 0.94046728 0.64050681
8        A      c    2 0.95683335 0.28915974
9        A      c    3 0.24608773 0.02461368
10       B      a    1 0.78830514 0.95450365
11       B      a    2 0.89241904 0.70853047
12       B      a    3 0.57263340 0.90229905
13       B      b    1 0.88301740 0.69280341
14       B      b    2 0.45661474 0.59414202
15       B      b    3 0.89982497 0.79546742
16       B      c    1 0.04555650 0.99426978
17       B      c    2 0.45333416 0.14711365
18       B      c    3 0.04205953 0.47779597

I would like to transform this dataset so I have a column identifying each person for each var1 and var2. In the dataset, I have 3 persons, a,b, and c. So each row would correspond to a country x year, and I would end up with 6 columns: var1_a, var1_b, var1_c, var2_a, var2_b, var2_c. What is the best way to accomplish this?

CodePudding user response:

Here a solution:

library(tidyverse)
pivot_wider(df, id_cols = c(country, time), names_from=person, values_from=c(var1,var2))
  • Related