I have some data (see below) whereby I have participants (ID
column) who have a score on three variables (Name_A
, Name_B
, and Name_C
). These scores are currently recorded against the associated variable horizontally in the X1Score
, X2Score
, and X3Score
columns. I would like these scores 'copied and pasted' (for lack of a better phrase) to the associated columns – being Name_A
, Name_B
, and Name_C
(currently filled with NA
) – so that I have the data in long format. How do I do this?
ID X1 X1Score X2 X2Score X3 X3Score Name_A Name_B Name_C
1 Name_A 4.58 Name_C 4.79 Name_B 5.22 NA NA NA
2 Name_C 5.35 Name_B 5.33 Name_A 5.61 NA NA NA
3 Name_B 5.59 Name_C 5.48 Name_A 4.89 NA NA NA
4 Name_C 5.36 Name_B 5.04 Name_A 4.93 NA NA NA
5 Name_A 5.39 Name_B 5.27 Name_C 5.11 NA NA NA
6 Name_C 4.91 Name_A 4.99 Name_B 5.01 NA NA NA
df <- structure(list(ID = 1:6,
X1 = c("Name_A", "Name_C", "Name_B", "Name_C", "Name_A", "Name_C"),
X1Score = c(4.58, 5.35, 5.59, 5.36, 5.39, 4.91),
X2 = c("Name_C", "Name_B", "Name_C", "Name_B", "Name_B", "Name_A"),
X2Score = c(4.79, 5.33, 5.48, 5.04, 5.27, 4.99),
X3 = c("Name_B", "Name_A", "Name_A", "Name_A", "Name_C", "Name_B"),
X3Score = c(5.22, 5.61, 4.89, 4.93, 5.11, 5.01),
Name_A = c(NA, NA, NA, NA, NA, NA),
Name_B = c(NA, NA, NA, NA, NA, NA),
Name_C = c(NA, NA, NA, NA, NA, NA)),
row.names = c(NA, -6L), class = "data.frame")
#Edit: My original request above is too simplistic and, although the answer technically addressed the question, I failed to understand how to generalise it. So, here is a revised example (where the only major difference is the naming convention of the columns) - this example produces an error, despite the same code working on the above example. My hope is that with another example of my problem, I will be able to make sense of the 'X\\d (.*)'
line as it looks like this is the key to making it work. Here is the updated example:
df <- structure(list(ID = 1:6,
X1_Name = c("Name_A", "Name_C", "Name_B", "Name_C", "Name_A", "Name_C"),
X1_Score = c(4.58, 5.35, 5.59, 5.36, 5.39, 4.91),
X5_Name = c("Name_C", "Name_B", "Name_C", "Name_B", "Name_B", "Name_A"),
X5_Score = c(4.79, 5.33, 5.48, 5.04, 5.27, 4.99),
X19_Name = c("Name_B", "Name_A", "Name_A", "Name_A", "Name_C", "Name_B"),
X19_Score = c(5.22, 5.61, 4.89, 4.93, 5.11, 5.01)),
row.names = c(NA, -6L), class = "data.frame")
df %>%
#get the data in long format creating two columns Name and Score
pivot_longer(cols = -ID,
names_to = '.value',
names_pattern = 'X\\d (.*)') %>%
#Get data in wide format.
pivot_wider(names_from = Name, values_from = Score)
CodePudding user response:
You may perform reshaping with pivot_longer
/pivot_wider
-
library(dplyr)
library(tidyr)
df %>%
#To drop empty NA columns
select(-starts_with('Name')) %>%
#Rename X1 to X1Name, X2 to X2Name and so on
rename_with(~paste0(., 'Name'), matches('^X\\d $')) %>%
#get the data in long format creating two columns Name and Score
pivot_longer(cols = -ID,
names_to = '.value',
names_pattern = 'X\\d (.*)') %>%
#Get data in wide format.
pivot_wider(names_from = Name, values_from = Score)
# ID Name_A Name_C Name_B
# <int> <dbl> <dbl> <dbl>
#1 1 4.58 4.79 5.22
#2 2 5.61 5.35 5.33
#3 3 4.89 5.48 5.59
#4 4 4.93 5.36 5.04
#5 5 5.39 5.11 5.27
#6 6 4.99 4.91 5.01
If you want to keep all other columns in the data as it is and add these 3 columns separately you may join the dataset with the original one.
...Code from above %>%
left_join(df %>% select(-starts_with('Name')), by = 'ID')