In R, how do I selectively 'copy and paste' a cell into another cell based on specific row-CodePudding

I have some data (see below) whereby I have participants (ID column) who have a score on three variables (Name_A, Name_B, and Name_C). These scores are currently recorded against the associated variable horizontally in the X1Score, X2Score, and X3Score columns. I would like these scores 'copied and pasted' (for lack of a better phrase) to the associated columns – being Name_A, Name_B, and Name_C (currently filled with NA) – so that I have the data in long format. How do I do this?

ID  X1      X1Score X2      X2Score X3      X3Score Name_A    Name_B  Name_C
1   Name_A  4.58    Name_C  4.79    Name_B  5.22    NA        NA      NA
2   Name_C  5.35    Name_B  5.33    Name_A  5.61    NA        NA      NA
3   Name_B  5.59    Name_C  5.48    Name_A  4.89    NA        NA      NA
4   Name_C  5.36    Name_B  5.04    Name_A  4.93    NA        NA      NA
5   Name_A  5.39    Name_B  5.27    Name_C  5.11    NA        NA      NA
6   Name_C  4.91    Name_A  4.99    Name_B  5.01    NA        NA      NA



df <- structure(list(ID = 1:6,
                         X1 = c("Name_A", "Name_C", "Name_B", "Name_C", "Name_A", "Name_C"),
                         X1Score = c(4.58, 5.35, 5.59, 5.36, 5.39, 4.91),
                         X2 = c("Name_C", "Name_B", "Name_C", "Name_B", "Name_B", "Name_A"),
                         X2Score = c(4.79, 5.33, 5.48, 5.04, 5.27, 4.99),
                         X3 = c("Name_B", "Name_A", "Name_A", "Name_A", "Name_C", "Name_B"),
                         X3Score = c(5.22, 5.61, 4.89, 4.93, 5.11, 5.01),
                         Name_A = c(NA, NA, NA, NA, NA, NA),
                         Name_B = c(NA, NA, NA, NA, NA, NA),
                         Name_C = c(NA, NA, NA, NA, NA, NA)),
                    row.names = c(NA, -6L), class = "data.frame")

#Edit: My original request above is too simplistic and, although the answer technically addressed the question, I failed to understand how to generalise it. So, here is a revised example (where the only major difference is the naming convention of the columns) - this example produces an error, despite the same code working on the above example. My hope is that with another example of my problem, I will be able to make sense of the 'X\\d (.*)' line as it looks like this is the key to making it work. Here is the updated example:

df <- structure(list(ID = 1:6,
                     X1_Name = c("Name_A", "Name_C", "Name_B", "Name_C", "Name_A", "Name_C"),
                     X1_Score = c(4.58, 5.35, 5.59, 5.36, 5.39, 4.91),
                     X5_Name = c("Name_C", "Name_B", "Name_C", "Name_B", "Name_B", "Name_A"),
                     X5_Score = c(4.79, 5.33, 5.48, 5.04, 5.27, 4.99),
                     X19_Name = c("Name_B", "Name_A", "Name_A", "Name_A", "Name_C", "Name_B"),
                     X19_Score = c(5.22, 5.61, 4.89, 4.93, 5.11, 5.01)),
                row.names = c(NA, -6L), class = "data.frame")

df %>%
  #get the data in long format creating two columns Name and Score
  pivot_longer(cols = -ID, 
               names_to = '.value',
               names_pattern = 'X\\d (.*)') %>%
  #Get data in wide format. 
  pivot_wider(names_from = Name, values_from = Score)

CodePudding user response：

You may perform reshaping with pivot_longer/pivot_wider -

library(dplyr)
library(tidyr)

df %>%
  #To drop empty NA columns
  select(-starts_with('Name')) %>%
  #Rename X1 to X1Name, X2 to X2Name and so on
  rename_with(~paste0(., 'Name'), matches('^X\\d $')) %>%
  #get the data in long format creating two columns Name and Score
  pivot_longer(cols = -ID, 
               names_to = '.value',
               names_pattern = 'X\\d (.*)') %>%
  #Get data in wide format. 
  pivot_wider(names_from = Name, values_from = Score)

#     ID Name_A Name_C Name_B
#  <int>  <dbl>  <dbl>  <dbl>
#1     1   4.58   4.79   5.22
#2     2   5.61   5.35   5.33
#3     3   4.89   5.48   5.59
#4     4   4.93   5.36   5.04
#5     5   5.39   5.11   5.27
#6     6   4.99   4.91   5.01

If you want to keep all other columns in the data as it is and add these 3 columns separately you may join the dataset with the original one.

...Code from above %>%
   left_join(df %>% select(-starts_with('Name')), by = 'ID')