Home > front end >  Reshape from unconstructed dataset in r
Reshape from unconstructed dataset in r

Time:09-23

I am trying to reshape a dataset by switching some cells information. Here is how my sample dataset looks like.

data <- data.frame(var1 = c("Text","A","B","C","D"),
               var2 = c("Text",NA, 1,0,1),
               var3 = c("112-1",NA,NA,"text",NA),
               var4 = c("Text",1,0,NA, NA),
               var5 = c("113-1",NA,"text",NA,NA))
> data
  var1 var2  var3 var4  var5
1 Text Text 112-1 Text 113-1
2    A <NA>  <NA>    1  <NA>
3    B    1  <NA>    0  text
4    C    0  text <NA>  <NA>
5    D    1  <NA> <NA>  <NA>

It needs some cleaning first.var1 has the item information. var2 and var4 have score information. var3 and var5 have id information at the first row. I will need to reshape this dataset as below.

> data.1
   id  A B  C  D
1 112 NA 1  0  1
2 113  1 0 NA NA

Considering this datafile in multiple columns (e.g. having more columns var6,var7,var8,var9,.etc) with the same pattern, How can I reshape to this desired dataset?

CodePudding user response:

This isn't much different from my answer yesterday, but this will give you the result you asked for. Shift that first row over one column so that the id is on the same column with the needed values, remove the unnecessary columns, then make row one the column names. Add some pivots and then it should be roughly what you need:

data <- data.frame(var1 = c("Text","A","B","C","D"), var2 = c("Text",NA, 1,0,1), var3 = c("112",NA,NA,NA,NA), var4 = c("Text",1,0,NA, NA), var5 = c(113,NA,NA,NA,NA))

library(dplyr)
library(tidyr)

data2<-data%>%
  mutate_all(as.character) #Making character to avoid factor issues

data2[1, 2:(ncol(data2) - 1)] <- data2[1, 3:ncol(data2)] #Shifting first row over one column

data3<-data2%>%
  select(-var3,-var5) #Removing the uneeded columns

colnames(data3) <- data3[1,] #Taking the first row and making it the column names
data3 <- data3[-1, ] #removing row 1, since it was made into column names


data3%>%
  tidyr::pivot_longer(-Text, names_to = "id", values_to = "time")%>% #Making the data into longer format
  tidyr::pivot_wider(names_from = Text, values_from = time) #Then back into wide

CodePudding user response:

You could shift the first row, delete, columns %% 2 and transpose.

data[1, ] <- data[1, -1]
data <- data[c(TRUE, seq_len(ncol(data))[-1] %% 2 == 0)]
setNames(as.data.frame(t(data[, -1]), row.names=FALSE), c('id', data[[1]][-1])) |>
  type.convert(as.is=TRUE)
#      id  A B  C  D
# 1 112-1 NA 1  0  1
# 2 113-1  1 0 NA NA

BTW, how do you get such data? Maybe you have an x-y-problem.

CodePudding user response:

library(dplyr)
library(tidyr)
library(stringr)

#First rename the columns to more appropriate
n = 2 #Number of pairs of columns you have (here 2)
nam <- do.call(paste0, (expand.grid(c("n_", "id_"), seq(n))))
colnames(data) <- c("col", nam)

#Then, the data manipulation
data %>% 
  mutate(across(starts_with("id"), ~ first(str_remove(.x, "-")))) %>% 
  fill(starts_with("id")) %>% 
  slice(-1) %>% 
  pivot_longer(-col, names_to = c(".value", "rn"), names_sep = "_") %>% 
  pivot_wider(names_from = "col", values_from = 'n') %>% 
  select(-rn)
  id    A     B     C     D    
1 1121  NA    1     0     1    
2 1131  1     0     NA    NA   
  • Related