How to merge rows into single column by grouping the same variable in R language

Time:12-09

I have a data frame

type   function   class
A      AXX        AYY
A      AZZ        AUU
B      BXX        BYY
B      BUU        BHH 

I want to transform it into

type   function   class   type    function  class  
A      AXX        AYY     A      AZZ        AUU
B      BXX        BYY     B      BUU        BHH 

I tried dcast and melt, but they didn't work out for me. I am new to R, please help.

CodePudding user response:

Here is another proposition:

library(dplyr)
library(tidyr)
library(purrr)

df <- data.frame(
  stringsAsFactors = FALSE,
       check.names = FALSE,
              type = c("A", "A", "B", "B"),
        `function` = c("AXX", "AZZ", "BXX", "BUU"),
             class = c("AYY", "AUU", "BYY", "BHH")
      )

df <- df %>% 
  group_by(type) %>% 
  mutate(id = row_number())

df <- split(df, df$id)
df <- map(df, select, -id)
df <- reduce(df, cbind)
names(df) <- gsub("[.]*\\d$", "", names(df))

However, I'm afraid that having several columns with the same name can cause problems later on.
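
To illustrate the concern about repeated column names, here is a minimal base R sketch. The `wide` data frame is a hypothetical stand-in for the result above, rebuilt by hand so the snippet runs on its own; `make.unique()` is just one possible way to disambiguate the names.

```r
# Hypothetical wide result with repeated column names, as requested in the question
wide <- data.frame(
  type = c("A", "B"), `function` = c("AXX", "BXX"), class = c("AYY", "BYY"),
  type = c("A", "B"), `function` = c("AZZ", "BUU"), class = c("AUU", "BHH"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# `$` only returns the first column whose name matches,
# so the second "class" column cannot be reached by name
wide$class

# make.unique() appends .1, .2, ... to repeated names
names(wide) <- make.unique(names(wide))
names(wide)
# "type" "function" "class" "type.1" "function.1" "class.1"
```

After renaming, every column can be selected by name again, for example `wide$class.1`.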

CodePudding user response:

Is this what you're expecting?

library(dplyr)

my_df <- data.frame("type" = c("A", "A", "B", "B", "C"),
                "function1" = c("AXX", "AZZ", "BXX", "BUU", "CCC"),
                "class1" = c("AYY", "AUU", "BYY", "BHH", "CCC"),
                stringsAsFactors = FALSE)

my_df <- my_df %>% group_by(type) %>% mutate(My_id = cur_group_id())
my_base <- my_df %>% group_by(type) %>% filter(row_number() == 1)
my_other <- my_df %>% group_by(type) %>% filter(row_number() != 1)
my_base <- left_join(x = my_base, y = my_other, by = "My_id")
colnames(my_base) <- gsub(pattern = "\\.x$|\\.y$", replacement = "", x = colnames(my_base))
my_base <- my_base[, -which(colnames(my_base) == "My_id")]

CodePudding user response:

I think the solutions proposed so far only work if each type value appears at most twice in the data.frame. I'm not sure whether that is always the case in your data, so I added an if-condition to my solution.

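# `data` is the data frame from the question
result <- NULL
# Largest number of rows sharing one type
dataCount <- max(table(data$type))
if (dataCount <= 2) {
  data1 <- data[duplicated(data$type), ]   # second row of each type
  data2 <- data[!duplicated(data$type), ]  # first row of each type
  result <- merge(data2, data1, by = "type", all.x = TRUE)
}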
> result
      type function..x class.x function..y class.y
    1    A         AXX     AYY         AZZ     AUU
    2    B         BXX     BYY         BUU     BHH

If you then want to create the data.frame you asked for, you can simply do:

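# Repeat the type column so that it also precedes the second block
result$type.y <- result$type
# Reorder to: type, function, class, type, function, class
result <- result[, c(1, 2, 3, 6, 4, 5)]
# Drop everything from the first dot onwards (e.g. "class.x" becomes "class")
names(result) <- sub("[.].*$", "", names(result))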

> result
  type function class type function class
1    A      AXX   AYY    A      AZZ   AUU
2    B      BXX   BYY    B      BUU   BHH

However, in general, I would recommend not using function as a column name (e.g. use function2use instead) because it is a reserved word in R. I would also keep the names from the merge output, e.g. class.x and class.y, instead of using the same column name twice.
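
For data where a type can occur more than twice, which the if-condition above does not cover, one possible approach is base R's `reshape()`. This is only a sketch under assumptions: the sample data (including the third "B" row) is made up for illustration, and the column is called `fun` rather than `function`, in line with the naming advice above.

```r
# Hypothetical sample data: type "B" occurs three times
data <- data.frame(
  type  = c("A", "A", "B", "B", "B"),
  fun   = c("AXX", "AZZ", "BXX", "BUU", "BKK"),
  class = c("AYY", "AUU", "BYY", "BHH", "BLL"),
  stringsAsFactors = FALSE
)

# Number the rows within each type: 1, 2, ... per group
data$id <- ave(seq_along(data$type), data$type, FUN = seq_along)

# Spread fun and class into one set of columns per id value;
# types with fewer rows are padded with NA
wide <- reshape(data, idvar = "type", timevar = "id", direction = "wide")
rownames(wide) <- NULL

names(wide)
# "type" "fun.1" "class.1" "fun.2" "class.2" "fun.3" "class.3"
```

The type column appears only once here, and the numeric suffixes keep every column name unique, so no renaming step is needed afterwards.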
