How to combine duplicate rows in R?-CodePudding

I'm using R to create a data frame with several duplicate columns. I'd like to consolidate all of the duplicated columns into a single column. How to do this in R?

Note 1 : When I build dataframes with multiple columns, R adds numbers to the names of the repeated columns by default.

Note 2: I am looking for code that will work with the columns regardless of their order.

code:

# Create the data frame.
emp.data <- data.frame(
  emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
  salary = c(623.3,515.2,611.0,729.0,843.25), 
  salary = c(700.3,600.2,721.0,730.5,845.4), 
  emp_name = c("Kevin","Tracy","Thompson","Peter","Bevan"),
  stringsAsFactors = FALSE
)
# Print the data frame.         
print(emp.data)

current result

         emp_name   salary   salary.1 emp_name.1
         Rick       623.3    700.3    Kevin
         Dan        515.20   600.2    Tracey
         Michelle   611.00   721.0    Thompson
         Ryan       729.00   730.5    Peter
         Gary       843.25   845.4    Bevan

Expected output

       emp_name   salary   
         Rick       623.3    
         Dan        515.20   
         Michelle   611.00   
         Ryan       729.00   
         Gary       843.25   
         Kevin      700.3
         Tracey     600.2
         Thompson   721.0
         Peter      730.5
         Bevan      845.4

CodePudding user response：

You may use split.default to split data into list of dataframes based on it's column names. unlist each dataframe into vector and then you can create a single column from it.

data.frame(lapply(split.default(emp.data, names(emp.data)), unlist), row.names = NULL)

#   emp_name salary
#1      Rick 623.30
#2       Dan 515.20
#3  Michelle 611.00
#4      Ryan 729.00
#5      Gary 843.25
#6     Kevin 700.30
#7     Tracy 600.20
#8  Thompson 721.00
#9     Peter 730.50
#10    Bevan 845.40

Another option using pivot_longer from tidyr -

tidyr::pivot_longer(emp.data, cols = everything(), names_to = '.value')

data

To create dataframe with the same column names you can add check.names = FALSE in data.frame call.

emp.data <- data.frame(
  emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
  salary = c(623.3,515.2,611.0,729.0,843.25), 
  salary = c(700.3,600.2,721.0,730.5,845.4), 
  emp_name = c("Kevin","Tracy","Thompson","Peter","Bevan"),
  stringsAsFactors = FALSE, check.names = FALSE
)

CodePudding user response：

I'm not a huge fan of this solution but it works and should scale.

emp.data %>%
  pivot_longer(cols = contains("emp_name"), names_to = "ename", values_to = "emp_name") %>%
  pivot_longer(cols = contains("salary"), names_to = "sname", values_to  = "salary") %>%
  filter(replace_na(parse_number(ename),0) == replace_na(parse_number(sname), 0)) %>%
  select(-ename, -sname)

NB requires tidyr for parse_number