I'm using R to create a data frame with several duplicate columns. I'd like to consolidate all of the duplicated columns into a single column. How to do this in R?
Note 1 : When I build dataframes with multiple columns, R adds numbers to the names of the repeated columns by default.
Note 2: I am looking for code that will work with the columns regardless of their order.
code:
# Create the data frame.
emp.data <- data.frame(
emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
salary = c(623.3,515.2,611.0,729.0,843.25),
salary = c(700.3,600.2,721.0,730.5,845.4),
emp_name = c("Kevin","Tracy","Thompson","Peter","Bevan"),
stringsAsFactors = FALSE
)
# Print the data frame.
print(emp.data)
current result
emp_name salary salary.1 emp_name.1
Rick 623.3 700.3 Kevin
Dan 515.20 600.2 Tracey
Michelle 611.00 721.0 Thompson
Ryan 729.00 730.5 Peter
Gary 843.25 845.4 Bevan
Expected output
emp_name salary
Rick 623.3
Dan 515.20
Michelle 611.00
Ryan 729.00
Gary 843.25
Kevin 700.3
Tracey 600.2
Thompson 721.0
Peter 730.5
Bevan 845.4
CodePudding user response:
You may use split.default
to split data into list of dataframes based on it's column names. unlist
each dataframe into vector and then you can create a single column from it.
data.frame(lapply(split.default(emp.data, names(emp.data)), unlist), row.names = NULL)
# emp_name salary
#1 Rick 623.30
#2 Dan 515.20
#3 Michelle 611.00
#4 Ryan 729.00
#5 Gary 843.25
#6 Kevin 700.30
#7 Tracy 600.20
#8 Thompson 721.00
#9 Peter 730.50
#10 Bevan 845.40
Another option using pivot_longer
from tidyr
-
tidyr::pivot_longer(emp.data, cols = everything(), names_to = '.value')
data
To create dataframe with the same column names you can add check.names = FALSE
in data.frame
call.
emp.data <- data.frame(
emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
salary = c(623.3,515.2,611.0,729.0,843.25),
salary = c(700.3,600.2,721.0,730.5,845.4),
emp_name = c("Kevin","Tracy","Thompson","Peter","Bevan"),
stringsAsFactors = FALSE, check.names = FALSE
)
CodePudding user response:
I'm not a huge fan of this solution but it works and should scale.
emp.data %>%
pivot_longer(cols = contains("emp_name"), names_to = "ename", values_to = "emp_name") %>%
pivot_longer(cols = contains("salary"), names_to = "sname", values_to = "salary") %>%
filter(replace_na(parse_number(ename),0) == replace_na(parse_number(sname), 0)) %>%
select(-ename, -sname)
NB requires tidyr
for parse_number