I have a dataframe in R with lots of columns and lots of rows with numeric values. I have multiple colums with the same name (e.g. A, A, B, B, C, C...) that should be combined into a single colum with the maximum of the rows (A,B,C)
So this
A | A | B | B | C | C | ... |
---|---|---|---|---|---|---|
0 | 2 | 1 | 3 | 1 | 1 | ... |
3 | 4 | 2 | 5 | 1 | 1 | ... |
... | ... | ... | ... | ... | ... | ... |
Should be combined to
A | B | C | ... |
---|---|---|---|
2 | 3 | 1 | ... |
4 | 5 | 1 | ... |
... |
Can somebody help me with that?
I found an example online that is used to summarize row values of columns with the same name
df<-
data.frame(x=rpois(25,1),y=rpois(25,2),x=rpois(25,5),z=rpois(25,2),y=rpois(25,1),z=rpoi
s(25,5),check.names=FALSE)
Merged_df<-as.data.frame(do.call(cbind,
by(t(df),INDICES=names(df),FUN=colSums)))
Merged_df
and replaced FUN=colSums
it with
FUN = function(x) max(x,na.rm = TRUE)
but that resulted in an error
Error in tapply(seq_len(3635L), list(INDICES = NULL), function (x) :
arguments must have same length
Now i am not sure if this is a problem in my data or in the function i was using as i am not experienced with R.
CodePudding user response:
You can use split.default
to split the columns according to their names, and then get the rowwise max with do.call
and pmax
, before binding the rows.
dat <- data.frame(A = 1:2, A = c(0, 3), B = 3:4, B = 0:1, check.names = F)
# A A B B
# 1 1 0 3 0
# 2 2 3 4 1
split.default(dat, colnames(dat)) |>
lapply(do.call, what = "pmax") |>
rbind.data.frame()
# A B
# 1 1 3
# 2 3 4