Retain column names after split in R

Time:06-29

I have a data structure as shown below. Some of the columns contain numbers separated by a colon, and for these I'd like to keep only the maximum value. For example, for record 1 (X1) the expected output is 15, and for record 5 (X5) it should be 142. I considered using the split function; however, it would create additional columns, which I don't want because I need to keep the existing column structure.

dat <- structure(list(X1 = list(1:15), X2 = list(106L), X3 = list(134L), 
    X4 = list(139L), X5 = list(141:142)), class = "data.frame", row.names = c(NA, 
-1L))

Expected output

X1  X2  X3  X4  X5
15 106 134 139 142
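To see what the "colon" values actually are, you can inspect a cell directly. In the structure() output above, 1:15 and 141:142 are R's range notation for integer vectors, so each cell holds a numeric vector wrapped in a list, not a string that needs splitting. A minimal check, re-creating dat from the question:

```r
# Re-create the data from the question
dat <- structure(list(X1 = list(1:15), X2 = list(106L), X3 = list(134L),
    X4 = list(139L), X5 = list(141:142)), class = "data.frame",
    row.names = c(NA, -1L))

# Each column is a list; its single element is an integer vector
is.list(dat$X1)        # TRUE
class(dat$X5[[1]])     # "integer"
dat$X5[[1]]            # the two values 141 and 142
```

Because the values are already numeric, max() can be applied to each cell without any splitting.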

Answer:

You may use

sapply(dat, function (x) max(unlist(x)))
# X1  X2  X3  X4  X5 
# 15 106 134 139 142
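A stricter variant of the same call is vapply(), which also takes a template for each result (here integer(1), because every column in this data holds integers) and stops with an error if a column returns something of a different type or length. A sketch:

```r
dat <- structure(list(X1 = list(1:15), X2 = list(106L), X3 = list(134L),
    X4 = list(139L), X5 = list(141:142)), class = "data.frame",
    row.names = c(NA, -1L))

# FUN.VALUE = integer(1): each column must reduce to exactly one integer
col_max <- vapply(dat, function(x) max(unlist(x)), FUN.VALUE = integer(1))
col_max
```

If some columns held doubles instead, max() would return a double and FUN.VALUE would need to be numeric(1).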

sapply returns a named vector in this case. If you want a data frame instead, use lapply and wrap the result in data.frame():

data.frame(lapply(dat, function (x) max(unlist(x))))
#  X1  X2  X3  X4  X5
#1 15 106 134 139 142

The printed output of a named vector and a one-row data frame looks quite similar, doesn't it?
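They are different objects, though, which matters if you later index or combine the result. A small sketch that checks both classes and turns the named vector into the same one-row data frame with as.list():

```r
dat <- structure(list(X1 = list(1:15), X2 = list(106L), X3 = list(134L),
    X4 = list(139L), X5 = list(141:142)), class = "data.frame",
    row.names = c(NA, -1L))

vec <- sapply(dat, function(x) max(unlist(x)))
df1 <- data.frame(lapply(dat, function(x) max(unlist(x))))

class(vec)  # "integer" (a named vector)
class(df1)  # "data.frame"

# as.list() turns each named element into its own column
df2 <- data.frame(as.list(vec))
identical(df1, df2)  # TRUE
```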


Although this question has been answered, I would like to point out that your dat is not stored in a convenient layout. It is quite uncommon for a data frame column to be a list; a plain named list of vectors is easier to work with in subsequent operations.

lst <- list(X1 = 1:15, X2 = 106L, X3 = 134L, X4 = 139L, X5 = 141:142)

sapply(lst, max)
# X1  X2  X3  X4  X5 
# 15 106 134 139 142

data.frame(lapply(lst, max))
#  X1  X2  X3  X4  X5
#1 15 106 134 139 142
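If dat already exists, the list of vectors does not have to be typed by hand. Assuming dat has a single row, as in the question, the one element in each column can be pulled out with lapply() and `[[`:

```r
dat <- structure(list(X1 = list(1:15), X2 = list(106L), X3 = list(134L),
    X4 = list(139L), X5 = list(141:142)), class = "data.frame",
    row.names = c(NA, -1L))

# Each column is a one-element list (one row), so take element 1
lst <- lapply(dat, `[[`, 1)

sapply(lst, max)
```

With more than one row, `[[`, 1 would only return the first row's cell, whereas lapply(dat, unlist) would concatenate all rows of a column into a single vector; which one you want depends on the data.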

Answer:

With tidyverse

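library(tidyverse)

# rowwise() makes each list-column cell available as the vector it holds,
# so max() is applied per cell
dat %>%
  rowwise() %>%
  summarise(across(everything(), max))

# # A tibble: 1 × 5
#      X1    X2    X3    X4    X5
#   <int> <int> <int> <int> <int>
# 1    15   106   134   139   142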
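Unlike max(unlist(x)), which pools every cell in a column into one value, rowwise() computes the maximum cell by cell, so it keeps one value per record when the data frame has several rows. A base R sketch of the same per-cell idea, using a made-up two-row data frame dat2 (hypothetical data, not from the question):

```r
# Hypothetical two-row data with list-columns, built like the question's dat
dat2 <- structure(list(X1 = list(1:15, 3:7), X2 = list(106L, c(4L, 9L))),
                  class = "data.frame", row.names = c(NA, -2L))

# Per-cell maximum: one value per row, column names kept
per_row <- data.frame(lapply(dat2, function(col) vapply(col, max, integer(1))))
per_row   # X1 = 15, 7; X2 = 106, 9

# For contrast, pooling each column gives a single value per column
sapply(dat2, function(x) max(unlist(x)))   # X1 = 15, X2 = 106
```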