Home > other >  R: Change column name based on string in row value
R: Change column name based on string in row value

Time:12-20

I have this df that comes from a massive csv file. The current column names have placeholders for their names. Is there a way to make the name of each column the sting part of the data. The string will be the same for the entire column going down. I would change the names manually, but there are 256 columns.

Starting df:

   col2          col3       col4     ...
1 version=1    numCh=10   reserved=0
2 version=1    numCh=10   reserved=0
3 version=1    numCh=11   reserved=0
4 version=1    numCh=11   reserved=0
...

The desired output:

  version    numCh    reserved   ...
1    1        10        0
2    1        10        0
3    1        11        0
4    1        11        0
...

Been trying different methods and seeing if anybody else has created something like this.

CodePudding user response:

Extract the first row, and remove the substring from = with trimws, then loop across the columns, extract the numeric part with parse_number and set the column names with the substring of first row

library(dplyr)
 v1 <- trimws(unlist(df1[1,]), whitespace = "=.*")
df1 %>% 
    mutate(across(everything(), readr::parse_number)) %>% 
     setNames(v1)

-output

  version numCh reserved
1       1    10        0
2       1    10        0
3       1    11        0
4       1    11        0

data

df1 <- structure(list(col2 = c("version=1", "version=1", "version=1", 
"version=1"), col3 = c("numCh=10", "numCh=10", "numCh=11", "numCh=11"
), col4 = c("reserved=0", "reserved=0", "reserved=0", "reserved=0"
)), class = "data.frame", row.names = c("1", "2", "3", "4"))

CodePudding user response:

Or with base:

colnames(df) <- trimws(df[1, ], whitespace = "=.*")
df[] <- lapply(df, \(x) as.numeric(trimws(x, whitespace = ".*=")))

Output:

  version numCh reserved
1       1    10        0
2       1    10        0
3       1    11        0
4       1    11        0

Data:

df <- read.table(text = "col2 col3 col4
version=1 numCh=10 reserved=0
version=1 numCh=10 reserved=0
version=1 numCh=11 reserved=0
version=1 numCh=11 reserved=0", header = TRUE)

CodePudding user response:

Here is an alternative approach:

library(dplyr)
library(stringr)

df1 %>% 
  mutate(across(everything(), ~str_extract(., '[0-9] ')))
  col2 col3 col4
1    1   10    0
2    1   10    0
3    1   11    0
4    1   11    0
  • Related