I have this df that comes from a massive csv file. The current column names have placeholders for their names. Is there a way to make the name of each column the sting part of the data. The string will be the same for the entire column going down. I would change the names manually, but there are 256 columns.
Starting df:
col2 col3 col4 ...
1 version=1 numCh=10 reserved=0
2 version=1 numCh=10 reserved=0
3 version=1 numCh=11 reserved=0
4 version=1 numCh=11 reserved=0
...
The desired output:
version numCh reserved ...
1 1 10 0
2 1 10 0
3 1 11 0
4 1 11 0
...
Been trying different methods and seeing if anybody else has created something like this.
CodePudding user response:
Extract the first row, and remove the substring from =
with trimws
, then loop across
the columns, extract the numeric part with parse_number
and set the column names with the substring of first row
library(dplyr)
v1 <- trimws(unlist(df1[1,]), whitespace = "=.*")
df1 %>%
mutate(across(everything(), readr::parse_number)) %>%
setNames(v1)
-output
version numCh reserved
1 1 10 0
2 1 10 0
3 1 11 0
4 1 11 0
data
df1 <- structure(list(col2 = c("version=1", "version=1", "version=1",
"version=1"), col3 = c("numCh=10", "numCh=10", "numCh=11", "numCh=11"
), col4 = c("reserved=0", "reserved=0", "reserved=0", "reserved=0"
)), class = "data.frame", row.names = c("1", "2", "3", "4"))
CodePudding user response:
Or with base
:
colnames(df) <- trimws(df[1, ], whitespace = "=.*")
df[] <- lapply(df, \(x) as.numeric(trimws(x, whitespace = ".*=")))
Output:
version numCh reserved
1 1 10 0
2 1 10 0
3 1 11 0
4 1 11 0
Data:
df <- read.table(text = "col2 col3 col4
version=1 numCh=10 reserved=0
version=1 numCh=10 reserved=0
version=1 numCh=11 reserved=0
version=1 numCh=11 reserved=0", header = TRUE)
CodePudding user response:
Here is an alternative approach:
library(dplyr)
library(stringr)
df1 %>%
mutate(across(everything(), ~str_extract(., '[0-9] ')))
col2 col3 col4
1 1 10 0
2 1 10 0
3 1 11 0
4 1 11 0