Home > Back-end >  Find and rename columns in a set of data frames
Find and rename columns in a set of data frames

Time:03-16

I have 4 data.frames (d1,...d4). Some of these data.frames have more columns than others some less.

I wonder how to automatically find any of the colnames: "emmean","SE","t.ratio","df","p.value","lower.CL","upper.CL" and then rename them to: "Mean","SE","t","Df","p-value","Lower","Upper", respectively?

ps. We leave all other columns untouched. Note that I don't have a list of data.frames. I may get only one of the data.frames below.

d1="
training_hr time      emmean    SE     df lower.CL upper.CL t.ratio p.value
  23.145    Baseline     NA    NA     NA       NA       NA      NA      NA
  23.145    Post-test1  0.428 0.134 70.000    0.161    0.695   3.199   0.002
  23.145    Post-test2    NA    NA     NA       NA       NA      NA      NA"
d1 = read.table(text=d1,h=T)


d2="
  training_hr        time emmean    SE     df lower.CL upper.CL t.ratio
      23.145    Baseline     NA    NA     NA       NA       NA      NA
      23.145 Post-test1  0.428 0.134 70.000    0.161    0.695   3.199
      23.145 Post-test2     NA    NA     NA       NA       NA      NA"
d2= read.table(text=d2,h=T)

d3="
  training_hr        time emmean    SE     df  t.ratio p.value
      23.145    Baseline     NA    NA     NA  NA      NA
      23.145 Post-test1  0.428 0.134 70.000  3.199   0.002
      23.145 Post-test2     NA    NA     NA  NA      NA"
d3=read.table(text=d3,h=T)

d4="
  training_hr   emmean    SE     df     t.ratio   p.value
      23.145   NA        NA     NA     NA        NA
      23.145   0.428     0.134 70.000  3.199     0.002
      23.145   NA        NA     NA     NA        NA"
d4=read.table(text=d4,h=T)

CodePudding user response:

Data Table

You can use setnames from data.table, which takes the old and new names as separate lists. Whereas with tidyverse, you have to give the old and new names together (e.g., Mean = "emmean"). But note setnames will change your original dataframe (changes them by reference).

library(data.table)

setnames(df1, old = c("emmean","SE","t.ratio","df","p.value","lower.CL","upper.CL"),
                          new = c("Mean","SE","t","Df","p-value","Lower","Upper"), skip_absent=TRUE)

Output

d1

  training_hr       time  Mean    SE Df Lower Upper     t p-value
1      23.145   Baseline    NA    NA NA    NA    NA    NA      NA
2      23.145 Post-test1 0.428 0.134 70 0.161 0.695 3.199   0.002
3      23.145 Post-test2    NA    NA NA    NA    NA    NA      NA

Tidyverse

With tidyverse, you can create an external vector with all of the possible name changes, then use any inside rename.

library(tidyverse)

lookup <- c(Mean = "emmean", SE = "SE", t = "t.ratio", DF = "df", "p-value" = "p.value", Lower = "lower.CL", Upper = "upper.CL")

d1 %>%
  rename(any_of(lookup))

#  training_hr       time  Mean    SE DF Lower Upper     t p-value
#1      23.145   Baseline    NA    NA NA    NA    NA    NA      NA
#2      23.145 Post-test1 0.428 0.134 70 0.161 0.695 3.199   0.002
#3      23.145 Post-test2    NA    NA NA    NA    NA    NA      NA

For several dataframes, you could put the dataframes into a list then use a combination of map from purrr and setnames from data.table. This will also update all of the original dataframes by reference (i.e., d1, d2, ...).

library(data.table)
library(tidyverse)

df.list <- list(d1, d2, d3, d4)

map(df.list, ~ setnames(.x, old = c("emmean","SE","t.ratio","df","p.value","lower.CL","upper.CL"),
                          new = c("Mean","SE","t","Df","p-value","Lower","Upper"), skip_absent=TRUE))
  • Related