How do I change all the column names in a dataframe, using names taken from a long string?

Time:11-14

I have a dataset with measurements in 96 columns. The column names currently run from "A1" to "H12" (letters A-H, numbers 1-12), as shown in the first row below:

> head(od,1)
  time T..OD2.600  A1   A2   A3  A4   A5   A6   A7  A8   A9 A10  A11  A12   B1   B2   B3  B4  B5
1 0.24         25 0.1 0.13 0.13 0.1 0.16 0.12 0.13 0.1 0.09 0.1 0.09 0.09 0.09 0.09 0.13 0.2 0.1
   B6  B7   B8   B9  B10  B11  B12   C1   C2   C3   C4  C5   C6  C7  C8   C9  C10  C11  C12  D1
1 0.1 0.1 0.12 0.09 0.09 0.09 0.09 0.09 0.12 0.13 0.11 0.1 0.14 0.1 0.1 0.09 0.09 0.09 0.09 0.1
    D2   D3   D4   D5   D6  D7   D8   D9  D10  D11  D12   E1   E2  E3   E4   E5  E6   E7  E8   E9
1 0.09 0.11 0.09 0.14 0.09 0.1 0.09 0.09 0.09 0.09 0.09 0.09 0.09 0.1 0.21 0.12 0.1 0.11 0.1 0.09
   E10  E11 E12  F1  F2  F3  F4   F5   F6   F7   F8   F9  F10  F11  F12   G1  G2   G3   G4   G5
1 0.09 0.09 0.1 0.1 0.1 0.1 0.1 0.09 0.09 0.09 0.09 0.09 0.12 0.09 0.09 0.09 0.1 0.11 0.11 0.09
    G6  G7   G8  G9  G10 G11 G12   H1   H2  H3  H4   H5  H6  H7   H8   H9  H10  H11  H12
1 0.14 0.1 0.09 0.1 0.14 0.1 0.1 0.09 0.09 0.1 0.1 0.09 0.1 0.1 0.09 0.09 0.09 0.09 0.09

However, I need to rename every column ("A1", "A2", "A3", etc.) to the corresponding name given in the text below, which has this format:

" A1 Negative Control A2 Ammonia A3 Nitrite A4 Nitrate A5 Urea A6 Biuret A7 L-Alanine A8 L-Arginine A9 L-Asparagine A10 L-Aspartic Acid A11 L-Cysteine A12 L-Glutamic Acid B1 L-Glutamine B2 Glycine B3 L-Histidine B4 L-Isoleucine B5 L-Leucine B6 L-Lysine B7 L-Methionine B8 L-Phenylalanine B9 L-Proline B10 L-Serine B11 L-Threonine B12 L-Tryptophan C1 L-Tyrosine C2 L-Valine C3 D-Alanine C4 D-Asparagine C5 D-Aspartic Acid C6 D-Glutamic Acid C7 D-Lysine C8 D-Serine C9 D-Valine C10 L-Citrulline C11 L-Homoserine C12 L-Ornithine D1 N-Acetyl-LGlutamic Acid D2 N-Phthaloyl-LGlutamic Acid D3 L-Pyroglutamic Acid D4 Hydroxylamine D5 Methylamine D6 N-Amylamine D7 N-Butylamine D8 Ethylamine D9 Ethanolamine D10 Ethylenediamine D11 Putrescine D12 Agmatine E1 Histamine E2 ß-Phenylethylamine E3 Tyramine E4 Acetamide E5 Formamide E6 Glucuronamide E7 D,L-Lactamide E8 D-Glucosamine E9 D-Galactosamine E10 D-Mannosamine E11 N-Acetyl-DGlucosamine E12 N-Acetyl-DGalactosamine F1 N-Acetyl-DMannosamine F2 Adenine F3 Adenosine F4 Cytidine F5 Cytosine F6 Guanine F7 Guanosine F8 Thymine F9 Thymidine F10 Uracil F11 Uridine F12 Inosine G1 Xanthine G2 Xanthosine G3 Uric Acid G4 Alloxan G5 Allantoin G6 Parabanic Acid G7 D,L-α-Amino-NButyric Acid G8 γ-Amino-NButyric Acid G9 ε-Amino-NCaproic Acid G10 D,L-α-AminoCaprylic Acid G11 δ-Amino-NValeric Acid G12 α-Amino-NValeric Acid H1 Ala-Asp H2 Ala-Gln H3 Ala-Glu H4 Ala-Gly H5 Ala-His H6 Ala-Leu H7 Ala-Thr H8 Gly-Asn H9 Gly-Gln H10 Gly-Glu H11 Gly-Met H12 Met-Ala "

So that,

A1 = Negative Control, A2 = Ammonia

and so on.

I hope everything makes sense, and thanks a lot in advance!

CodePudding user response:

First, we convert the string to a key/value data frame, with the column ID in 'V1' and the new name in 'V2':

# split at whitespace that precedes a key (capital letter + digits),
# then turn the first whitespace after each key into ":"
key_val <- read.csv(text = sub("(?<=\\d)\\s+", ":",
   strsplit(str1, "\\s+(?=[A-Z]\\d+\\s)", perl = TRUE)[[1]], perl = TRUE), 
    header = FALSE, sep = ":")
#or without splitting
#key_val <- read.csv(text = gsub("\\s+(?=[A-Z]\\d+\\b)", "\n",
#   gsub("(?<=\\d)\\s+", ":", str1, perl = TRUE), perl = TRUE), 
#     header = FALSE, sep=":")
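
To see what the conversion does, it can be traced step by step on a shortened string. This is only a sketch: `demo`, `pieces`, `pairs` and `demo_kv` are hypothetical names, and the patterns assume the quantified forms `\\s+` and `\\d+` (one or more whitespace characters or digits). The `trimws()` call is an addition that guards against the leading and trailing spaces present in the string as pasted in the question.

```r
# Shortened, hypothetical stand-in for 'str1'; the padding spaces mimic
# the string as pasted in the question.
demo <- " A1 Negative Control A2 Ammonia A10 L-Aspartic Acid E7 D,L-Lactamide H12 Met-Ala "

# Step 1: drop the padding, then split at whitespace that is followed by
# a key (one capital letter, one or more digits, then whitespace).
pieces <- strsplit(trimws(demo), "\\s+(?=[A-Z]\\d+\\s)", perl = TRUE)[[1]]

# Step 2: replace the first whitespace after the key's digits with ":".
pairs <- sub("(?<=\\d)\\s+", ":", pieces, perl = TRUE)

# Step 3: read the "key:value" lines into a two-column data frame.
# Because the separator is ":", the comma in "D,L-Lactamide" is harmless.
demo_kv <- read.csv(text = pairs, header = FALSE, sep = ":")
demo_kv
```

Without `trimws()`, a trailing space in the input would end up inside the last name ("Met-Ala "), because `read.csv()` does not strip white space by default.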

-checking

> head(key_val)
  V1               V2
1 A1 Negative Control
2 A2          Ammonia
3 A3          Nitrite
4 A4          Nitrate
5 A5             Urea
6 A6           Biuret
> tail(key_val)
    V1      V2
91  H7 Ala-Thr
92  H8 Gly-Asn
93  H9 Gly-Gln
94 H10 Gly-Glu
95 H11 Gly-Met
96 H12 Met-Ala
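
Beyond looking at the head and tail, it may be worth confirming that every one of the 96 IDs from A1 to H12 was parsed exactly once. The following is a minimal sketch; `check_keys` is a hypothetical helper, not part of any package, and the small key vector at the end is made up purely for illustration.

```r
# The 96 expected IDs: A1..A12, B1..B12, ..., H1..H12.
expected <- paste0(rep(LETTERS[1:8], each = 12), 1:12)

# Hypothetical helper: report keys that are missing, unexpected or repeated.
check_keys <- function(keys, expected) {
  list(
    missing    = setdiff(expected, keys),
    unexpected = setdiff(keys, expected),
    duplicated = unique(keys[duplicated(keys)])
  )
}

# Illustration with a deliberately faulty key vector, checked against the
# first three expected IDs only.
check_keys(c("A1", "A2", "A2", "Z9"), expected[1:3])
```

For the real table the call would be `check_keys(key_val$V1, expected)`; three empty results would indicate that the split found every ID exactly once.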

Now we rename: each column name of the dataset that matches a key in 'V1' is replaced by the corresponding 'V2' value

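library(dplyr)
library(tibble)
# keep only the keys that are actually column names of 'od'
key_val_sub <- key_val %>% 
   filter(V1 %in% names(od))
# deframe() builds a named vector c(new_name = "old_name"),
# which !!! splices into rename() as new = old pairs
od1 <- od %>% 
    rename(!!! deframe(key_val_sub[2:1]))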

-output

> od1
  time Negative Control Ammonia δ-Amino-NValeric Acid
1 0.24              0.1     0.3                   0.1
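
If loading dplyr and tibble is not desirable, the same renaming can be sketched in base R with `match()`. This is an alternative to the answer's approach, not a restatement of it; `idx` and `od2` are hypothetical names, and the small `od` and `key_val` below are stand-ins so that the snippet runs on its own.

```r
# Small stand-ins for 'od' and 'key_val'.
od <- data.frame(time = 0.24, A1 = 0.1, A2 = 0.3, G11 = 0.1)
key_val <- data.frame(
  V1 = c("A1", "A2", "A3", "G11"),
  V2 = c("Negative Control", "Ammonia", "Nitrite", "δ-Amino-NValeric Acid"),
  stringsAsFactors = FALSE
)

# For each column of 'od', find the row of 'key_val' whose key matches.
# Columns without a key (such as 'time') give NA and keep their old name.
idx <- match(names(od), key_val$V1)
od2 <- od
names(od2)[!is.na(idx)] <- key_val$V2[idx[!is.na(idx)]]

names(od2)
```

Since the new names contain spaces and hyphens, they need backticks when used with `$`, for example ``od2$`Negative Control` ``.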

NOTE: For reproducibility, only a subset of the OP's 'od' data is used here

data

od <- structure(list(time = 0.24, A1 = 0.1, A2 = 0.3, G11 = 0.1), class = "data.frame", row.names = c(NA, 
-1L))

str1 <- "A1 Negative Control A2 Ammonia A3 Nitrite A4 Nitrate A5 Urea A6 Biuret A7 L-Alanine A8 L-Arginine A9 L-Asparagine A10 L-Aspartic Acid A11 L-Cysteine A12 L-Glutamic Acid B1 L-Glutamine B2 Glycine B3 L-Histidine B4 L-Isoleucine B5 L-Leucine B6 L-Lysine B7 L-Methionine B8 L-Phenylalanine B9 L-Proline B10 L-Serine B11 L-Threonine B12 L-Tryptophan C1 L-Tyrosine C2 L-Valine C3 D-Alanine C4 D-Asparagine C5 D-Aspartic Acid C6 D-Glutamic Acid C7 D-Lysine C8 D-Serine C9 D-Valine C10 L-Citrulline C11 L-Homoserine C12 L-Ornithine D1 N-Acetyl-LGlutamic Acid D2 N-Phthaloyl-LGlutamic Acid D3 L-Pyroglutamic Acid D4 Hydroxylamine D5 Methylamine D6 N-Amylamine D7 N-Butylamine D8 Ethylamine D9 Ethanolamine D10 Ethylenediamine D11 Putrescine D12 Agmatine E1 Histamine E2 ß-Phenylethylamine E3 Tyramine E4 Acetamide E5 Formamide E6 Glucuronamide E7 D,L-Lactamide E8 D-Glucosamine E9 D-Galactosamine E10 D-Mannosamine E11 N-Acetyl-DGlucosamine E12 N-Acetyl-DGalactosamine F1 N-Acetyl-DMannosamine F2 Adenine F3 Adenosine F4 Cytidine F5 Cytosine F6 Guanine F7 Guanosine F8 Thymine F9 Thymidine F10 Uracil F11 Uridine F12 Inosine G1 Xanthine G2 Xanthosine G3 Uric Acid G4 Alloxan G5 Allantoin G6 Parabanic Acid G7 D,L-α-Amino-NButyric Acid G8 γ-Amino-NButyric Acid G9 ε-Amino-NCaproic Acid G10 D,L-α-AminoCaprylic Acid G11 δ-Amino-NValeric Acid G12 α-Amino-NValeric Acid H1 Ala-Asp H2 Ala-Gln H3 Ala-Glu H4 Ala-Gly H5 Ala-His H6 Ala-Leu H7 Ala-Thr H8 Gly-Asn H9 Gly-Gln H10 Gly-Glu H11 Gly-Met H12 Met-Ala"