I have a df with 6 columns and >5000 rows. I need to group the columns based on information in a second table (samples), and then get the mean values for each group and place into a new dataframe.
Column names will not always be the same or structured as shown below: it is necessary to group based on the values in the second table.
I've searched the forums but I don't know the terminology for what I'm trying to accomplish and have come up empty handed.
Thanks for any help!
>head(df,3)
| | Control_Rep1 | Ethanol_Rep1 | Control_Rep2 | Ethanol_Rep2 | Control_Rep3 | Ethanol_Rep3 |
|--------|--------------|--------------|--------------|--------------|--------------|--------------|
| Q0120 | 22 | 29 | 25 | 39 | 13 | 23 |
| R0010W | 3694 | 6205 | 3322 | 7110 | 4985 | 10513 |
| R0020C | 3024 | 3564 | 2799 | 4191 | 5030 | 6214 |
>samples
| Identifier | Treatment |
|--------------|-----------|
| Control_Rep1 | Control |
| Ethanol_Rep1 | Ethanol |
| Control_Rep2 | Control |
| Ethanol_Rep2 | Ethanol |
| Control_Rep3 | Control |
| Ethanol_Rep3 | Ethanol |
>Desired_Table
| | Control | Ethanol |
|--------|------------|------------|
| Q0120 | 20 | 30.3333333 |
| R0010W | 4000.33333 | 7942.66667 |
| R0020C | 3617.66667 | 4656.33333 |
CodePudding user response:
We could split the 'df' based on the match
ing between the column names of 'df' and 'samples' 'Identifier' and then take the rowMeans
sapply(split.default(df, samples$Treatment[match(names(df),
samples$Identifier)]), rowMeans, na.rm = TRUE)
-output
Control Ethanol
Q0120 20.000 30.33333
R0010W 4000.333 7942.66667
R0020C 3617.667 4656.33333
data
df <- structure(list(Control_Rep1 = c(22L, 3694L, 3024L), Ethanol_Rep1 = c(29L,
6205L, 3564L), Control_Rep2 = c(25L, 3322L, 2799L), Ethanol_Rep2 = c(39L,
7110L, 4191L), Control_Rep3 = c(13L, 4985L, 5030L), Ethanol_Rep3 = c(23L,
10513L, 6214L)), class = "data.frame", row.names = c("Q0120",
"R0010W", "R0020C"))
samples <- structure(list(Identifier = c("Control_Rep1", "Ethanol_Rep1",
"Control_Rep2", "Ethanol_Rep2", "Control_Rep3", "Ethanol_Rep3"
), Treatment = c("Control", "Ethanol", "Control", "Ethanol",
"Control", "Ethanol")), class = "data.frame", row.names = c(NA,
-6L))