Converting a data frame using a formula

I have a data frame with samples as columns and genes as rows. It looks something like this:

structure(list(Pt1_Hugo = c(8.02538, 0.677503, 185.304, 0.363531, 
6.55749, 20.3992, 3.13403, 0.0550484, 3.165665, 8.02006, 16.8827, 
2.11881, 16.9462, 77.88625, 19.10715), Pt2_Hugo = c(317.594, 
28.3782, 455.16, 2.864455, 0.472773, 18.53875, 2.836915, 60.42305, 
9.33938, 9.05646, 12.5851, 1.17207, 33.32875, 41.988, 14.0337
), Pt4_Hugo = c(5.747295, 0.4713935, 81.0082, 0.2012845, 0.610117, 
20.366, 2.151635, 0.14146595, 2.45732, 4.46221, 21.68765, 1.596825, 
30.92115, 59.4612, 31.61955), Pt5_Hugo = c(6.85957, 0.347623, 
41.41065, 0.04082075, 0.6240955, 24.40895, 9.04469, 0, 4.1394, 
10.50265, 28.5239, 1.53807, 35.0947, 51.8853, 28.4039), Pt6_Hugo = c(1.563465, 
0.20176, 136.1635, 0.417423, 0.9918185, 14.9076, 6.75243, 0, 
2.18692, 5.31772, 34.1763, 2.387955, 17.4285, 52.69105, 13.05855
), Pt7_Hugo = c(21.56585, 8.926245, 44.66935, 1.039475, 1.531155, 
17.60665, 7.52096, 0, 1.241595, 19.61445, 11.82775, 2.187845, 
44.83105, 69.1745, 31.60735), Pt8_Hugo = c(11.37055, 3.853125, 
119.0175, 3.126025, 6.753445, 24.4953, 7.44295, 0, 1.384905, 
6.94434, 12.9606, 2.281765, 18.2533, 82.0129, 24.19465), Pt9_Hugo = c(8.15681, 
2.53961, 232.675, 4.2168, 4.764565, 18.8917, 5.52544, 0.5253455, 
2.19941, 9.21153, 20.8876, 1.4368, 31.26105, 73.0901, 20.19505
), Pt10_Hugo = c(4.34675, 1.91435, 501.697, 1.489845, 26.19965, 
20.0471, 9.11698, 0.01114495, 9.373125, 12.40645, 12.09495, 2.308705, 
11.47055, 74.65995, 17.9659), Pt12_Hugo = c(6.508715, 4.79793, 
530.2375, 1.86852, 2.187715, 15.25125, 20.93695, 0.0290807, 7.161025, 
10.009705, 17.4145, 3.482905, 14.22705, 52.3915, 17.6822), Pt13_Hugo = c(7.2914, 
0.410501, 661.1375, 1.01877, 8.535705, 13.2086, 3.546865, 0.02354665, 
7.11458, 12.47765, 14.96335, 2.57357, 23.8442, 48.191, 12.84305
), Pt14_Hugo = c(5.73269, 2.004975, 46.72625, 0.210495, 4.688435, 
31.8928, 6.02104, 3.82364, 0.18812, 10.6887, 11.7102, 2.191775, 
34.0623, 59.8372, 23.20095), Pt15_Hugo = c(32.17475, 0.7548555, 
189.7185, 1.8318, 1.81222, 21.75415, 4.203245, 0.02317175, 1.09588, 
13.85, 13.2064, 0.792516, 30.9179, 68.81145, 30.41675), Pt19_Hugo = c(20.1598, 
1.2813, 77.16515, 0.6932985, 9.690095, 60.2925, 13.54455, 0, 
1.0430795, 4.09673, 11.223, 1.521045, 40.3712, 167.216, 47.86845
), Pt20_Hugo = c(15.92405, 3.91686, 110.73, 1.850075, 2.658665, 
18.25745, 3.79892, 0, 0.5187115, 9.62084, 12.20435, 1.74387, 
32.47005, 74.8112, 29.2178)), row.names = c("A1BG", "A1BG-AS1", 
"A2M", "A2M-AS1", "A4GALT", "AAAS", "AACS", "AADAC", "AADAT", 
"AAED1", "AAGAB", "AAK1", "AAMDC", "AAMP", "AAR2"), class = "data.frame")

I want to transform this data frame, let's call it olddata, into newdata using this formula: newdata = (x / sumX) * 10^6

x = each value in olddata

sumX = the sum of the column that x belongs to (the sum of every x in that sample).

For example, using this dummy dataframe:

         Sample1     sample7     sample10     sample4
geneA      4            100         50           78
geneB      1            10          30           90
geneC      20           0           44           11
geneD      1            3           12           75

For the first value, which is 4 (geneA, Sample1), the formula gives:

(4/26)*10^6 = 153,846.15

That is because the sum of Sample1, which is sumX, is 26, and the value, which is x, is 4.

Another example: for 3 (geneD, sample7), it would be (3/113)*10^6.

How do I do that for the whole dataframe?

CodePudding user response：

In base R you can do the following (the \(x) lambda shorthand needs R 4.1 or later; use function(x) in older versions):

olddata[] <- lapply(olddata, \(x) x/sum(x) * 1e6)

With the dummy data frame from the question, this gives:

olddata
#>         Sample1   sample7  sample10   sample4
#> geneA 153846.15 884955.75 367647.06 307086.61
#> geneB  38461.54  88495.58 220588.24 354330.71
#> geneC 769230.77      0.00 323529.41  43307.09
#> geneD  38461.54  26548.67  88235.29 295275.59
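
If you would rather leave olddata unchanged and store the result in newdata, as the question describes, a minimal sketch along the same lines follows. The dummy data frame is rebuilt here from the question's example table, and the last line is only a sanity check that every column now sums to 10^6.

```r
# Dummy data frame rebuilt from the question's example table
olddata <- data.frame(
  Sample1  = c(4, 1, 20, 1),
  sample7  = c(100, 10, 0, 3),
  sample10 = c(50, 30, 44, 12),
  sample4  = c(78, 90, 11, 75),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)

# Copy first so olddata is left untouched, then rescale each column
newdata <- olddata
newdata[] <- lapply(olddata, function(x) x / sum(x) * 1e6)

# Sanity check: each column of newdata should sum to 1e6
colSums(newdata)
```

Assigning into newdata[] rather than newdata keeps the data frame structure, including the gene names in the row names.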

CodePudding user response：

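We could use colSums as well. col(olddata) gives the column index of every cell, so indexing the column sums with it pairs each value with the sum of its own column: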

olddata/colSums(olddata)[col(olddata)] * 1e6
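
As a quick check, here is that one-liner run on the dummy data frame from the question and compared against both the hand calculation and the lapply() version from the other answer. The names dummy, by_colsums and by_lapply are only for illustration.

```r
# Dummy data frame rebuilt from the question's example table
dummy <- data.frame(
  Sample1  = c(4, 1, 20, 1),
  sample7  = c(100, 10, 0, 3),
  sample10 = c(50, 30, 44, 12),
  sample4  = c(78, 90, 11, 75),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)

# Divide every cell by the sum of its own column, then scale
by_colsums <- dummy / colSums(dummy)[col(dummy)] * 1e6

# Same calculation done column by column
by_lapply <- dummy
by_lapply[] <- lapply(dummy, function(x) x / sum(x) * 1e6)

# Compare with the hand calculation (4/26)*10^6 from the question
all.equal(by_colsums["geneA", "Sample1"], 4 / 26 * 1e6)

# Compare the two approaches with each other
all.equal(by_colsums, by_lapply)
```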