I have a data frame with samples as columns and genes as rows. It looks something like this:
structure(list(Pt1_Hugo = c(8.02538, 0.677503, 185.304, 0.363531,
6.55749, 20.3992, 3.13403, 0.0550484, 3.165665, 8.02006, 16.8827,
2.11881, 16.9462, 77.88625, 19.10715), Pt2_Hugo = c(317.594,
28.3782, 455.16, 2.864455, 0.472773, 18.53875, 2.836915, 60.42305,
9.33938, 9.05646, 12.5851, 1.17207, 33.32875, 41.988, 14.0337
), Pt4_Hugo = c(5.747295, 0.4713935, 81.0082, 0.2012845, 0.610117,
20.366, 2.151635, 0.14146595, 2.45732, 4.46221, 21.68765, 1.596825,
30.92115, 59.4612, 31.61955), Pt5_Hugo = c(6.85957, 0.347623,
41.41065, 0.04082075, 0.6240955, 24.40895, 9.04469, 0, 4.1394,
10.50265, 28.5239, 1.53807, 35.0947, 51.8853, 28.4039), Pt6_Hugo = c(1.563465,
0.20176, 136.1635, 0.417423, 0.9918185, 14.9076, 6.75243, 0,
2.18692, 5.31772, 34.1763, 2.387955, 17.4285, 52.69105, 13.05855
), Pt7_Hugo = c(21.56585, 8.926245, 44.66935, 1.039475, 1.531155,
17.60665, 7.52096, 0, 1.241595, 19.61445, 11.82775, 2.187845,
44.83105, 69.1745, 31.60735), Pt8_Hugo = c(11.37055, 3.853125,
119.0175, 3.126025, 6.753445, 24.4953, 7.44295, 0, 1.384905,
6.94434, 12.9606, 2.281765, 18.2533, 82.0129, 24.19465), Pt9_Hugo = c(8.15681,
2.53961, 232.675, 4.2168, 4.764565, 18.8917, 5.52544, 0.5253455,
2.19941, 9.21153, 20.8876, 1.4368, 31.26105, 73.0901, 20.19505
), Pt10_Hugo = c(4.34675, 1.91435, 501.697, 1.489845, 26.19965,
20.0471, 9.11698, 0.01114495, 9.373125, 12.40645, 12.09495, 2.308705,
11.47055, 74.65995, 17.9659), Pt12_Hugo = c(6.508715, 4.79793,
530.2375, 1.86852, 2.187715, 15.25125, 20.93695, 0.0290807, 7.161025,
10.009705, 17.4145, 3.482905, 14.22705, 52.3915, 17.6822), Pt13_Hugo = c(7.2914,
0.410501, 661.1375, 1.01877, 8.535705, 13.2086, 3.546865, 0.02354665,
7.11458, 12.47765, 14.96335, 2.57357, 23.8442, 48.191, 12.84305
), Pt14_Hugo = c(5.73269, 2.004975, 46.72625, 0.210495, 4.688435,
31.8928, 6.02104, 3.82364, 0.18812, 10.6887, 11.7102, 2.191775,
34.0623, 59.8372, 23.20095), Pt15_Hugo = c(32.17475, 0.7548555,
189.7185, 1.8318, 1.81222, 21.75415, 4.203245, 0.02317175, 1.09588,
13.85, 13.2064, 0.792516, 30.9179, 68.81145, 30.41675), Pt19_Hugo = c(20.1598,
1.2813, 77.16515, 0.6932985, 9.690095, 60.2925, 13.54455, 0,
1.0430795, 4.09673, 11.223, 1.521045, 40.3712, 167.216, 47.86845
), Pt20_Hugo = c(15.92405, 3.91686, 110.73, 1.850075, 2.658665,
18.25745, 3.79892, 0, 0.5187115, 9.62084, 12.20435, 1.74387,
32.47005, 74.8112, 29.2178)), row.names = c("A1BG", "A1BG-AS1",
"A2M", "A2M-AS1", "A4GALT", "AAAS", "AACS", "AADAC", "AADAT",
"AAED1", "AAGAB", "AAK1", "AAMDC", "AAMP", "AAR2"), class = "data.frame")
I want to transform this data frame, let's call it olddata, into newdata using this formula: newdata = (x / sumX) * 10^6
x = each value in olddata
sumX = the sum of the column that x belongs to (the sum of every x in that sample).
For example, using this dummy dataframe:
Sample1 sample7 sample10 sample4
geneA 4 100 50 78
geneB 1 10 30 90
geneC 20 0 44 11
geneD 1 3 12 75
For the first value, which is 4 (geneA, Sample1), the formula gives:
(4/26)*10^6 = 153,846.15
That is because the sum of Sample1, which is sumX, is equal to 26, and the value, which is x, is 4.
Another example: for 3 (geneD, sample7), it would be (3/113)*10^6.
How do I do that for the whole dataframe?
CodePudding user response:
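In base R you can do the following. Note that it overwrites olddata in place, and that the \(x) shorthand requires R 4.1 or later (use function(x) on older versions):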
olddata[] <- lapply(olddata, \(x) x/sum(x) * 1e6)
Applied to the dummy data frame from the question, this gives:
olddata
#> Sample1 sample7 sample10 sample4
#> geneA 153846.15 884955.75 367647.06 307086.61
#> geneB 38461.54 88495.58 220588.24 354330.71
#> geneC 769230.77 0.00 323529.41 43307.09
#> geneD 38461.54 26548.67 88235.29 295275.59
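If the original values need to stay available, the same idea can be written so that the result goes into a new object instead of overwriting olddata. This is only a minimal sketch on the dummy data from the question: the name newdata is taken from the question, and the final call is just a sanity check that each column now sums to 10^6.

```r
# Dummy data from the question
olddata <- data.frame(
  Sample1  = c(4, 1, 20, 1),
  sample7  = c(100, 10, 0, 3),
  sample10 = c(50, 30, 44, 12),
  sample4  = c(78, 90, 11, 75),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)

# Copy first, then scale each column in the copy; olddata is left unchanged
newdata <- olddata
newdata[] <- lapply(olddata, function(x) x / sum(x) * 1e6)

# Sanity check: every column of newdata should sum to one million
colSums(newdata)
```

One thing to watch for on real data: a sample whose column sum is 0 would produce NaN for every gene in that column, since 0/0 is NaN in R.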
CodePudding user response:
We could use colSums as well:
olddata/colSums(olddata)[col(olddata)] * 1e6
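Another way to express the same column-wise division, offered here as an alternative sketch rather than something from the answers above, is base R's sweep(), which applies a function across one margin of a table. The example below again uses the dummy data from the question and compares the result with the lapply version.

```r
# Dummy data from the question
olddata <- data.frame(
  Sample1  = c(4, 1, 20, 1),
  sample7  = c(100, 10, 0, 3),
  sample10 = c(50, 30, 44, 12),
  sample4  = c(78, 90, 11, 75),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)

# Divide every column (MARGIN = 2) by its own sum, then scale to one million
newdata <- sweep(olddata, 2, colSums(olddata), "/") * 1e6

# Reference result computed column by column, for comparison
reference <- olddata
reference[] <- lapply(olddata, function(x) x / sum(x) * 1e6)

# Compare the two results numerically
all.equal(as.matrix(newdata), as.matrix(reference))
```

The sweep() form states the margin explicitly, which some readers find easier to follow than indexing colSums() with col().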