Home > Back-end >  Creating row wise and column wise ratios of every combination
Creating row wise and column wise ratios of every combination

Time:01-24

I have a data frame which looks like:

       28 Madrid 08 Barcelona 46 Valencia/València 03 Alicante/Alacant 41 Sevilla
2020 143.4846560  128.4231590           54.4870538          37.7936793 37.4922771
2032 125.5210867  113.9580077           56.6657609          40.5967195 39.5353532
2031 123.7643426  112.9806035           56.1167784          40.0735714 39.0574155
2030 122.1152422  112.1129958           55.6022418          39.5934808 38.6074651

I am trying to column wise and row wise compute the ratios between all combinations. That is, I want to compute the ratio for 2020 between 28 Madrid and 08 Barcelona (which is 1.11728023. Then do the same for 28 Madrid and 46 Valencia/València (which is 2.63337153)... etc. Then move down to the second row 2032 and compute the same.

Expected output:

A data frame (or list) in long or wide format showing for each year the different ratios between the column combinations.

I would be happy if each year is a list and then the expected output would be:

2020
         Barcelona    Valencia    Alicante    Sevilla
Madrid     1.11         2.63
Barcelona  x.xxx        x.xxx
Valencia   x.xxx        x.xxx
Alicante
Sevilla

2021
         Barcelona    Valencia    Alicante    Sevilla
Madrid     x.xxx        x.xxx
Barcelona  x.xxx        x.xxx
Valencia   x.xxx        x.xxx
Alicante
Sevilla

and so on.

Data:

df = structure(list(`28 Madrid` = c(143.484656024558, 125.521086677107, 
123.764342599024, 122.115242153178, 120.515657778159, 118.995729690994, 
117.540387783171, 116.134561946176, 114.776099307365, 113.393955069462, 
112.024728067427, 110.633972338945, 107.335771447251, 101.333562513803, 
100.321712370789, 101.540238287695, 100.881459258414, 97.0213586064507, 
92.1407963208957, 92.7113075717435, 91.2667300271439), `08 Barcelona` = c(128.423159002175, 
113.958007702376, 112.980603521678, 112.11299584586, 111.277681259713, 
110.556468923735, 109.912760002968, 109.322872898317, 108.74805590218, 
108.179697523977, 107.591963291971, 107.012840550545, 108.358385953488, 
104.229176220936, 104.227023348291, 100.831943187585, 101.294810806198, 
99.6284873791932, 98.9008164252815, 96.2011141288166, 95.6026155335876
), `46 Valencia/València` = c(54.4870537649372, 56.6657608813826, 
56.1167783569818, 55.6022417948964, 55.1178454498369, 54.6528249585798, 
54.222250429638, 53.815357499788, 53.4149231878722, 53.0295589844693, 
52.6291246725535, 52.2286903606376, 54.5171939819631, 50.6721634385131, 
50.517156608094, 49.171611205151, 49.7658040550906, 48.2308058594132, 
47.6064727924476, 46.5817054135662, 45.2253956473996), `03 Alicante/Alacant` = c(37.7936792778645, 
40.5967194612755, 40.0735714086112, 39.5934808088411, 39.1112373364264, 
38.6785099348399, 38.2522411511875, 37.8561125845611, 37.4621368905793, 
37.0789255598212, 36.7000199743524, 36.3211143888836, 39.5934808088411, 
34.3899876265798, 35.0272379294136, 34.247898032029, 34.12949003657, 
32.6267849305632, 32.7365814354433, 32.0282863353341, 31.3049211267119
), `41 Sevilla` = c(37.4922771076053, 39.535353247434, 39.0574155203086, 
38.6074651375645, 38.1919607171357, 37.7850677872857, 37.3997035838828, 
37.0466324701505, 36.7021728469971, 36.3491017332648, 36.0197122186244, 
35.6989341945628, 38.1123044292814, 34.9346644056911, 33.9981648052427, 
34.0412222581369, 34.7452116129567, 33.027219242479, 32.5062240624595, 
31.8495979058233, 31.5094440279593)), row.names = c("2020", "2032", 
"2031", "2030", "2029", "2028", "2027", "2026", "2025", "2024", 
"2023", "2022", "2021", "2017", "2018", "2019", "2015", "2016", 
"2012", "2014", "2013"), class = "data.frame")

CodePudding user response:

We can apply along the rows (MARGIN=1) and use outer(., ., FUN=`/`).

out <- apply(df, 1, function(z) outer(z, z, FUN = `/`), simplify = FALSE)
out[1:3]
# $`2020`
#                      28 Madrid 08 Barcelona 46 Valencia/Valencia 03 Alicante/Alacant 41 Sevilla
# 28 Madrid            1.0000000    1.1172802            2.6333715           3.7965252   3.827046
# 08 Barcelona         0.8950306    1.0000000            2.3569481           3.3980063   3.425323
# 46 Valencia/Valencia 0.3797413    0.4242775            1.0000000           1.4416975   1.453287
# 03 Alicante/Alacant  0.2633988    0.2942902            0.6936268           1.0000000   1.008039
# 41 Sevilla           0.2612982    0.2919433            0.6880951           0.9920251   1.000000
# $`2032`
#                      28 Madrid 08 Barcelona 46 Valencia/Valencia 03 Alicante/Alacant 41 Sevilla
# 28 Madrid            1.0000000    1.1014679            2.2151134           3.0919022   3.174907
# 08 Barcelona         0.9078794    1.0000000            2.0110558           2.8070743   2.882433
# 46 Valencia/Valencia 0.4514442    0.4972512            1.0000000           1.3958212   1.433293
# 03 Alicante/Alacant  0.3234255    0.3562428            0.7164241           1.0000000   1.026846
# 41 Sevilla           0.3149698    0.3469291            0.6976939           0.9738559   1.000000
# $`2031`
#                      28 Madrid 08 Barcelona 46 Valencia/Valencia 03 Alicante/Alacant 41 Sevilla
# 28 Madrid            1.0000000    1.0954477            2.2054784           3.0884281   3.168780
# 08 Barcelona         0.9128688    1.0000000            2.0133124           2.8193295   2.892680
# 46 Valencia/Valencia 0.4534164    0.4966939            1.0000000           1.4003438   1.436777
# 03 Alicante/Alacant  0.3237893    0.3546943            0.7141103           1.0000000   1.026017
# 41 Sevilla           0.3155789    0.3457002            0.6960025           0.9746427   1.000000

I generally discourage the use of apply on a frame since it converts to a matrix internally before processing. On one scale this might be inefficient, but the real risk is if there are non-numeric columns in the frame: when that is the case, everything is often coerced into character, in which case things go awry.

However, in this case it works great since (1) we want to do things one row at a time, and (2) all columns are the same class. Bonus, it preserves the row names (years) in the output as names of the list.

(I should note that simplify= was added to apply in R-4.1.0. In a pinch one could work around with something like:

lapply(seq_len(nrow(df)), function(i) outer(unlist(df[i,]), unlist(df[i,]), FUN = `/`))

though we now need to add the names. Less-elegant but it still works.)

  •  Tags:  
  • r
  • Related