I have a data frame which looks like:
28 Madrid 08 Barcelona 46 Valencia/València 03 Alicante/Alacant 41 Sevilla
2020 143.4846560 128.4231590 54.4870538 37.7936793 37.4922771
2032 125.5210867 113.9580077 56.6657609 40.5967195 39.5353532
2031 123.7643426 112.9806035 56.1167784 40.0735714 39.0574155
2030 122.1152422 112.1129958 55.6022418 39.5934808 38.6074651
I am trying to column wise and row wise compute the ratios between all combinations. That is, I want to compute the ratio for 2020
between 28 Madrid
and 08 Barcelona
(which is 1.11728023
. Then do the same for 28 Madrid
and 46 Valencia/València
(which is 2.63337153
)... etc. Then move down to the second row 2032
and compute the same.
Expected output:
A data frame (or list) in long or wide format showing for each year the different ratios between the column combinations.
I would be happy if each year is a list and then the expected output would be:
2020
Barcelona Valencia Alicante Sevilla
Madrid 1.11 2.63
Barcelona x.xxx x.xxx
Valencia x.xxx x.xxx
Alicante
Sevilla
2021
Barcelona Valencia Alicante Sevilla
Madrid x.xxx x.xxx
Barcelona x.xxx x.xxx
Valencia x.xxx x.xxx
Alicante
Sevilla
and so on.
Data:
df = structure(list(`28 Madrid` = c(143.484656024558, 125.521086677107,
123.764342599024, 122.115242153178, 120.515657778159, 118.995729690994,
117.540387783171, 116.134561946176, 114.776099307365, 113.393955069462,
112.024728067427, 110.633972338945, 107.335771447251, 101.333562513803,
100.321712370789, 101.540238287695, 100.881459258414, 97.0213586064507,
92.1407963208957, 92.7113075717435, 91.2667300271439), `08 Barcelona` = c(128.423159002175,
113.958007702376, 112.980603521678, 112.11299584586, 111.277681259713,
110.556468923735, 109.912760002968, 109.322872898317, 108.74805590218,
108.179697523977, 107.591963291971, 107.012840550545, 108.358385953488,
104.229176220936, 104.227023348291, 100.831943187585, 101.294810806198,
99.6284873791932, 98.9008164252815, 96.2011141288166, 95.6026155335876
), `46 Valencia/València` = c(54.4870537649372, 56.6657608813826,
56.1167783569818, 55.6022417948964, 55.1178454498369, 54.6528249585798,
54.222250429638, 53.815357499788, 53.4149231878722, 53.0295589844693,
52.6291246725535, 52.2286903606376, 54.5171939819631, 50.6721634385131,
50.517156608094, 49.171611205151, 49.7658040550906, 48.2308058594132,
47.6064727924476, 46.5817054135662, 45.2253956473996), `03 Alicante/Alacant` = c(37.7936792778645,
40.5967194612755, 40.0735714086112, 39.5934808088411, 39.1112373364264,
38.6785099348399, 38.2522411511875, 37.8561125845611, 37.4621368905793,
37.0789255598212, 36.7000199743524, 36.3211143888836, 39.5934808088411,
34.3899876265798, 35.0272379294136, 34.247898032029, 34.12949003657,
32.6267849305632, 32.7365814354433, 32.0282863353341, 31.3049211267119
), `41 Sevilla` = c(37.4922771076053, 39.535353247434, 39.0574155203086,
38.6074651375645, 38.1919607171357, 37.7850677872857, 37.3997035838828,
37.0466324701505, 36.7021728469971, 36.3491017332648, 36.0197122186244,
35.6989341945628, 38.1123044292814, 34.9346644056911, 33.9981648052427,
34.0412222581369, 34.7452116129567, 33.027219242479, 32.5062240624595,
31.8495979058233, 31.5094440279593)), row.names = c("2020", "2032",
"2031", "2030", "2029", "2028", "2027", "2026", "2025", "2024",
"2023", "2022", "2021", "2017", "2018", "2019", "2015", "2016",
"2012", "2014", "2013"), class = "data.frame")
CodePudding user response:
We can apply
along the rows (MARGIN=1
) and use outer(., ., FUN=`/`)
.
out <- apply(df, 1, function(z) outer(z, z, FUN = `/`), simplify = FALSE)
out[1:3]
# $`2020`
# 28 Madrid 08 Barcelona 46 Valencia/Valencia 03 Alicante/Alacant 41 Sevilla
# 28 Madrid 1.0000000 1.1172802 2.6333715 3.7965252 3.827046
# 08 Barcelona 0.8950306 1.0000000 2.3569481 3.3980063 3.425323
# 46 Valencia/Valencia 0.3797413 0.4242775 1.0000000 1.4416975 1.453287
# 03 Alicante/Alacant 0.2633988 0.2942902 0.6936268 1.0000000 1.008039
# 41 Sevilla 0.2612982 0.2919433 0.6880951 0.9920251 1.000000
# $`2032`
# 28 Madrid 08 Barcelona 46 Valencia/Valencia 03 Alicante/Alacant 41 Sevilla
# 28 Madrid 1.0000000 1.1014679 2.2151134 3.0919022 3.174907
# 08 Barcelona 0.9078794 1.0000000 2.0110558 2.8070743 2.882433
# 46 Valencia/Valencia 0.4514442 0.4972512 1.0000000 1.3958212 1.433293
# 03 Alicante/Alacant 0.3234255 0.3562428 0.7164241 1.0000000 1.026846
# 41 Sevilla 0.3149698 0.3469291 0.6976939 0.9738559 1.000000
# $`2031`
# 28 Madrid 08 Barcelona 46 Valencia/Valencia 03 Alicante/Alacant 41 Sevilla
# 28 Madrid 1.0000000 1.0954477 2.2054784 3.0884281 3.168780
# 08 Barcelona 0.9128688 1.0000000 2.0133124 2.8193295 2.892680
# 46 Valencia/Valencia 0.4534164 0.4966939 1.0000000 1.4003438 1.436777
# 03 Alicante/Alacant 0.3237893 0.3546943 0.7141103 1.0000000 1.026017
# 41 Sevilla 0.3155789 0.3457002 0.6960025 0.9746427 1.000000
I generally discourage the use of apply
on a frame since it converts to a matrix
internally before processing. On one scale this might be inefficient, but the real risk is if there are non-numeric
columns in the frame: when that is the case, everything is often coerced into character
, in which case things go awry.
However, in this case it works great since (1) we want to do things one row at a time, and (2) all columns are the same class. Bonus, it preserves the row names (years) in the output as names of the list.
(I should note that simplify=
was added to apply
in R-4.1.0. In a pinch one could work around with something like:
lapply(seq_len(nrow(df)), function(i) outer(unlist(df[i,]), unlist(df[i,]), FUN = `/`))
though we now need to add the names. Less-elegant but it still works.)