Closeness normalisation in igraph doesn't seem to work

Time:05-25

I built an igraph graph from a single data frame containing the (symbolic) edge list and the edge weights. This is the data frame:

>dput(y)
structure(list(from = c("United Kingdom", "United Kingdom", "United Kingdom", 
"United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", 
"United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", 
"United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", 
"Togo", "United Kingdom", "United Kingdom", "United Kingdom", 
"United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", 
"United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", 
"United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", 
"United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", 
"United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", 
"United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", 
"United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", 
"United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", 
"United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", 
"United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", 
"United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", 
"United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", 
"United Kingdom", "Brunei", "United Kingdom", "United Kingdom", 
"United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", 
"United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", 
"United Kingdom", "United Kingdom", "Bangladesh", "United Kingdom", 
"United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", 
"United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", 
"United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", 
"United Kingdom", "Tunisia", "United Kingdom", "United Kingdom", 
"United Kingdom", "United Kingdom", "United Kingdom", "Senegal", 
"United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", 
"United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", 
"United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", 
"United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", 
"United Kingdom", "United Kingdom", "Gambia", "United Kingdom", 
"United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom", 
"United Kingdom", "United Kingdom", "United Kingdom", "United Kingdom"
), to = c("Antigua", "Argentina", "Australia", "Austria", "Bahamas", 
"Bahrain", "Bangladesh", "Barbados", "Belgium", "Bermuda", "Bolivia", 
"Botswana", "Brazil", "British Virgin", "Bulgaria", "Burkina Faso", 
"Canada", "Cayman Islands", "Chile", "China", "Colombia", "Costa Rica", 
"Croatia", "Cuba", "Cyprus", "Czech Republic", "Dem Rep Congo", 
"Denmark", "Dominican Rep", "Ecuador", "Egypt", "Estonia", "Finland", 
"France", "Gabon", "Georgia", "Germany", "Ghana", "Gibraltar", 
"Greece", "Greenland", "Guernsey", "Guinea", "Guyana", "Honduras", 
"Hong Kong", "Hungary", "Iceland", "India", "Indonesia", "Iran", 
"Ireland-Rep", "Isle of Man", "Israel", "Italy", "Ivory Coast", 
"Jamaica", "Japan", "Jersey", "Jordan", "Kazakhstan", "Kenya", 
"Kyrgyzstan", "Lebanon", "Libya", "Liechtenstein", "Lithuania", 
"Luxembourg", "Madagascar", "Madagascar", "Malaysia", "Malta", 
"Mauritania", "Mauritius", "Mexico", "Monaco", "Montenegro", 
"Morocco", "Mozambique", "Myanmar(Burma)", "Namibia", "Nepal", 
"Neth Antilles", "Netherlands", "New Zealand", "Nigeria", "Norway", 
"Oman", "Pakistan", "Panama", "Paraguay", "Peru", "Philippines", 
"Poland", "Portugal", "Puerto Rico", "Rep of Congo", "Rep of Congo", 
"Romania", "Russian Fed", "Rwanda", "Saudi Arabia", "Serbia & Mont.", 
"Serbia & Mont.", "Sierra Leone", "Singapore", "Slovak Rep", 
"Slovenia", "South Africa", "South Korea", "Spain", "Sri Lanka", 
"Sweden", "Switzerland", "Taiwan", "Tajikistan", "Tanzania", 
"Thailand", "Turkey", "Uganda", "Ukraine", "United Kingdom", 
"United States", "Unknown", "US Virgin Is", "Utd Arab Em", "Uzbekistan", 
"Venezuela", "Vietnam", "Zambia", "Zimbabwe"), weight = c(0.00652158317953266, 
0.000647329216751068, 0.0000251029566387844, 0.000214174129564211, 
0.0456767003151692, 0.00508385824169679, 0.00186393289841566, 
0.158755357993332, 0.000182399538893966, 0.0000415260352876621, 
0.00594332445796881, 0.01093302429318, 0.000114591772539915, 
0.00429007790781481, 0.00284147415679254, 0.0500675912481851, 
0.0000287088339723346, 0.00263403275683136, 0.000448611949766228, 
0.000679452144147131, 0.000252040964722078, 0.0136804520021342, 
0.00654146306362881, 0.526315789473684, 0.00191543727517555, 
0.00017092079991618, 0.00132017906908893, 0.0000627870348540249, 
0.240153698366955, 0.0132308384382318, 0.000733717703580983, 
0.0114161767224157, 0.0001650156302805, 0.000012155463860949, 
0.0154993102806925, 0.00647282707195195, 0.00000360412192179335, 
0.152230172020094, 0.0041524790299809, 0.000592713769629939, 
0.242130750605327, 0.00135643063417201, 0.5, 0.0434782608695652, 
0.00117508813160987, 0.000221503566207416, 0.0011185457116076, 
0.000215847012817643, 0.0000670498565971192, 0.000454026832077722, 
0.305530094714329, 0.0000503965198179275, 0.000317324724102019, 
0.00273860057510612, 0.0000367428222896657, 0.194287934719254, 
0.0724270297675092, 0.000171929928925887, 0.00109404761514031, 
0.0500025001250063, 0.0027947871629836, 0.056695770495521, 0.175469380593087, 
0.0431034482758621, 1.96078431372549, 0.111831804965332, 0.00155982636012959, 
0.000119064940161533, 0.0171291538198013, 1.5625, 0.00732745671304947, 
0.00243336237145763, 0.00729394602479942, 0.023089355806973, 
0.000311509885298945, 0.00462855820411942, 0.150715900527506, 
0.0199992000319987, 0.137703112090333, 0.00384711562506011, 0.0333355557037136, 
0.0842815002107038, 0.0445811600017832, 0.0000184857050227916, 
0.000437414895464401, 0.0146017376067752, 0.000147070437768394, 
0.135080372821829, 0.0272420180887, 0.000557344010558325, 0.0625, 
0.000839938046169714, 0.00254634993468612, 0.000289772340360796, 
0.000306490675634175, 0.0333333333333333, 0.0930232558139535, 
0.0357142857142857, 0.00229049421993784, 0.00024170323435183, 
0.198609731876862, 0.0213269636801809, 0.046189376443418, 0.0176056338028169, 
0.035297024460838, 0.000462550522080774, 0.0252748641476052, 
0.00447631581303324, 0.000064428161729891, 0.000223055060249402, 
0.0000167409403597136, 0.0205846027171676, 0.0000149409029764804, 
0.0000902779740069852, 0.00052983585155483, 0.228571428571429, 
0.155787505842031, 0.00130985033650055, 0.0000850992563686581, 
0.0333333333333333, 0.0053616715547239, 0.085397096498719, 0.00000198776814942642, 
0.0568181818181818, 0.914076782449726, 0.00308342198171531, 0.338983050847458, 
0.00303167187608951, 0.00502777847608034, 0.00731743011854237, 
0.075993616536211)), row.names = c(20L, 51L, 113L, 142L, 158L, 
167L, 176L, 183L, 218L, 239L, 250L, 266L, 304L, 320L, 361L, 367L, 
436L, 454L, 478L, 524L, 548L, 565L, 581L, 585L, 595L, 626L, 631L, 
661L, 669L, 682L, 704L, 737L, 773L, 820L, 826L, 837L, 888L, 899L, 
906L, 926L, 929L, 948L, 953L, 957L, 964L, 1003L, 1035L, 1039L, 
1077L, 1103L, 1110L, 1134L, 1144L, 1164L, 1206L, 1212L, 1221L, 
1252L, 1263L, 1273L, 1294L, 1304L, 1317L, 1336L, 1341L, 1342L, 
1360L, 1382L, 1398L, 1400L, 1436L, 1449L, 1458L, 1466L, 1498L, 
1516L, 1527L, 1536L, 1546L, 1552L, 1561L, 1562L, 1569L, 1618L, 
1642L, 1655L, 1699L, 1703L, 1715L, 1728L, 1743L, 1765L, 1784L, 
1817L, 1839L, 1846L, 1856L, 1857L, 1892L, 1934L, 1938L, 1946L, 
1977L, 1981L, 1986L, 2024L, 2046L, 2062L, 2097L, 2125L, 2168L, 
2180L, 2229L, 2269L, 2291L, 2294L, 2302L, 2329L, 2377L, 2388L, 
2416L, 2438L, 2557L, 2562L, 2578L, 2593L, 2605L, 2630L, 2647L, 
2676L, 2684L), class = "data.frame")

I used the following code

library(igraph)

g <- graph_from_data_frame(y, directed = TRUE, vertices = unique(c(y$from, y$to)))

closeness_score <- as.data.frame(closeness(g, mode = "out", normalized = TRUE))

to calculate the closeness centrality of the resulting network (mode "out" measures paths going out from a vertex). The resulting closeness values are

United Kingdom                                  13.173718
Togo                                            19.973000
Brunei                                          58.380000
Bangladesh                                      11.865000
Tunisia                                         10.750000
Senegal                                         21.650000
Gambia                                           6.222462

and NaN for all the other vertices. I can't explain this result, because if I do not perform the normalization, i.e.

closeness_score <- as.data.frame(closeness(g, mode = "out", normalized = FALSE))

I obtain

United Kingdom                                 0.10455332
Togo                                          19.97300000
Brunei                                        58.38000000
Bangladesh                                    11.86500000
Tunisia                                       10.75000000
Senegal                                       21.65000000
Gambia                                         0.04899576

According to the igraph R manual, "normalization is performed by multiplying the raw closeness by n-1, where n is the number of vertices in the graph". Why, then, do the closeness scores for Togo, Brunei, Bangladesh, Tunisia, and Senegal not change?

CodePudding user response:

The manual is a bit outdated. I will update it for release 1.3.2. When in doubt, check the documentation of the igraph C library, and if you see any inconsistency with the R documentation, please report the problem.


Your graph is not (strongly) connected, and closeness doesn't make much sense for such graphs. Generally, the closeness of vertex v is defined as the inverse of the mean distance from v to all other vertices. But what if some other vertices are not reachable from v? R/igraph 1.3 will only consider distances to reachable vertices. Correspondingly, normalization is done by the number of reachable vertices, not n-1. If no vertices are reachable, it returns NaN.
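To make the reachability point concrete, here is a minimal sketch on a made-up toy graph (the vertex names `hub`, `leaf_d`, `a`, `b`, `c` and the weights are invented for illustration, they are not taken from the question's data). It checks strong connectivity and counts how many other vertices each vertex can reach along edge directions.

```r
library(igraph)

# Toy directed graph shaped like the data in the question: one hub with
# outgoing edges, plus one vertex that points back at the hub.
# All names and weights here are hypothetical.
edges <- data.frame(
  from   = c("hub", "hub", "hub", "leaf_d"),
  to     = c("a",   "b",   "c",   "hub"),
  weight = c(1, 2, 3, 4)
)
toy <- graph_from_data_frame(edges, directed = TRUE)

# a, b and c have no outgoing edges, so the graph is not strongly connected.
is_connected(toy, mode = "strong")

# Weighted out-distances; unreachable pairs are Inf.
d <- distances(toy, mode = "out")

# Number of *other* vertices reachable from each vertex
# (subtract 1 for the zero distance from a vertex to itself).
reachable <- rowSums(is.finite(d)) - 1
reachable
```

Vertices with a reachable count of 0 are the ones for which closeness comes out as NaN.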

Briefly:

  • With normalized=TRUE, it computes the inverse of the mean distance to all vertices reachable from v.
  • With normalized=FALSE, it computes the inverse of the sum of distances to all vertices reachable from v.
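These two definitions can be checked by hand against the numbers in the question, using only base R. In the posted data frame, Togo has a single outgoing edge (to Burkina Faso, weight 0.0500675912481851), so it reaches exactly one vertex and the sum and the mean of its distances coincide. The variable names below are illustrative only.

```r
# Togo reaches exactly one vertex, so there is a single distance.
togo_dist <- c(0.0500675912481851)

togo_raw  <- 1 / sum(togo_dist)   # normalized = FALSE
togo_norm <- 1 / mean(togo_dist)  # normalized = TRUE

# Identical, about 19.973, matching both outputs in the question.
c(raw = togo_raw, normalized = togo_norm)

# For a vertex that reaches k vertices, normalized / raw equals k.
# Using the values printed in the question:
uk_k     <- round(13.173718 / 0.10455332)  # United Kingdom
gambia_k <- round(6.222462 / 0.04899576)   # Gambia
c(uk = uk_k, gambia = gambia_k)
```

The ratios suggest that the United Kingdom reaches 126 vertices and Gambia 127 (Gambia's only edge leads to the United Kingdom, which adds one more hop). Togo, Brunei, Bangladesh, Tunisia and Senegal each reach a single vertex, so their scores are multiplied by 1 and do not change.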

Note that the behaviour for disconnected graphs changed after version 1.2; see the changelog. Version 1.2 treated the distance to unreachable vertices as n, which was completely arbitrary and not mathematically well-founded.


Consider whether:

  • You meant to consider edge directions in this graph
  • Harmonic centrality is a better fit for your application
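As a rough illustration of the harmonic centrality suggestion, the sketch below uses `harmonic_centrality()` from R/igraph on a made-up toy graph (names and weights are hypothetical, not from the question's data). Unreachable vertices contribute 0 to the sum of inverse distances, so no vertex ends up with NaN. This assumes an igraph version that provides `harmonic_centrality()`.

```r
library(igraph)

# Hypothetical toy graph: a hub with three outgoing edges and one
# vertex pointing back at the hub.
edges <- data.frame(
  from   = c("hub", "hub", "hub", "leaf_d"),
  to     = c("a",   "b",   "c",   "hub"),
  weight = c(1, 2, 3, 4)
)
toy <- graph_from_data_frame(edges, directed = TRUE)

# Closeness is NaN for vertices that reach nothing (a, b, c).
cl <- closeness(toy, mode = "out")

# Harmonic centrality sums 1/distance over reachable vertices and is
# 0 for vertices that reach nothing. The "weight" edge attribute is
# picked up automatically.
hc <- harmonic_centrality(toy, mode = "out")

data.frame(closeness = cl, harmonic = hc)
```

Treating the graph as undirected (for example `mode = "all"`) is the other option mentioned above, if edge directions were not intended to matter.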