My spreadsheet contains data from hospital patients, based on different criteria, and one of them are household's coordinates. The question is which household is the nearest neighbour to the household that contains the most people, based on straight-line distances and return the hserial value for the household. I have the households identified by their serial number:
hserial | hhcoords | persnum | ||
---|---|---|---|---|
101051 | 346350 | 1 | ||
101151 | 347312 | 1 | ||
101201 | 433616 | 1 | ||
101271 | 609464 | 1 | ||
101271 | 609464 | 2 | ||
101351 | 228562 | 1 | ||
101351 | 228562 | 2 | ||
101351 | 228562 | 3 | ||
101371 | 556408 | 1 | ||
101371 | 556408 | 2 |
hhcoords: (x,y) coordinates of the household location. First three digits are x, last three digits are y. persnum: number of persons within the household hserial: is the serial number of the household
Once you have their coords you can calculate the straightline distance between all houses, using Pythagorean Theorem.
For example, if we have the coords for a house 1 (x1,y1), and we want to know how far away it is from house 2 (x2,y2) then we can calculate a straight line distance using:
√((x1 – x2)2 (y1 – y2)2) The nearest neighbour is the one with the shortest distance (but careful with zeros).
I can't alter or change any of the data, even if I identify an error.
I think using the MAXIF formula would be useful as well as filtering it with FILTER, but I don't know how to aplly it to this case.
EDIT: This is my code but I have the #VALUE error:
=LET(hserial,UNIQUE(Table2[[#Headers],[hserial]]),hhcoords,UNIQUE(Table2[[#Headers],[hhcoords]]),hserial1,hserial,hserial2,TRANSPOSE(hserial),getXλ,LAMBDA(h,LEFT(INDEX(hhcoords,MATCH(h,hserial,0)),3)),getYλ,LAMBDA(h,MID(INDEX(hhcoords,MATCH(h,hserial,0)),4,3)),m,SQRT((getXλ(hserial1)-getXλ(hserial2))^2 (getYλ(hserial1)-getYλ(hserial2))^2),posMaxValue,MATCH(MAX(m),TOCOL(m),0),firstHH, CHOOSEROWS(hserial1,INT(posMaxValue/COUNT(hserial1) 1)),secondHH,INDEX(hserial1,MOD(posMaxValue,COUNT(hserial1))),HSTACK(firstHH,secondHH,MAX(m)))
CodePudding user response:
You can use this formula:
=LET(
hserial,UNIQUE(data[hserial]),
hhcoords,UNIQUE(data[hhcoords]),
hserial1,hserial,
hserial2,TRANSPOSE(hserial),
getXλ,LAMBDA(h,LEFT(INDEX(hhcoords,MATCH(h,hserial,0)),3)),
getYλ,LAMBDA(h,MID(INDEX(hhcoords,MATCH(h,hserial,0)),4,3)),
m,SQRT((getXλ(hserial1)-getXλ(hserial2))^2 (getYλ(hserial1)-getYλ(hserial2))^2),
posMaxValue,MATCH(MAX(m),TOCOL(m),0),
firstHH, CHOOSEROWS(hserial1,INT(posMaxValue/COUNT(hserial1) 1)),
secondHH,INDEX(hserial1,MOD(posMaxValue,COUNT(hserial1))),
HSTACK(firstHH,secondHH,MAX(m)))
Maybe you don't have HSTACK
- then you would have to use three formulas.
CodePudding user response:
Assuming the DATA
is located at [B2:D12]
.
The serial and coordinates of the household with the most people are: 101351
and 228562
.
-. To obtain the shortest distance enter this formula in [I7]
(Excel 2021):
= LET(
lSerial, $B$3:$B$12, lCoords, $C$3:$C$12, lPers, $D$3:$D$12,
lMaxPpl, XLOOKUP( MAX(lPers), lPers, lCoords ),
MIN( IF( lCoords = lMaxPpl, ""N/A"",
SQRT( ( LEFT(lMaxPpl,3) -LEFT(lCoords,3) )^2 ( RIGHT(lMaxPpl,3) -RIGHT(lCoords,3) )^2 ) ) ) )
-. To obtain the serial and coordinates enter this formula in [G7]
(Excel 2021):
= LET( lData, $B$3:$D$12,
lSerial, INDEX( lData, 0, 1 ), lCoords, INDEX( lData, 0, 2 ), lPers, INDEX( lData, 0, 3 ),
lMaxPpl, XLOOKUP( MAX(lPers), lPers, lCoords ),
lDist, IF( lCoords = lMaxPpl, ""N/A"",
SQRT( ( LEFT(lMaxPpl,3) -LEFT(lCoords,3) )^2 ( RIGHT(lMaxPpl,3) -RIGHT(lCoords,3) )^2 ) ),
lRow, MATCH( MIN(lDist), lDist, 0 ),
INDEX( lData, lRow, 1 ): INDEX( lData, lRow, 2 ) )
-. To obtain the serial, coordinates & distance enter this formula in [G8]
(Excel 365):
= LET( lData, $B$3:$D$12,
lSerial, INDEX( lData, 0, 1 ), lCoords, INDEX( lData, 0, 2 ), lPers, INDEX( lData, 0, 3 ),
lMaxPpl, XLOOKUP( MAX(lPers), lPers, lCoords ),
lDist, IF( lCoords = lMaxPpl, ""N/A"",
SQRT( ( LEFT(lMaxPpl,3) -LEFT(lCoords,3) )^2 ( RIGHT(lMaxPpl,3) -RIGHT(lCoords,3) )^2 ) ),
CHOOSEROWS( HSTACK( DROP( lData, , -1 ), lDist ),
MATCH( MIN(lDist), lDist, 0 ) ) )
Evaluation of the LET function:
lData, $B$3:$D$12
: Sets the range to work with as lData
, having the entire data range as one name
makes easier the maintenance of the formula, i.e. if the range size changes only this name needs to be adjusted. However, I would suggest the use of Excel Tables
as formulas are automatically updated when the size of the table changes.
lSerial, INDEX( lData, 0, 1 )
, lCoords, INDEX( lData, 0, 2 )
and lPers, INDEX( lData, 0, 3 )
the columns to be used in the calculations.
lMaxPpl, XLOOKUP( MAX(lPers), lPers, lCoords )
: Gives the coordinates of the household with most people.
lDist, IF( lCoords = lMaxPpl, ""N/A"", SQRT( ( LEFT(lMaxPpl,3) -LEFT(lCoords,3) )^2 ( RIGHT(lMaxPpl,3) -RIGHT(lCoords,3) )^2 ) )
: Calculates the distances of all other households from the one with most people.
lRow, MATCH( MIN(lDist), lDist, 0 )
: Gives the row of the shortest distance.
INDEX( lData, lRow, 1 ): INDEX( lData, lRow, 2 )
: Use to return the first two columns from the data (not sure if HSTACK
or any other alternative is available in Excel 2021).
DROP( lData, , -1 )
: Excludes the last column from the data range.
HSTACK( DROP( lData, , -1 ), lDist )
: Combines the adjusted data range and the calculated distances as one array.
CHOOSEROWS( HSTACK( DROP( lData, , -1 ), lDist ), MATCH( MIN(lDist), lDist, 0 ) )
: Returns the entire row with the shortest distance from the combined array.
For documentation on the functions used see the following pages: CHOOSECOLS function, DROP function, HSTACK function, SQRT function, XLOOKUP function.