I have a bridging table in01
that contains relations. E.g. Customer A is related to Customer B (which is then stored in a two duplicate records A/B and B/A).
/* 1. Testdata */
data in01(keep=primary: secondary:);
infile datalines;
length primary_party_no secondary_party_no $ 15;
input primary_party_no secondary_party_no ;
datalines;
A B
B A
A C
C A
B D
D B
W Z
Z W
Y Z
Z Y
X Y
Y X
;
run;
Task: My Task is group all customers with a connection and create an ID, regardless of the number of "links" required to connect the customers.
In the example above Group 1
would consist of A, B, C, D
, while Group 2
would consist of W, X, Y, Z
.
I reckon the the data has to be looped over recursively - I am, however, unable to figure out how to design a data step or macro that calls upon itself in each loop.
What would be a good starting point to tackle this problem?
CodePudding user response:
Looks like you are trying to calculate the connected subgraphs from the network that your connections build. See this previous question. Identifying groups/networks of customers
In your case the node ID variables are character instead of numeric. The macro mentioned in that other question has been updated to handle character node ids https://github.com/sasutils/macros/blob/master/subnet.sas
It sounds like your connections are not one-way so disable the directed option.
filename src url 'https://raw.githubusercontent.com/sasutils/macros/master/subnet.sas';
%include src;
%subnet
(in=in01
,out=want
,from=primary_party_no
,to=secondary_party_no
,directed=0
);
proc print data=want;
run;
Results:
primary_ secondary_
Obs party_no party_no subnet
1 A B 1
2 A C 1
3 B A 1
4 B D 1
5 C A 1
6 D B 1
7 W Z 2
8 X Y 2
9 Y X 2
10 Y Z 2
11 Z W 2
12 Z Y 2