I have a Pandas dataframe that looks like this:
data = {
'start' : ['ABC', 'DEF', 'MNO', 'PQR', 'MNO', 'MNO', 'STU'],
'end' : ['MNO', 'FGI', 'ABC', 'STU', 'DEF', 'ABC', 'PQR']
}
df = pd.DataFrame(data)
Then, I enrich it by adding a routing
column based on the two original columns:
df['routing'] = df['start'] '<>' df['end']
df
start end routing
0 ABC MNO ABC<>MNO
1 DEF FGI DEF<>FGI
2 MNO ABC MNO<>ABC
3 PQR STU PQR<>STU
4 MNO DEF MNO<>DEF
5 MNO ABC MNO<>ABC
6 STU PQR STU<>PQR
I consider an element of the routing
column to be "similar" to another element if it's either:
a) identical to that element, or
b) has the reverse routing.
For example, ABC<>MNO
is "similar" to MNO<>ABC
.
With this in mind, how would I create a pivot table that does a count of "similar" elements?
The desired pivot table would look as follows:
routing count
ABC <> MNO 3
DEF <> FGI 1
PQR <> STU 2
MNO <> DEF 1
Thanks in advance for your assistance!
CodePudding user response:
df['routing'].str.split('<>', expand=True).apply(lambda x: '<>'.join(sorted(x)), axis=1).value_counts()
ABC<>MNO 3
PQR<>STU 2
DEF<>FGI 1
DEF<>MNO 1