Home > Software engineering >  How to count combinations contained in a Pandas dataframe column
How to count combinations contained in a Pandas dataframe column

Time:10-22

I have a Pandas dataframe that looks like this:

data = {
        'start' : ['ABC', 'DEF', 'MNO', 'PQR', 'MNO', 'MNO', 'STU'],
        'end'   : ['MNO', 'FGI', 'ABC', 'STU', 'DEF', 'ABC', 'PQR']
       }

df = pd.DataFrame(data)

Then, I enrich it by adding a routing column based on the two original columns:

df['routing'] = df['start']   '<>'   df['end']
df

    start    end    routing
0   ABC      MNO    ABC<>MNO
1   DEF      FGI    DEF<>FGI
2   MNO      ABC    MNO<>ABC
3   PQR      STU    PQR<>STU
4   MNO      DEF    MNO<>DEF
5   MNO      ABC    MNO<>ABC
6   STU      PQR    STU<>PQR

I consider an element of the routing column to be "similar" to another element if it's either:

a) identical to that element, or

b) has the reverse routing.

For example, ABC<>MNO is "similar" to MNO<>ABC.

With this in mind, how would I create a pivot table that does a count of "similar" elements?

The desired pivot table would look as follows:

routing        count 
ABC <> MNO     3
DEF <> FGI     1
PQR <> STU     2
MNO <> DEF     1

Thanks in advance for your assistance!

CodePudding user response:

df['routing'].str.split('<>', expand=True).apply(lambda x: '<>'.join(sorted(x)), axis=1).value_counts()

ABC<>MNO    3
PQR<>STU    2
DEF<>FGI    1
DEF<>MNO    1
  • Related