Most frequent combination of two columns

Time:10-24

I have two columns, 'Start Station' and 'End Station':

| Start Station | End Station |
|:--------------|:------------|
| A             | C           |
| A             | C           |
| B             | D           |
| C             | A           |
| A             | D           |
| C             | A           |
| C             | B           |

As you can see, the most common combination is A & C. I'm trying to write code in Python using pandas so that the output is "The most common combination is between A and C". I found a lot of helpful code for this, but unfortunately couldn't find any that produces the output I need. I hope I've clarified my question enough, and thanks in advance.

I added an image because I'm new to Stack Overflow and couldn't import the table example.

CodePudding user response:

idxmax will return a tuple with the most frequent combination:

df.groupby(['Start Station', 'End Station']).value_counts().idxmax()
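To get the exact sentence the question asks for, the tuple can be unpacked into an f-string. This is a minimal sketch that rebuilds the sample table and, as an assumption on my side, counts the groups with size() rather than value_counts(). Note that A -> C and C -> A are tied in this data; idxmax reports the first maximum in sorted group order.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Start Station": ["A", "A", "B", "C", "A", "C", "C"],
    "End Station":   ["C", "C", "D", "A", "D", "A", "B"],
})

# One row count per (start, end) pair; groups are sorted by key.
pair_counts = df.groupby(["Start Station", "End Station"]).size()

# idxmax() returns the label of the first maximum, here a (start, end) tuple.
start, end = pair_counts.idxmax()

message = f"The most common combination is between {start} and {end}"
print(message)
```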

CodePudding user response:

Your question can be interpreted in many different ways: do you want all the maxima? Does the order of the route matter (A->C ≠ C->A)? And so on.

You can take the first index entry of the value_counts output, since it is sorted in descending order by default.

df.value_counts().index[0]

Output: ('A', 'C')

If you want all the maxima in case of a tie:

df.value_counts().loc[lambda x: x==x.max()].index.tolist()

Output: [('A', 'C'), ('C', 'A')]
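As a rough sketch of the tie case on the question's sample data, the same filter can be written with a boolean mask and each tied pair printed in turn. The variable names are only illustrative, and the order of tied entries is not something I would rely on, hence the sort.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Start Station": ["A", "A", "B", "C", "A", "C", "C"],
    "End Station":   ["C", "C", "D", "A", "D", "A", "B"],
})

# Count identical rows; the index holds (start, end) tuples.
counts = df.value_counts()

# Keep every pair whose count equals the maximum.
top = counts[counts == counts.max()]

# Sort the tied pairs so the result does not depend on tie ordering.
tied = sorted(top.index.tolist())

for start, end in tied:
    print(f"{start} -> {end}: {counts.max()} rows")
```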

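If the order doesn't matter (using frozenset, because a plain set is unhashable and cannot be counted by value_counts):

df[['Start Station', 'End Station']].agg(frozenset, axis=1).value_counts().index[0]

Output: frozenset({'A', 'C'})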
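For the order-insensitive reading, here is a small sketch that builds the unordered pairs explicitly with zip and frozenset (my own variation, assuming the sample data from the question) and then formats the requested sentence. Counted this way, A and C appear together in four rows.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Start Station": ["A", "A", "B", "C", "A", "C", "C"],
    "End Station":   ["C", "C", "D", "A", "D", "A", "B"],
})

# frozenset is hashable (a plain set is not), so the pairs can be counted,
# and frozenset({'A', 'C'}) == frozenset({'C', 'A'}) ignores direction.
pairs = pd.Series(
    [frozenset(p) for p in zip(df["Start Station"], df["End Station"])]
)

unordered_counts = pairs.value_counts()
best = unordered_counts.idxmax()

# Sort the station names so the sentence is deterministic.
message = "The most common combination is between " + " and ".join(sorted(best))
print(message)
```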

If you have other columns, select the two station columns first:

df[['Start Station', 'End Station']].value_counts().index[0]
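To see why the column selection matters, here is a sketch with a hypothetical extra column ('Trip Duration' is made up for illustration, it is not in the question). Counting whole rows treats every row as unique, while counting only the station columns recovers the pairs.

```python
import pandas as pd

# Sample stations from the question plus a hypothetical 'Trip Duration' column.
df = pd.DataFrame({
    "Start Station": ["A", "A", "B", "C", "A", "C", "C"],
    "End Station":   ["C", "C", "D", "A", "D", "A", "B"],
    "Trip Duration": [5, 7, 3, 6, 4, 8, 2],  # assumed values
})

# Counting full rows: every row is unique here, so each count is 1.
all_cols = df.value_counts()

# Restrict to the station columns before counting.
station_cols = ["Start Station", "End Station"]
pair_counts = df[station_cols].value_counts()

top_pair = pair_counts.index[0]
print(top_pair, int(pair_counts.iloc[0]))
```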