I have two columns, 'Start Station' and 'End Station':
| Start Station | End Station |
|:--------------|:------------|
| A | C |
| A | C |
| B | D |
| C | A |
| A | D |
| C | A |
| C | B |
As you can see, the most common combination is A & C. I'm trying to write code in Python using pandas so that the output is "The most common combination is between A and C". I found a lot of helpful code for this, but unfortunately none that produces the output I need. I hope my question is clear enough, and thanks in advance.
I added an image because I'm new to Stack Overflow and couldn't import the table example.
CodePudding user response:
Count the rows in each (start, end) group with size; idxmax will then return a tuple with the most frequent combination:
df.groupby(['Start Station', 'End Station']).size().idxmax()
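To get the exact sentence asked for in the question, the tuple can be unpacked into an f-string. This is only a minimal sketch: the DataFrame is rebuilt by hand from the table in the question, and the names `pair_counts` and `message` are made up for illustration.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Start Station": ["A", "A", "B", "C", "A", "C", "C"],
    "End Station":   ["C", "C", "D", "A", "D", "A", "B"],
})

# Count each (start, end) pair; the result is a Series indexed by the pair.
pair_counts = df.groupby(["Start Station", "End Station"]).size()

# idxmax() returns the index label of the largest count, here a (start, end) tuple.
# On a tie it returns the first label holding the maximum.
start, end = pair_counts.idxmax()

message = f"The most common combination is between {start} and {end}"
print(message)
```

Note that A->C and C->A are counted as different routes here, and both occur twice in the sample data, so the result depends on which of the tied pairs comes first in the grouped index.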
CodePudding user response:
Your question can be interpreted in many different ways: do you want all the maxima in case of a tie? Does the direction of the route matter (is A->C different from C->A)?
You can use the first index of the value_counts output, as it is sorted by descending count by default.
df.value_counts().index[0]
Output: ('A', 'C')
If you want all the maxima in case of a tie:
df.value_counts().loc[lambda x: x==x.max()].index.tolist()
Output: [('A', 'C'), ('C', 'A')]
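The tie handling above can be sketched end to end as follows. The DataFrame is rebuilt by hand from the question's table, and `counts` and `top_pairs` are illustrative names, not anything from the original code.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Start Station": ["A", "A", "B", "C", "A", "C", "C"],
    "End Station":   ["C", "C", "D", "A", "D", "A", "B"],
})

# Count every ordered (start, end) pair.
counts = df[["Start Station", "End Station"]].value_counts()

# Keep every pair whose count equals the maximum, so ties are not dropped.
top_pairs = counts[counts == counts.max()].index.tolist()

# Report each tied pair separately.
for start, end in top_pairs:
    print(f"{start} -> {end} occurs {counts.max()} times")
```

With the sample data both directions of the A/C route are tied, so two lines are printed.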
If the order doesn't matter, use frozenset (a plain set is not hashable, so value_counts cannot count it):
df[['Start Station', 'End Station']].agg(frozenset, axis=1).value_counts().index[0]
Output: frozenset({'A', 'C'})
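An alternative when direction should be ignored is to sort the two stations within each row, so that A->C and C->A become the same pair, and then count as before. This is a sketch of a different technique from the one above; it assumes NumPy is available, and the column names `first` and `second` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Start Station": ["A", "A", "B", "C", "A", "C", "C"],
    "End Station":   ["C", "C", "D", "A", "D", "A", "B"],
})

# Sort the two station names within each row so the direction is discarded.
stations = df[["Start Station", "End Station"]].to_numpy()
unordered = pd.DataFrame(np.sort(stations, axis=1), columns=["first", "second"])

# Count the direction-free pairs and pick the most frequent one.
counts = unordered.value_counts()
first, second = counts.idxmax()

message = f"The most common combination is between {first} and {second}"
print(message)
```

On the sample data the A/C pair is counted four times (two trips in each direction), so there is no tie to break.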
If you have other columns, select the two station columns first:
df[['Start Station', 'End Station']].value_counts().index[0]