Returning Value Frequency from Multiple Columns in a Pandas DataFrame (Python)

Time:03-29

I'm working with a pandas dataframe that has several columns populated with values from the same group, similar to this:

Name  First Car  Second Car  Third Car  Fourth Car
Tom   VW         Ford        Honda      Audi
Tim   BMW        Honda       Audi       Ford
Sam   Audi       Honda       Honda      Audi
Bill  Ford       Ford        null       Audi
Mark  VW         Ford        Honda      null

and I need to turn it into this:

Make   First Car  Second Car  Third Car  Fourth Car
VW     2          0           0          0
Ford   1          3           0          1
Honda  0          2           3          0
Audi   1          0           1          3
BMW    1          0           0          0

It seems like this might be possible with a multi-column groupby or with crosstab, but I can't quite figure out how. I'm just getting started with pandas, so I assume there is a neat way to do this without looping through each column?

Some further context in case it impacts the solution - once I have the information restructured I need to plot it as a stacked bar chart with matplotlib so I can save the visual programmatically using matplotlib's savefig() function.

CodePudding user response:

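Select the columns you want and then apply value_counts to each of them, e.g.:

df.filter(regex=r'Car$').apply(pd.Series.value_counts)

This gives the following, where NaN marks a make that never appears in that column (the null cells in the input are assumed to be real missing values, which value_counts skips):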

       First Car  Second Car  Third Car  Fourth Car
Audi         1.0         NaN        1.0         3.0
BMW          1.0         NaN        NaN         NaN
Ford         1.0         3.0        NaN         1.0
Honda        NaN         2.0        3.0         NaN
VW           2.0         NaN        NaN         NaN
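
To get the integer zeros the question asks for instead of NaN, the result can be filled and cast back to int. The following is a minimal sketch, not the only way to do it. It rebuilds the sample frame from the question and assumes the "null" cells are real missing values (NaN), not the literal string "null".

```python
import numpy as np
import pandas as pd

# Sample data from the question; "null" is assumed to mean a missing value (NaN).
df = pd.DataFrame({
    "Name": ["Tom", "Tim", "Sam", "Bill", "Mark"],
    "First Car": ["VW", "BMW", "Audi", "Ford", "VW"],
    "Second Car": ["Ford", "Honda", "Honda", "Ford", "Ford"],
    "Third Car": ["Honda", "Audi", "Honda", np.nan, "Honda"],
    "Fourth Car": ["Audi", "Ford", "Audi", "Audi", np.nan],
})

counts = (
    df.filter(regex=r"Car$")           # keep only the columns whose name ends in "Car"
      .apply(pd.Series.value_counts)   # count each make per column (NaN is skipped)
      .fillna(0)                       # a make absent from a column gets 0
      .astype(int)                     # turn the float counts back into integers
      .rename_axis("Make")             # label the index like the desired output
)

print(counts)
```

If the data does contain the string "null", calling df.replace("null", np.nan) first should keep it out of the counts. To match the row order in the question, counts.reindex(["VW", "Ford", "Honda", "Audi", "BMW"]) can be appended.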
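
Since the question mentions saving a stacked bar chart with savefig(), here is a rough sketch of that step. It starts from the counts table shown in the answer, with the NaN entries written as 0. The output file name car_counts.png, the axis label, and the choice of one bar per column (rather than one bar per make) are assumptions made for illustration.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed to save
import matplotlib.pyplot as plt
import pandas as pd

# Counts table from the answer, with NaN replaced by 0.
counts = pd.DataFrame(
    {
        "First Car": [1, 1, 1, 0, 2],
        "Second Car": [0, 0, 3, 2, 0],
        "Third Car": [1, 0, 0, 3, 0],
        "Fourth Car": [3, 0, 1, 0, 0],
    },
    index=pd.Index(["Audi", "BMW", "Ford", "Honda", "VW"], name="Make"),
)

# Transpose so each bar is one column ("First Car", ...) and each segment is a make.
ax = counts.T.plot(kind="bar", stacked=True, rot=0)
ax.set_ylabel("Count")
ax.legend(title="Make")

out_path = Path("car_counts.png")  # hypothetical output file name
ax.figure.tight_layout()
ax.figure.savefig(out_path)        # write the chart to disk
plt.close(ax.figure)               # release the figure once it is saved
```

Dropping the .T gives one bar per make, stacked by column, if that reads better for your data.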