I'm working with a pandas DataFrame in which several columns hold values drawn from the same set of categories, similar to this:
Name | First Car | Second Car | Third Car | Fourth Car |
---|---|---|---|---|
Tom | VW | Ford | Honda | Audi |
Tim | BMW | Honda | Audi | Ford |
Sam | Audi | Honda | Honda | Audi |
Bill | Ford | Ford | null | Audi |
Mark | VW | Ford | Honda | null |
and I need to turn it into this:
Make | First Car | Second Car | Third Car | Fourth Car |
---|---|---|---|---|
VW | 2 | 0 | 0 | 0 |
Ford | 1 | 3 | 0 | 1 |
Honda | 0 | 2 | 3 | 0 |
Audi | 1 | 0 | 1 | 3 |
BMW | 1 | 0 | 0 | 0 |
It seems like this might be possible with a multi-column groupby, or with crosstab, but I can't quite figure out how. Is there an idiomatic pandas way to do this without looping through each column? (I'm just getting started with pandas.)
Some further context, in case it affects the solution: once the data is restructured, I need to plot it as a stacked bar chart with matplotlib so that I can save the figure programmatically with matplotlib's savefig() function.
CodePudding user response:
Select the columns you want and then apply value_counts to each of them, e.g.:
df.filter(regex='Car$').apply(pd.Series.value_counts)
This'll give you:
       First Car  Second Car  Third Car  Fourth Car
Audi         1.0         NaN        1.0         3.0
BMW          1.0         NaN        NaN         NaN
Ford         1.0         3.0        NaN         1.0
Honda        NaN         2.0        3.0         NaN
VW           2.0         NaN        NaN         NaN
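
The NaN entries mark makes that never appear in a given column. If you want zeros and integer counts as in the question, followed by the stacked bar chart, something along these lines should work. Treat it as a sketch: the DataFrame is rebuilt from the sample table with None standing in for null, the output filename car_counts.png is a placeholder, and putting the car positions on the x-axis (via counts.T) is only a guess at the layout you want.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Sample data from the question; None stands in for the "null" cells.
df = pd.DataFrame({
    "Name": ["Tom", "Tim", "Sam", "Bill", "Mark"],
    "First Car": ["VW", "BMW", "Audi", "Ford", "VW"],
    "Second Car": ["Ford", "Honda", "Honda", "Ford", "Ford"],
    "Third Car": ["Honda", "Audi", "Honda", None, "Honda"],
    "Fourth Car": ["Audi", "Ford", "Audi", "Audi", None],
})

# Count each make per column. value_counts skips missing values by default,
# so makes absent from a column come back as NaN; turn those into 0.
counts = (
    df.filter(regex="Car$")
      .apply(pd.Series.value_counts)
      .fillna(0)
      .astype(int)
)
counts.index.name = "Make"

# Transpose so each bar is one car position, stacked by make.
ax = counts.T.plot(kind="bar", stacked=True)
ax.set_ylabel("Number of people")

fig = ax.get_figure()
fig.tight_layout()
fig.savefig("car_counts.png")  # placeholder filename
plt.close(fig)
```

Drop the .T if you would rather have one bar per make, stacked by car position.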