I have a data frame like this:
l1 = [1,2,3,1,2,3]
l2 = ['A','A','A','B','B','B']
values = [['Ram', 'Ford', 'Honda', 'Ford'],['Ford', 'Toyota', 'Subaru'],['Ford', 'Ram'],['Volvo', 'Honda', 'Ford'],['Honda', 'Ford', 'Toyota', 'Ford'],['Ram', 'Ford']]
d = {'ID': l1, 'Group': l2, 'Values': values}
df = pd.DataFrame(d)
df
ID Group Values
1 A [Ram, Ford, Honda, Ford]
2 A [Ford, Toyota, Subaru]
3 A [Ford, Ram]
1 B [Volvo, Honda, Ford]
2 B [Honda, Ford, Toyota, Ford]
3 B [Ram, Ford]
I want to group the data such that for every ID, I get the count of every value in every group like this:
ID
Group Value 1 2 3
A Ram 1 0 1
A Ford 2 1 1
A Honda 1 0 0
A Toyota 0 1 0
A Subaru 0 1 0
B Volvo 1 0 0
B Honda 1 1 0
B Ford 1 2 1
B Toyota 0 1 0
B Ram 0 0 1
Can anyone help me with this?
CodePudding user response:
First explode
the column with the lists, then groupby.size
to get the number wanted and unstack
to get the shape wanted.
res = (
df.explode('Values')
.groupby(['Group','Values','ID']).size()
.unstack('ID',fill_value=0)
.reset_index() # if necessary
)
print(res)
# ID Group Values 1 2 3
# 0 A Ford 2 1 1
# 1 A Honda 1 0 0
# 2 A Ram 1 0 1
# 3 A Subaru 0 1 0
# 4 A Toyota 0 1 0
# 5 B Ford 1 2 1
# 6 B Honda 1 1 0
# 7 B Ram 0 0 1
# 8 B Toyota 0 1 0
# 9 B Volvo 1 0 0
CodePudding user response:
Similar to the other answer, this is effectively a pivot table.
out = (df.explode('Values')
.pivot_table(index=['Group','Values'],
columns='ID',
aggfunc=lambda x: len(x),
fill_value=0)
.reset_index())
Output
ID Group Values 1 2 3
0 A Ford 2 1 1
1 A Honda 1 0 0
2 A Ram 1 0 1
3 A Subaru 0 1 0
4 A Toyota 0 1 0
5 B Ford 1 2 1
6 B Honda 1 1 0
7 B Ram 0 0 1
8 B Toyota 0 1 0
9 B Volvo 1 0 0
CodePudding user response:
Here is a way using explode()
and value_counts()
df.explode('Values').value_counts().unstack(level=0).fillna(0)
Output:
ID 1 2 3
Group Values
A Ford 2.0 1.0 1.0
Honda 1.0 0.0 0.0
Ram 1.0 0.0 1.0
Subaru 0.0 1.0 0.0
Toyota 0.0 1.0 0.0
B Ford 1.0 2.0 1.0
Honda 1.0 1.0 0.0
Ram 0.0 0.0 1.0
Toyota 0.0 1.0 0.0
Volvo 1.0 0.0 0.0