Home > database >  How to group the dataframe with a list column in Python
How to group the dataframe with a list column in Python

Time:09-15

I have a data frame like this:

l1 = [1,2,3,1,2,3] 
l2 = ['A','A','A','B','B','B'] 
values = [['Ram', 'Ford', 'Honda', 'Ford'],['Ford', 'Toyota', 'Subaru'],['Ford', 'Ram'],['Volvo', 'Honda', 'Ford'],['Honda', 'Ford', 'Toyota', 'Ford'],['Ram', 'Ford']] 
d = {'ID': l1, 'Group': l2, 'Values': values} 

df = pd.DataFrame(d)

df

ID   Group  Values
1      A    [Ram, Ford, Honda, Ford]
2      A    [Ford, Toyota, Subaru]
3      A    [Ford, Ram]
1      B    [Volvo, Honda, Ford]
2      B    [Honda, Ford, Toyota, Ford]
3      B    [Ram, Ford]

I want to group the data such that for every ID, I get the count of every value in every group like this:

              ID
Group  Value   1   2   3
  A    Ram     1   0   1
  A    Ford    2   1   1
  A    Honda   1   0   0
  A    Toyota  0   1   0
  A    Subaru  0   1   0
  B    Volvo   1   0   0
  B    Honda   1   1   0
  B    Ford    1   2   1
  B    Toyota  0   1   0
  B    Ram     0   0   1

Can anyone help me with this?

CodePudding user response:

First explode the column with the lists, then groupby.size to get the number wanted and unstack to get the shape wanted.

res = (
    df.explode('Values')
      .groupby(['Group','Values','ID']).size()
      .unstack('ID',fill_value=0)
      .reset_index() # if necessary
)
print(res)
# ID Group  Values  1  2  3
# 0      A    Ford  2  1  1
# 1      A   Honda  1  0  0
# 2      A     Ram  1  0  1
# 3      A  Subaru  0  1  0
# 4      A  Toyota  0  1  0
# 5      B    Ford  1  2  1
# 6      B   Honda  1  1  0
# 7      B     Ram  0  0  1
# 8      B  Toyota  0  1  0
# 9      B   Volvo  1  0  0

CodePudding user response:

Similar to the other answer, this is effectively a pivot table.

out = (df.explode('Values')
         .pivot_table(index=['Group','Values'],
                      columns='ID', 
                      aggfunc=lambda x: len(x),
                      fill_value=0)
         .reset_index())

Output

ID Group  Values  1  2  3
0      A    Ford  2  1  1
1      A   Honda  1  0  0
2      A     Ram  1  0  1
3      A  Subaru  0  1  0
4      A  Toyota  0  1  0
5      B    Ford  1  2  1
6      B   Honda  1  1  0
7      B     Ram  0  0  1
8      B  Toyota  0  1  0
9      B   Volvo  1  0  0

CodePudding user response:

Here is a way using explode() and value_counts()

df.explode('Values').value_counts().unstack(level=0).fillna(0)

Output:

ID              1    2    3
Group Values               
A     Ford    2.0  1.0  1.0
      Honda   1.0  0.0  0.0
      Ram     1.0  0.0  1.0
      Subaru  0.0  1.0  0.0
      Toyota  0.0  1.0  0.0
B     Ford    1.0  2.0  1.0
      Honda   1.0  1.0  0.0
      Ram     0.0  0.0  1.0
      Toyota  0.0  1.0  0.0
      Volvo   1.0  0.0  0.0
  • Related