Home > Blockchain >  Creating a stack from pandas column of lists
Creating a stack from pandas column of lists

Time:08-06

I'm having an issue building a stack from the following df:

import pandas as pd

df = pd.DataFrame(
    {
        "Student": ['Daphne','Scooby','Daphne','Shaggy','Fred','Daphne'],
        "window": [['Fred'],['Daphne'], [''], ['Daphne'],['Velma','Scrappy'],['Velma','Fred']]
    }
)

which gives you:

  Student            window
0  Daphne            [Fred]
1  Scooby          [Daphne]
2  Daphne                []
3  Shaggy          [Daphne]
4    Fred  [Velma, Scrappy]
5  Daphne     [Velma, Fred]

I wanted to create a stack table that would show how many times each student has seen the other students in the window. I wanted the end result to be like this:

        Daphne  Fred  Scrappy  Velma
Daphne       0     2        0      1
Fred         0     0        1      1
Scrappy      0     0        0      0 
Velma        0     0        0      0

I was thinking I could do something like this:

dfd = pd.get_dummies(df["window"].dropna().apply(pd.Series).stack())

which gives you:

     Daphne  Fred  Scrappy  Velma
1 0       1     0        0      0
2 0       0     0        0      0
3 0       1     0        0      0
4 0       0     0        0      1
  1       0     0        1      0
5 0       0     0        0      1
  1       0     1        0      0

and that is not going to give me the results I want once I implement:

xrf_df = dfd.T.dot(dfd)

which gives you

            Daphne  Fred  Scrappy  Velma
         1       0     0        0      0
Daphne   0       2     0        0      0
Fred     0       0     2        0      1
Scrappy  0       0     0        1      1
Velma    0       0     1        1      2

I was thinking of using this to remove the empty spaces:

dfd = dfd.drop(0)
del dfd['']

but even that won't give me the right numbers in the right columns.

Could someone tell me what I'm doing wrong?

CodePudding user response:

one way of solving i this,

df.explode('window').pivot_table(index='Student', columns='window', aggfunc='size', fill_value=0)

O/P;

window      Daphne  Fred  Scrappy  Velma
Student                                 
Daphne   1       0     2        0      1
Fred     0       0     0        1      1
Scooby   0       1     0        0      0
Shaggy   0       1     0        0      0
  • Explode window column to remove list
  • perform pivot with column as window, index as student and count as aggrigate function and finally fill nan with 0 ​

CodePudding user response:

If I understand correctly you need .explode with .value_counts() and .unstack()

new = df.explode('window')\
         .groupby('Student')['window'].value_counts().unstack(-1,fill_value=0)

print(new)

window      Daphne  Fred  Scrappy  Velma
Student
Daphne   1       0     2        0      1
Fred     0       0     0        1      1
Scooby   0       1     0        0      0
Shaggy   0       1     0        0      0

CodePudding user response:

Here is a way using str.get_dummies()

df.set_index('Student')['window'].str.join('|').str.get_dummies().groupby(level=0).sum()

Output:

         Daphne  Fred  Scrappy  Velma
Student                              
Daphne        0     2        0      1
Fred          0     0        1      1
Scooby        1     0        0      0
Shaggy        1     0        0      0
  • Related