Manipulating a column of lists in pandas

Time:09-21

I have a dataframe that looks like this:

import pandas as pd
score = [[0,1,0,3],[0,2,6,4,0,0],[0,0,0],[0,4,4,2,1,0,0,0]]
group = ["A", "B", "C", "D"]
df = pd.DataFrame([group, score]).T
df.columns = ['Group', 'Score']

You will notice that the Score column contains lists of different lengths. I would like to create two new columns. I want the first new column to be the total number of zeros in the Score column for that row. I want the second new column to be the last entry in the Score column for that row.

I could write a loop that iterates through every row and performs the required operations. However, I have more than 2 million entries and this would be inefficient.

CodePudding user response:

We can feed a generator to the DataFrame constructor and assign the result to two new columns:

>>> df[["Zeros", "Last Entry"]] = pd.DataFrame((sc.count(0), sc[-1])
                                               for sc in df.Score)

>>> df

  Group                     Score  Zeros  Last Entry
0     A              [0, 1, 0, 3]      2           3
1     B        [0, 2, 6, 4, 0, 0]      3           0
2     C                 [0, 0, 0]      3           0
3     D  [0, 4, 4, 2, 1, 0, 0, 0]      4           0
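
One thing to watch with this assignment: pandas aligns the new frame to df by index label, so it relies on df having the default 0..n-1 index. Here is a self-contained sketch of the same idea that passes index=df.index to keep rows aligned. The non-default index labels (10, 20, 30, 40) and the name stats are made up for illustration and are not part of the original answer.

```python
import pandas as pd

score = [[0, 1, 0, 3], [0, 2, 6, 4, 0, 0], [0, 0, 0], [0, 4, 4, 2, 1, 0, 0, 0]]
group = ["A", "B", "C", "D"]
# Hypothetical non-default index, to show why alignment matters.
df = pd.DataFrame({"Group": group, "Score": score}, index=[10, 20, 30, 40])

# One pass over the lists: count the zeros and take the last element of each.
# index=df.index makes the new frame share df's row labels.
stats = pd.DataFrame(
    [(sc.count(0), sc[-1]) for sc in df["Score"]],
    columns=["Zeros", "Last Entry"],
    index=df.index,
)
df[["Zeros", "Last Entry"]] = stats
```

Without index=df.index, the new frame would be labelled 0..3 and would not line up with the labels 10..40, so the new columns would come out as missing values.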

Regarding "I have more than 2 million entries and this would be inefficient":

Well, you have Python lists in a column, so the speed of vectorized numeric operations is off the table, unfortunately.
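
To make that concrete: a column of lists has dtype object, so every cell is an ordinary Python list and any per-row work runs in the interpreter. Below is a minimal sketch on the same example data, using one plain list comprehension per column. This is just one straightforward option, not a benchmarked fastest approach; the name scores is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Group": ["A", "B", "C", "D"],
    "Score": [[0, 1, 0, 3], [0, 2, 6, 4, 0, 0], [0, 0, 0], [0, 4, 4, 2, 1, 0, 0, 0]],
})

# The column holds Python list objects, not a numeric array.
print(df["Score"].dtype)  # object

# Per-row work therefore happens in Python. Assigning a plain list to a
# column is positional, so no index alignment is involved.
scores = df["Score"].tolist()
df["Zeros"] = [sc.count(0) for sc in scores]
df["Last Entry"] = [sc[-1] for sc in scores]
```

Note that sc[-1] raises IndexError on an empty list, so rows with empty Score lists would need a guard.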
