count number of elements in a list inside a dataframe

Time:09-08

Assume we have a dataframe in which a column contains lists. How can I count the number of elements in each list? For example:

A                              B
(1,2,3)                       (1,2,3,4)
(1)                           (1,2,3)

I would like to create two new columns holding the count for each column, something like the following:

A                              B              C              D         
(1,2,3)                       (1,2,3,4)       3              4
(1)                           (1,2,3)         1              3

where C is the number of elements in the list in column A for that row, and D is the number of elements in the list in column B for that row.

I cannot just do

df['A'] = len(df['A'])

because that returns the length of my dataframe (the number of rows), not the length of each list.

CodePudding user response:

You can use the .apply method on the Series for the column df['A'].

>>> import pandas as pd
>>> pd.DataFrame({"column": [[1, 2], [1], [1, 2, 3]]})
      column
0     [1, 2]
1        [1]
2  [1, 2, 3]
>>> df = pd.DataFrame({"column": [[1, 2], [1], [1, 2, 3]]})
>>> df["column"].apply
<bound method Series.apply of 0       [1, 2]
1          [1]
2    [1, 2, 3]
Name: column, dtype: object>
>>> df["column"].apply(len)
0    2
1    1
2    3
Name: column, dtype: int64
>>> df["column"] = df["column"].apply(len)
>>> 

See Python Pandas, apply function for a more general discussion of apply.
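As an alternative to apply(len), the .str.len() accessor also computes the length of each element when the elements are lists or tuples. A minimal sketch, assuming the columns A and B from the question hold Python lists (the sample values below are copied from the question, and C and D are the new column names it asks for):

```python
import pandas as pd

# Sample data mirroring the question; each cell holds a Python list
df = pd.DataFrame({
    "A": [[1, 2, 3], [1]],
    "B": [[1, 2, 3, 4], [1, 2, 3]],
})

# .str.len() returns the length of each element in the Series
df["C"] = df["A"].str.len()
df["D"] = df["B"].str.len()

print(df)
```

Unlike the example above, this writes the counts to new columns, so the original lists in A and B are kept.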

CodePudding user response:

You can use pandas' apply with the len function on each column, as shown below, to get what you are looking for:

# import pandas
import pandas as pd

# create a sample dataframe
df = pd.DataFrame(
    {
        'A':[[1,2,3],[32,4],[45,67,23,54,3],[],[0]],
        'B':[[2],[3],[2,3],[5,6,1],[98,44]]
    },
    index=['z','y','m','n','o']
)

# computing lengths of lists in the column
df['items_in_A'] = df['A'].apply(len)
df['items_in_B'] = df['B'].apply(len)

# check the output
print(df)

Output:

                     A          B  items_in_A  items_in_B
z            [1, 2, 3]        [2]           3           1
y              [32, 4]        [3]           2           1
m  [45, 67, 23, 54, 3]     [2, 3]           5           2
n                   []  [5, 6, 1]           0           3
o                  [0]   [98, 44]           1           2
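Note that apply(len) raises a TypeError if a cell holds a missing value (None or NaN) instead of a list. The sketch below is one possible way to handle that, looping over several columns at once. The helper safe_len and the choice to count missing cells as 0 are assumptions made for illustration, not part of the original answer:

```python
import pandas as pd

# Sample dataframe where some cells are missing instead of holding a list
df = pd.DataFrame({
    "A": [[1, 2, 3], None, []],
    "B": [[2], [3, 4], None],
})

def safe_len(value):
    """Hypothetical helper: length of a list/tuple, 0 for anything else."""
    return len(value) if isinstance(value, (list, tuple)) else 0

# Add one count column per list column
for col in ["A", "B"]:
    df[f"items_in_{col}"] = df[col].apply(safe_len)

print(df)
```

If a missing cell should stay missing rather than become 0, return None from the helper instead.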