Suppose we have a dataframe in which the cells of some columns contain lists. How can I count the number of elements in each list? For example:
A B
(1,2,3) (1,2,3,4)
(1) (1,2,3)
I would like to create two new columns holding the element counts for each column, something like the following:
A B C D
(1,2,3) (1,2,3,4) 3 4
(1) (1,2,3) 1 3
where C is the number of elements in the list in column A for that row, and D is the number of elements in the list in column B for that row.
I cannot just do
df['A'] = len(df['A'])
because that returns the length of my dataframe (its number of rows), not the length of each list.
CodePudding user response:
You can use the .apply method on the Series for the column, df['A'].
>>> import pandas as pd
>>> pd.DataFrame({"column": [[1, 2], [1], [1, 2, 3]]})
column
0 [1, 2]
1 [1]
2 [1, 2, 3]
>>> df = pd.DataFrame({"column": [[1, 2], [1], [1, 2, 3]]})
>>> df["column"].apply
<bound method Series.apply of 0 [1, 2]
1 [1]
2 [1, 2, 3]
Name: column, dtype: object>
>>> df["column"].apply(len)
0 2
1 1
2 3
Name: column, dtype: int64
>>> df["column"] = df["column"].apply(len)
>>>
See "Python Pandas, apply function" for a more general discussion of apply.
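To tie this back to the question, here is a minimal sketch that leaves A and B untouched and writes the counts into new columns C and D. The sample data mirrors the question's table, using Python lists rather than the tuple-like notation shown there (that choice is an assumption). It also shows why len(df['A']) on its own does not give per-row counts.

```python
import pandas as pd

# Sample data mirroring the question's table (using lists is an assumption).
df = pd.DataFrame({"A": [[1, 2, 3], [1]], "B": [[1, 2, 3, 4], [1, 2, 3]]})

# len() on the Series counts the rows, not the elements of each list.
n_rows = len(df["A"])

# apply(len) calls len once per row, giving one count per list.
df["C"] = df["A"].apply(len)
df["D"] = df["B"].apply(len)

print(n_rows)
print(df)
```

Assigning to new column names keeps the original lists available, whereas assigning back to the same column replaces them with the counts.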
CodePudding user response:
You can use pandas' apply with the len function on each column, as shown below, to get what you are looking for:
# import the package
import pandas as pd
# create a sample dataframe
df = pd.DataFrame(
{
'A':[[1,2,3],[32,4],[45,67,23,54,3],[],[0]],
'B':[[2],[3],[2,3],[5,6,1],[98,44]]
},
index=['z','y','m','n','o']
)
# compute the length of the list in each row of each column
df['items_in_A'] = df['A'].apply(len)
df['items_in_B'] = df['B'].apply(len)
# check the output
print(df)
Output:
A B items_in_A items_in_B
z [1, 2, 3] [2] 3 1
y [32, 4] [3] 2 1
m [45, 67, 23, 54, 3] [2, 3] 5 2
n [] [5, 6, 1] 0 3
o [0] [98, 44] 1 2
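As a possible alternative to apply(len), the Series.str.len() accessor also works on list-valued columns. The sketch below is not part of the answers above; the column names and sample values are made up for illustration. One difference that may matter in practice: apply(len) raises a TypeError on a missing value such as None, while str.len() returns NaN for that row.

```python
import pandas as pd

# Hypothetical sample data; column B contains one missing value.
df = pd.DataFrame({
    "A": [[1, 2, 3], [32, 4], []],
    "B": [[2], None, [5, 6, 1]],
})

# str.len() returns the length of each element, lists included.
df["items_in_A"] = df["A"].str.len()

# A missing value yields NaN rather than raising an error,
# so this column comes back as floats.
df["items_in_B"] = df["B"].str.len()

print(df)
```

If every cell is guaranteed to hold a list, both approaches should give the same counts, so the choice is mostly a matter of taste.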