Home > Software engineering >  How to count id's per name in a dataframe
How to count id's per name in a dataframe

Time:03-20

I have a list of names:

lst = ['Albert', 'Carl', 'Julian', 'Mary']

and I have a DF:

target     id      name
  A       100     Albert
  A       110     Albert
  B       200     Carl
  D       500     Mary
  E       235     Mary

I want to make another dataframe counting how many id per name in lst:

lst_names   Count
Albert       2
Carl         1
Julian       0
Mary         2

What's the most efficient way to do this considering the list of names has 12k unique names on it?

CodePudding user response:

Check with value_counts

pd.Categorical(df['name'],lst).value_counts()
Out[894]: 
Albert    2
Carl      1
Julian    0
Mary      2
dtype: int64

Or

df['name'].value_counts().reindex(lst,fill_value=0)
Out[896]: 
Albert    2
Carl      1
Julian    0
Mary      2
Name: name, dtype: int64

CodePudding user response:

You can use value_counts, and then create an empty Series with lst as the index, and then add them together, filling NaN with 0:

(df['name'].value_counts()   pd.Series(index=lst, dtype=int)).fillna(0).astype(int)

Output:

>>> df
Albert    2
Carl      1
Julian    0
Mary      2
Name: count, dtype: int64
  • Related