I have a list of names:
lst = ['Albert', 'Carl', 'Julian', 'Mary']
and I have a DataFrame:
target  id   name
A       100  Albert
A       110  Albert
B       200  Carl
D       500  Mary
E       235  Mary
I want to make another DataFrame counting how many ids there are per name in lst:
lst_names  Count
Albert     2
Carl       1
Julian     0
Mary       2
What's the most efficient way to do this, considering the list has 12k unique names in it?
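For reference, a minimal way to reproduce the sample data above (just a sketch; the real list has ~12k unique names):

import pandas as pd

lst = ['Albert', 'Carl', 'Julian', 'Mary']
df = pd.DataFrame({
    'target': ['A', 'A', 'B', 'D', 'E'],
    'id': [100, 110, 200, 500, 235],
    'name': ['Albert', 'Albert', 'Carl', 'Mary', 'Mary'],
})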
CodePudding user response:
Check value_counts on a Categorical built with lst as the categories, so names that never appear still get a count of 0:
pd.Categorical(df['name'], lst).value_counts()
Out[894]:
Albert 2
Carl 1
Julian 0
Mary 2
dtype: int64
Or reindex the value_counts result onto lst with fill_value=0:
df['name'].value_counts().reindex(lst, fill_value=0)
Out[896]:
Albert 2
Carl 1
Julian 0
Mary 2
Name: name, dtype: int64
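Either way, if you need the two-column DataFrame shown in the question (the column names lst_names and Count below are taken from its expected output), you can reset the index of the counts, for example:

out = df['name'].value_counts().reindex(lst, fill_value=0)
out = out.rename_axis('lst_names').reset_index(name='Count')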
CodePudding user response:
You can use value_counts, then create a Series of zeros with lst as the index, add the two together, and fill the NaN (produced for names that never appear in df) with 0:
(df['name'].value_counts() + pd.Series(0, index=lst)).fillna(0).astype(int)
Output:
Albert 2
Carl 1
Julian 0
Mary 2
dtype: int64
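As a variant of the same idea (a sketch, not part of the original answer), Series.add with fill_value=0 does the filling during alignment, so the separate fillna step isn't needed:

# names absent from df come out as 0; note that names present in df but
# missing from lst would also show up in the result
df['name'].value_counts().add(pd.Series(0, index=lst), fill_value=0).astype(int)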