I have the following list
x = [1,2,3]
And the following sample df
df = pd.DataFrame({'UserId':[1,1,1,2,2,2,3,3,3,4,4,4],'Origins':[1,2,3,2,2,3,7,8,9,10,11,12]})
Let's say I want to return the UserIds that have any of the values from the list among their grouped Origins values.
Wanted result
pd.Series({'UserId':[1,2]})
What would be the best approach? Maybe a groupby with a lambda, but I am having a little trouble formulating the condition.
CodePudding user response:
df['UserId'][df['Origins'].isin(x)].drop_duplicates()
I had considered using unique(), but that returns a numpy array. Since you wanted a series, I went with drop_duplicates().
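For completeness, here is the same idea as a self-contained sketch that can be run as-is. It assumes the list and df from the question; the names mask and result are only illustrative, and .loc is used in place of the chained indexing above, which should select the same rows.

```python
import pandas as pd

x = [1, 2, 3]
df = pd.DataFrame({
    'UserId':  [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    'Origins': [1, 2, 3, 2, 2, 3, 7, 8, 9, 10, 11, 12],
})

# Boolean mask: True for every row whose Origins value is in x
mask = df['Origins'].isin(x)

# Take the UserId of the matching rows and keep one entry per user.
# The result is a Series, not a numpy array.
result = df.loc[mask, 'UserId'].drop_duplicates()

print(result.tolist())  # [1, 2]
```

Note that drop_duplicates() keeps the original row labels (0 and 3 here); chain .reset_index(drop=True) if a fresh 0..n-1 index is preferred.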
CodePudding user response:
IIUC, OP wants the UserId values whose Origins include at least one number from the list x. If that is the case, the following, using pandas.Series.isin and pandas.Series.unique, will do the work
df_new = df[df['Origins'].isin(x)]['UserId'].unique()
[Out]:
[1 2]
Assuming one wants a Series, one can convert the NumPy array to a Series as follows
df_new = pd.Series(df_new)
[Out]:
0 1
1 2
dtype: int64
If one wants to return a Series, and do it all in one step, instead of pandas.Series.unique, one can use pandas.Series.drop_duplicates (see Steven Rumbaliski's answer).
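Since the question mentions a groupby with a lambda, here is a rough sketch of how that condition could be written, next to a variant without the lambda. It assumes the list and df from the question; the names by_lambda, has_match and users are made up for illustration.

```python
import pandas as pd

x = [1, 2, 3]
df = pd.DataFrame({
    'UserId':  [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    'Origins': [1, 2, 3, 2, 2, 3, 7, 8, 9, 10, 11, 12],
})

# groupby + lambda: for each user, is any of their Origins values in x?
by_lambda = df.groupby('UserId')['Origins'].apply(lambda s: s.isin(x).any())

# Same condition without a lambda: build the mask first, then group it by UserId
has_match = df['Origins'].isin(x).groupby(df['UserId']).any()

# Both are boolean Series indexed by UserId; keep the ids flagged True
users = pd.Series(has_match[has_match].index, name='UserId')

print(users.tolist())  # [1, 2]
```

For this particular question the groupby is not strictly needed, since filtering with isin and dropping duplicates gives the same ids, but the grouped form is handy when the per-user True/False flag itself is wanted.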