I have two dataframes, df1 and df2, which have a common column heading, Name. The Name values are unique within df1 and df2. df1's Name values are a subset of those in df2; df2 has more rows -- about 17,300 -- than df1 -- about 6,900 -- but each Name value in df1 is in df2. I would like to create a list of Name values in df1 that meet certain criteria in other columns of the corresponding rows in df2. Example:
df1:
Name Age Hair
0 Jim 25 black
1 Mary 58 brown
3 Sue 15 purple
df2:
Name Country phoneOS
0 Shari GB Android
1 Jim US Android
2 Alain TZ iOS
3 Sue PE iOS
4 Mary US Android
I would like a list of only those Name values in df1 that have df2 Country and phoneOS values of US and Android. The example result would be [Jim, Mary].
I have successfully selected rows within one dataframe that meet multiple criteria in order to copy those rows to a new dataframe. In that case pandas/Python does the iteration over rows internally. I guess I could write a "manual" iteration over the rows of df1 and access df2 on each iteration. I was hoping for a more efficient solution whereby the iteration was handled internally as in the single-dataframe case. But my searches for such a solution have been fruitless.
CodePudding user response:
try:
df_1.loc[df_1.Name.isin(df_2.loc[df_2.Country.eq('US') & \
df_2.phoneOS.eq('Android'), 'Name']), 'Name']
Result:
0 Jim
1 Mary
Name: Name, dtype: object
if you want the result as a list just add .to_list()
at the end
CodePudding user response:
data = df1.merge(df2, on='Name')
data.loc[((data.phoneOS == 'Android') & (data.Country == "US")), 'Name'].values.tolist()