So I have two dataframes
main_df, about 800 rows
description | category |
---|---|
ABCD | ONE |
XYZ | THREE |
ABC | |
QWE |
keyword_df, it is about 50 rows
keyword | category |
---|---|
AB | FIVE |
What I'm trying to achieve =
main_df
description | category |
---|---|
ABCD | ONE |
XYZ | THREE |
ABC | FIVE |
QWE | 0 |
conditions = [(main_df['Description'].str.startswith('AB')) & (main_df['category').isnull()]
values = keyword_df['category'].tolist()
main_df['category'] = np.select(conditions, values)
I was able to create a list of strings
["(main_df['Description'].str.startswith('AB')) & (main_df['category').isnull()",
"(main_df['Description'].str.startswith('CD')) & (main_df['category').isnull()", ...]
But it needs to be a list of conditions and not a list of strings for it to work. Appreciate any help!
CodePudding user response:
Since you only have 50 rows in the keyword frame, you could just iterate over those and update the main frame accordingly:
import numpy as np
import pandas as pd
main_df = pd.DataFrame({'description': ['ABCD', 'XYZ', 'ABC', 'QWE'],
'category': ['ONE', 'THREE', np.nan, np.nan]})
keyword_df = pd.DataFrame({'keyword': ['AB'],
'category': ['FIVE']})
for key in keyword_df.itertuples(index=False):
mask = (main_df['description'].str.startswith(key[0])
& main_df['category'].isnull())
main_df.loc[mask, 'category'] = key[1]
main_df
description category
0 ABCD ONE
1 XYZ THREE
2 ABC FIVE
3 QWE NaN