np.select pandas dataframe based on column of prefix and values-CodePudding

So I have two dataframes

main_df, about 800 rows

description	category
ABCD	ONE
XYZ	THREE
ABC
QWE

keyword_df, it is about 50 rows

keyword	category
AB	FIVE

What I'm trying to achieve =

main_df

description	category
ABCD	ONE
XYZ	THREE
ABC	FIVE
QWE	0

conditions = [(main_df['Description'].str.startswith('AB')) & (main_df['category').isnull()]

values = keyword_df['category'].tolist()

main_df['category'] = np.select(conditions, values)

I was able to create a list of strings

["(main_df['Description'].str.startswith('AB')) & (main_df['category').isnull()",
"(main_df['Description'].str.startswith('CD')) & (main_df['category').isnull()", ...]

But it needs to be a list of conditions and not a list of strings for it to work. Appreciate any help!

CodePudding user response：

Since you only have 50 rows in the keyword frame, you could just iterate over those and update the main frame accordingly:

import numpy as np
import pandas as pd

main_df = pd.DataFrame({'description': ['ABCD', 'XYZ', 'ABC', 'QWE'],
                        'category': ['ONE', 'THREE', np.nan, np.nan]})
keyword_df = pd.DataFrame({'keyword': ['AB'],
                           'category': ['FIVE']}) 

for key in keyword_df.itertuples(index=False):
    mask = (main_df['description'].str.startswith(key[0]) 
            & main_df['category'].isnull())
    main_df.loc[mask, 'category'] = key[1] 

main_df

    description   category
0   ABCD          ONE
1   XYZ           THREE
2   ABC           FIVE
3   QWE           NaN