Home > Software engineering >  np.select pandas dataframe based on column of prefix and values
np.select pandas dataframe based on column of prefix and values

Time:10-10

So I have two dataframes

main_df, about 800 rows

description category
ABCD ONE
XYZ THREE
ABC
QWE

keyword_df, it is about 50 rows

keyword category
AB FIVE

What I'm trying to achieve =

main_df

description category
ABCD ONE
XYZ THREE
ABC FIVE
QWE 0

conditions = [(main_df['Description'].str.startswith('AB')) & (main_df['category').isnull()]

values = keyword_df['category'].tolist()

main_df['category'] = np.select(conditions, values)

I was able to create a list of strings

["(main_df['Description'].str.startswith('AB')) & (main_df['category').isnull()",
"(main_df['Description'].str.startswith('CD')) & (main_df['category').isnull()", ...]

But it needs to be a list of conditions and not a list of strings for it to work. Appreciate any help!

CodePudding user response:

Since you only have 50 rows in the keyword frame, you could just iterate over those and update the main frame accordingly:

import numpy as np
import pandas as pd

main_df = pd.DataFrame({'description': ['ABCD', 'XYZ', 'ABC', 'QWE'],
                        'category': ['ONE', 'THREE', np.nan, np.nan]})
keyword_df = pd.DataFrame({'keyword': ['AB'],
                           'category': ['FIVE']}) 

for key in keyword_df.itertuples(index=False):
    mask = (main_df['description'].str.startswith(key[0]) 
            & main_df['category'].isnull())
    main_df.loc[mask, 'category'] = key[1] 

main_df
    description   category
0   ABCD          ONE
1   XYZ           THREE
2   ABC           FIVE
3   QWE           NaN
  • Related