Home > Net >  How to use the contain function in the columns of the first df to filter with the columns of the ind
How to use the contain function in the columns of the first df to filter with the columns of the ind

Time:07-27

1.I have two df,one is main df,the other is index df.

2.The target I want to do is let the columns pdoduct of main df can use 'contain' function in index df to filter key word.

3.In the end ,the main df can have new columns keyword to show main_df[keyword]=[C2,VA,E220F,7350M]

plz help and instructor ,txs a lot.

main df is

        data    num           product
 0  2019-10-01  39013000    xxxxxC2xxxxxxx
 1  2019-10-01  39013000    xxxxxxVAxxxxxxxxxxxx
 2  2019-10-28  39013000    xxxxxxxxE220Fxxxxxxxxxxxxx
 3  2019-12-31  39013000    xxxxxxxx7350Mxxxxxxxx

index df is

    product
0   VA
1   C2
2   7350M
3   E220F

Below is my code:

for key in key_word:
    mask = df_import_tmp0["product"].str.contains(key) 
df_import_tmp0['keyword']=key

The output is not I want :(

df_import_tmp0

    data         num            product                 keyword
0   2019-10-01  39013000    xxxxxC2xxxxxxx              E220F
1   2019-10-01  39013000    xxxxxxVAxxxxxxxxxxxx        E220F
2   2019-10-28  39013000    xxxxxxxxE220Fxxxxxxxxxxxxx  E220F
3   2019-12-31  39013000    xxxxxxxx7350Mxxxxxxxx       E220F

CodePudding user response:

Here's a way to do what you're asking:

df['keyword'] = df['product'].str.extract('('   '|'.join(idx['product'].tolist())   ')')

Input:

df:

         data       num                     product
0  2019-10-01  39013000              xxxxxC2xxxxxxx
1  2019-10-01  39013000        xxxxxxVAxxxxxxxxxxxx
2  2019-10-28  39013000  xxxxxxxxE220Fxxxxxxxxxxxxx
3  2019-12-31  39013000       xxxxxxxx7350Mxxxxxxxx

idx:

  product
0      VA
1      C2
2   7350M
3   E220F

Output:

         data       num                     product keyword
0  2019-10-01  39013000              xxxxxC2xxxxxxx      C2
1  2019-10-01  39013000        xxxxxxVAxxxxxxxxxxxx      VA
2  2019-10-28  39013000  xxxxxxxxE220Fxxxxxxxxxxxxx   E220F
3  2019-12-31  39013000       xxxxxxxx7350Mxxxxxxxx   7350M

CodePudding user response:

A simple way:

df = pd.DataFrame({"product": product_list})
key_words = ['VA','C2','7350M','E220F']
df["keyword"] = ''
for kw in key_words:
    df['keyword'][df["product"].str.contains(kw)] = kw

Gives:

    product                     keyword
0   xxxxxC2xxxxxxx              C2
1   xxxxxxVAxxxxxxxxxxxx        VA
2   xxxxxxxxE220Fxxxxxxxxxxxxx  E220F
3   xxxxxxxx7350Mxxxxxxxx       7350M
  • Related