1.I have two df,one is main df,the other is index df.
2.The target I want to do is let the columns pdoduct of main df can use 'contain' function in index df to filter key word.
3.In the end ,the main df can have new columns keyword to show main_df[keyword]=[C2,VA,E220F,7350M]
plz help and instructor ,txs a lot.
main df is
data num product
0 2019-10-01 39013000 xxxxxC2xxxxxxx
1 2019-10-01 39013000 xxxxxxVAxxxxxxxxxxxx
2 2019-10-28 39013000 xxxxxxxxE220Fxxxxxxxxxxxxx
3 2019-12-31 39013000 xxxxxxxx7350Mxxxxxxxx
index df is
product
0 VA
1 C2
2 7350M
3 E220F
Below is my code:
for key in key_word:
mask = df_import_tmp0["product"].str.contains(key)
df_import_tmp0['keyword']=key
The output is not I want :(
df_import_tmp0
data num product keyword
0 2019-10-01 39013000 xxxxxC2xxxxxxx E220F
1 2019-10-01 39013000 xxxxxxVAxxxxxxxxxxxx E220F
2 2019-10-28 39013000 xxxxxxxxE220Fxxxxxxxxxxxxx E220F
3 2019-12-31 39013000 xxxxxxxx7350Mxxxxxxxx E220F
CodePudding user response:
Here's a way to do what you're asking:
df['keyword'] = df['product'].str.extract('(' '|'.join(idx['product'].tolist()) ')')
Input:
df:
data num product
0 2019-10-01 39013000 xxxxxC2xxxxxxx
1 2019-10-01 39013000 xxxxxxVAxxxxxxxxxxxx
2 2019-10-28 39013000 xxxxxxxxE220Fxxxxxxxxxxxxx
3 2019-12-31 39013000 xxxxxxxx7350Mxxxxxxxx
idx:
product
0 VA
1 C2
2 7350M
3 E220F
Output:
data num product keyword
0 2019-10-01 39013000 xxxxxC2xxxxxxx C2
1 2019-10-01 39013000 xxxxxxVAxxxxxxxxxxxx VA
2 2019-10-28 39013000 xxxxxxxxE220Fxxxxxxxxxxxxx E220F
3 2019-12-31 39013000 xxxxxxxx7350Mxxxxxxxx 7350M
CodePudding user response:
A simple way:
df = pd.DataFrame({"product": product_list})
key_words = ['VA','C2','7350M','E220F']
df["keyword"] = ''
for kw in key_words:
df['keyword'][df["product"].str.contains(kw)] = kw
Gives:
product keyword
0 xxxxxC2xxxxxxx C2
1 xxxxxxVAxxxxxxxxxxxx VA
2 xxxxxxxxE220Fxxxxxxxxxxxxx E220F
3 xxxxxxxx7350Mxxxxxxxx 7350M