I have broken a more complex problem into something simpler. The actual problem has larger lists and several more columns.
Starting with this df:
i | COL1 | COL2 | COL3 | COL4 | Revenue | QTY | Products
0 | Coin | Gold Krug | Gold Coin | Coins | 2333677473 | 21 | 12
1 | Gold Coin | Coins | Gold Coin | Coins | 2564774784 | 28 | 14
2 | Gold Coin | Coins | Gold Krug | Coins |3256666647 | 35 | 16
3 |Gold Coin | Coins | Coins |Gold Krug | 3456788 | 42 | 18
4 |Gold Krug | Gold Coin | Coins | Coins | 4588960 | 49 | 20
5 |Gold Coin | Coins | Gold Krug | Coins |346869909 |56 | 22
6 |Gold Coin | Coins | Gold Coin | Coins | 3777989 |63 | 24
7 |Gold Coin |Silver Krug |Gold Coin | Coins | 37687589 |70 | 26
8 |Gold Coin | Coins |Gold Coin | Coins | 45789889 |77 | 28
9 |Gold Coin | Gold Krug |Gold Coin |Coins | 468 |84 | 30
I would like output to be DF with a new column like this:
i | Category | Revenue | QTY |Products
0 |Gold Krug | 2333677473 |21 | 12
2 |Gold Krug | 3256666647 | 35 | 16
3 |Gold Krug | 3456788 | 42 | 18
4 | Gold Krug | 4588960 | 49 | 20
5 | Gold Krug | 346869909 | 56 | 22
7 | Silver Krug | 37687589 | 70 | 26
9 | Gold Krug | 468 | 84 | 30
I used this, but simply do not understand how to create a new column using the value in the list that matches for the new column :
KRUG = ['Gold Krug', 'Silver Krug', 'Gold Maple','Gold Eagle']
df = df[df[['COL1', 'COL2', 'COL3', 'COL4 ']].isin(KRUG).any(axis=1)]
print(df)
output :
i |COL1 |COL2 |COL3 |COL4 |Revenue |QTY |Products
0 |Coin |Gold Krug |Gold Coin |Coins |2333677473 |21 |12
2 |Gold Coin |Coins |Gold Krug |Coins |3256666647 |35 |16
3 |Gold Coin |Coins |Coins |Gold Krug |3456788 |42 |18
4 |Gold Krug |Gold Coin |Coins |Coins |4588960 |49 |20
5 |Gold Coin |Coins |Gold Krug |Coins |346869909 |56 |22
7 |Gold Coin |Silver Krug |Gold Coin |Coins |37687589 |70 |26
9 |Gold Coin |Gold Krug |Gold Coin |Coins |468 |84 |30
CodePudding user response:
Here's a method using apply() although there should be an easier way using .str. If it's not too large a database this should be ok.
import numpy as np
def get_coin(x):
for k in KRUG:
if k in x.tolist():
return k
return np.nan
df['category'] = df[['COL1', 'COL2', 'COL3', 'COL4']].apply(get_coin, axis=1)
df.drop(['COL1', 'COL2', 'COL3', 'COL4'], axis=1, inplace=True)
df.dropna(inplace=True)
i Revenue QTY Products category
0 0 2333677473 21 12 Gold Krug
2 2 3256666647 35 16 Gold Krug
3 3 3456788 42 18 Gold Krug
4 4 4588960 49 20 Gold Krug
5 5 346869909 56 22 Gold Krug
7 7 37687589 70 26 Silver Krug