Home > Back-end >  How to Create a Single Column off multiple columns in Pandas using .isin() and a list?
How to Create a Single Column off multiple columns in Pandas using .isin() and a list?

Time:11-14

I have broken a more complex problem into something simpler. The actual problem has larger lists and several more columns.

Starting with this df:

 i |     COL1   |      COL2   |    COL3     |    COL4  |  Revenue    |  QTY    | Products
 
0  |      Coin  |   Gold Krug | Gold Coin   |  Coins   | 2333677473  |   21    |      12

1  | Gold Coin  |     Coins   | Gold Coin   |  Coins   | 2564774784  |   28    |    14

2  | Gold Coin  |     Coins   | Gold Krug   | Coins    |3256666647   |   35    |     16

3  |Gold Coin   |    Coins    |  Coins      |Gold Krug |    3456788  |   42    |     18

4  |Gold Krug   | Gold Coin   |  Coins      | Coins    |  4588960    | 49      |   20

5  |Gold Coin   |    Coins    | Gold Krug   | Coins    |346869909    |56       | 22

6  |Gold Coin   |    Coins    | Gold Coin   |  Coins   | 3777989     |63       | 24

7  |Gold Coin   |Silver Krug  |Gold Coin    | Coins    | 37687589    |70       | 26

8  |Gold Coin   |    Coins    |Gold Coin    | Coins    | 45789889    |77       | 28

9  |Gold Coin   | Gold Krug   |Gold Coin    |Coins     |    468      |84       | 30

I would like output to be DF with a new column like this:

i |  Category    |    Revenue         | QTY   |Products 

0 |Gold Krug     |  2333677473        |21     |    12

2 |Gold Krug     |  3256666647        | 35    |     16

3 |Gold Krug     |     3456788        | 42    |     18

4 | Gold Krug    |      4588960       |  49   |      20

5 | Gold Krug    |    346869909       |  56   |      22

7 | Silver Krug  |     37687589       |  70   |      26

9 | Gold Krug    |          468       |  84   |      30

I used this, but simply do not understand how to create a new column using the value in the list that matches for the new column :

KRUG = ['Gold Krug', 'Silver Krug', 'Gold Maple','Gold Eagle']

df = df[df[['COL1', 'COL2', 'COL3', 'COL4 ']].isin(KRUG).any(axis=1)]

print(df)

output :
i   |COL1         |COL2          |COL3          |COL4       |Revenue    |QTY    |Products
 
0   |Coin         |Gold Krug     |Gold Coin     |Coins      |2333677473 |21     |12

2   |Gold Coin    |Coins         |Gold Krug     |Coins      |3256666647 |35     |16

3   |Gold Coin    |Coins         |Coins         |Gold Krug  |3456788    |42     |18

4   |Gold Krug    |Gold Coin     |Coins         |Coins      |4588960    |49     |20

5   |Gold Coin    |Coins         |Gold Krug     |Coins      |346869909  |56     |22

7   |Gold Coin    |Silver Krug   |Gold Coin     |Coins      |37687589   |70     |26

9   |Gold Coin    |Gold Krug     |Gold Coin     |Coins      |468        |84     |30

CodePudding user response:

Here's a method using apply() although there should be an easier way using .str. If it's not too large a database this should be ok.

import numpy as np
def get_coin(x):
    for k in KRUG:
        if k in x.tolist():
            return k
    return np.nan

df['category'] = df[['COL1', 'COL2', 'COL3', 'COL4']].apply(get_coin, axis=1)
df.drop(['COL1', 'COL2', 'COL3', 'COL4'], axis=1, inplace=True)
df.dropna(inplace=True)

   i     Revenue  QTY  Products     category
0  0  2333677473   21        12    Gold Krug
2  2  3256666647   35        16    Gold Krug
3  3     3456788   42        18    Gold Krug
4  4     4588960   49        20    Gold Krug
5  5   346869909   56        22    Gold Krug
7  7    37687589   70        26  Silver Krug
  • Related