Home > Back-end >  How to make this iteration loop between all columns?
How to make this iteration loop between all columns?

Time:03-22

I have a dataframe called df which has the following columns header of data:

bureau0 bureau1 bureau2 bureau3 bureau4 waysConstitution
4753900 4773300 NaN 4713002 Three 2135
4721103 NaN NaN NaN 4751201 2135
6204000 6201501 6319400 NaN 6201501 2062
7911200 8230001 NaN 8291100 NaN 2135
4713002 NaN NaN 4751201 4753900 2054

In fact, the bureau column goes up to 100, but I think these 5 cols examples already help.

I also have another dataset, called top_bureau, which contains some bureau code. Like this:

top_bureau
7911200
4713002
4773300
5811500
6201501

A more lines in the top_bureau, but I think for example it should work.

What I would like to do is: Create a new column in df that returns the value "potential_costumer" if in any bureau column there is some code that has in top_bureau and the waysConstitution column has the value equal to "2135" or "2062".

Expected output:

bureau0 bureau1 bureau2 bureau3 bureau4 waysConstitution potential_customer
4753900 4773300 NaN 4713002 Three 2135 Yes
4721103 NaN NaN NaN 4751201 2135 Yes
6204000 6201501 6319400 NaN 6201501 2062 Yes
7911200 8230001 NaN 8291100 NaN 2135 Yes
4713002 NaN NaN 4751201 4753900 2054 No

Reasoning example: As in line 1, in the bureau3 column the value "4713002" is in the top_bureau table and the "waysConstitution" column is equal to 2135 or 2062 so this customer can be flagged as a potential customer

How to make this iteration loop between all columns?

CodePudding user response:

IIUC, you can perform simple boolean indexing.

I am assuming df the first dataframe and df2 the second one.

# is any row value in top_bureau?
m1 = df.filter(like='bureau').isin(df2['top_bureau'].unique()).any(1)
# is waysConstitution 2135 or 2062?
m2 = df['waysConstitution'].isin([2135, 2062])

# if both conditions are True, add a "potential_customer" string
df.loc[m1&m2, 'new'] = 'potential_customer'

output:

   bureau0    bureau1    bureau2    bureau3  bureau4  waysConstitution                 new
0  4753900  4773300.0        NaN  4713002.0    Three              2135  potential_customer
1  4721103        NaN        NaN        NaN  4751201              2135                 NaN
2  6204000  6201501.0  6319400.0        NaN  6201501              2062  potential_customer
3  7911200  8230001.0        NaN  8291100.0      NaN              2135  potential_customer
4  4713002        NaN        NaN  4751201.0  4753900              2054                 NaN
alternative output (Yes/No):
import numpy as np
m1 = df.filter(like='bureau').isin(df2['top_bureau'].unique()).any(1)
m2 = df['waysConstitution'].isin([2135, 2062])

df['potential_customer'] = np.where(m1&m2, 'Yes', 'No')

output:

   bureau0    bureau1    bureau2    bureau3  bureau4  waysConstitution potential_customer
0  4753900  4773300.0        NaN  4713002.0    Three              2135                Yes
1  4721103        NaN        NaN        NaN  4751201              2135                 No
2  6204000  6201501.0  6319400.0        NaN  6201501              2062                Yes
3  7911200  8230001.0        NaN  8291100.0      NaN              2135                Yes
4  4713002        NaN        NaN  4751201.0  4753900              2054                 No

CodePudding user response:

You can use iteritems. This will loop over everything. Since I dont see your expected output or your code so far I cant really know what you exactly want.

for (columnName, columnData) in df.iteritems():
    print('Column Name : ', columnName)
    print('Column Contents : ', columnData.values)

You can fill this with your condition needed to create a new column?

CodePudding user response:

There's a few ways to do it, as you can see by the solutions. Here's another:

import pandas as pd
import numpy as np

top_bureau = ['7911200','4713002','4773300','5811500','6201501']

columns = ['bureau0','bureau1','bureau2','bureau3','bureau4','waysConstitution']
data = [['4753900', '4773300',  np.nan, '4713002',  'Three',    '2135'],
['4721103', np.nan, np.nan, np.nan, '4751201',  '2135'],
['6204000', '6201501',  '6319400',  np.nan, '6201501',  '2062'],
['7911200', '8230001',  np.nan, '8291100',  np.nan, '2135'],
['123', np.nan, np.nan, '456',  '789',  '101112'],
['4713002', np.nan, np.nan, '4751201',  '4753900',  '2054']]

df = pd.DataFrame(data, columns=columns)


df['check'] = df[df.columns].apply(lambda x: list(x), axis=1)
potentialCustomerIdx = list(df[(pd.DataFrame(df.check.tolist()).isin(top_bureau).any(1).values) & (df['waysConstitution'].isin(["2135","2062"]))].index)

df['potential_customer'] = 'No'
df.loc[potentialCustomerIdx,'potential_customer'] = 'Yes'
df = df.drop(['check'],axis=1)

Output:

print(df)
   bureau0  bureau1  bureau2  ...  bureau4 waysConstitution potential_customer
0  4753900  4773300      NaN  ...    Three             2135                Yes
1  4721103      NaN      NaN  ...  4751201             2135                 No
2  6204000  6201501  6319400  ...  6201501             2062                Yes
3  7911200  8230001      NaN  ...      NaN             2135                Yes
4      123      NaN      NaN  ...      789           101112                 No
5  4713002      NaN      NaN  ...  4753900             2054                 No

[6 rows x 7 columns]
  • Related