How to make this iteration loop between all columns?-CodePudding

I have a dataframe called df which has the following columns header of data:

bureau0	bureau1	bureau2	bureau3	bureau4	waysConstitution
4753900	4773300	NaN	4713002	Three	2135
4721103	NaN	NaN	NaN	4751201	2135
6204000	6201501	6319400	NaN	6201501	2062
7911200	8230001	NaN	8291100	NaN	2135
4713002	NaN	NaN	4751201	4753900	2054

In fact, the bureau column goes up to 100, but I think these 5 cols examples already help.

I also have another dataset, called top_bureau, which contains some bureau code. Like this:

top_bureau
7911200
4713002
4773300
5811500
6201501

A more lines in the top_bureau, but I think for example it should work.

What I would like to do is: Create a new column in df that returns the value "potential_costumer" if in any bureau column there is some code that has in top_bureau and the waysConstitution column has the value equal to "2135" or "2062".

Expected output:

bureau0	bureau1	bureau2	bureau3	bureau4	waysConstitution	potential_customer
4753900	4773300	NaN	4713002	Three	2135	Yes
4721103	NaN	NaN	NaN	4751201	2135	Yes
6204000	6201501	6319400	NaN	6201501	2062	Yes
7911200	8230001	NaN	8291100	NaN	2135	Yes
4713002	NaN	NaN	4751201	4753900	2054	No

Reasoning example: As in line 1, in the bureau3 column the value "4713002" is in the top_bureau table and the "waysConstitution" column is equal to 2135 or 2062 so this customer can be flagged as a potential customer

How to make this iteration loop between all columns?

CodePudding user response：

IIUC, you can perform simple boolean indexing.

I am assuming df the first dataframe and df2 the second one.

# is any row value in top_bureau?
m1 = df.filter(like='bureau').isin(df2['top_bureau'].unique()).any(1)
# is waysConstitution 2135 or 2062?
m2 = df['waysConstitution'].isin([2135, 2062])

# if both conditions are True, add a "potential_customer" string
df.loc[m1&m2, 'new'] = 'potential_customer'

output:

   bureau0    bureau1    bureau2    bureau3  bureau4  waysConstitution                 new
0  4753900  4773300.0        NaN  4713002.0    Three              2135  potential_customer
1  4721103        NaN        NaN        NaN  4751201              2135                 NaN
2  6204000  6201501.0  6319400.0        NaN  6201501              2062  potential_customer
3  7911200  8230001.0        NaN  8291100.0      NaN              2135  potential_customer
4  4713002        NaN        NaN  4751201.0  4753900              2054                 NaN

alternative output (Yes/No):

import numpy as np
m1 = df.filter(like='bureau').isin(df2['top_bureau'].unique()).any(1)
m2 = df['waysConstitution'].isin([2135, 2062])

df['potential_customer'] = np.where(m1&m2, 'Yes', 'No')

output:

   bureau0    bureau1    bureau2    bureau3  bureau4  waysConstitution potential_customer
0  4753900  4773300.0        NaN  4713002.0    Three              2135                Yes
1  4721103        NaN        NaN        NaN  4751201              2135                 No
2  6204000  6201501.0  6319400.0        NaN  6201501              2062                Yes
3  7911200  8230001.0        NaN  8291100.0      NaN              2135                Yes
4  4713002        NaN        NaN  4751201.0  4753900              2054                 No

CodePudding user response：

You can use iteritems. This will loop over everything. Since I dont see your expected output or your code so far I cant really know what you exactly want.

for (columnName, columnData) in df.iteritems():
    print('Column Name : ', columnName)
    print('Column Contents : ', columnData.values)

You can fill this with your condition needed to create a new column?

CodePudding user response：

There's a few ways to do it, as you can see by the solutions. Here's another:

import pandas as pd
import numpy as np

top_bureau = ['7911200','4713002','4773300','5811500','6201501']

columns = ['bureau0','bureau1','bureau2','bureau3','bureau4','waysConstitution']
data = [['4753900', '4773300',  np.nan, '4713002',  'Three',    '2135'],
['4721103', np.nan, np.nan, np.nan, '4751201',  '2135'],
['6204000', '6201501',  '6319400',  np.nan, '6201501',  '2062'],
['7911200', '8230001',  np.nan, '8291100',  np.nan, '2135'],
['123', np.nan, np.nan, '456',  '789',  '101112'],
['4713002', np.nan, np.nan, '4751201',  '4753900',  '2054']]

df = pd.DataFrame(data, columns=columns)


df['check'] = df[df.columns].apply(lambda x: list(x), axis=1)
potentialCustomerIdx = list(df[(pd.DataFrame(df.check.tolist()).isin(top_bureau).any(1).values) & (df['waysConstitution'].isin(["2135","2062"]))].index)

df['potential_customer'] = 'No'
df.loc[potentialCustomerIdx,'potential_customer'] = 'Yes'
df = df.drop(['check'],axis=1)

Output:

print(df)
   bureau0  bureau1  bureau2  ...  bureau4 waysConstitution potential_customer
0  4753900  4773300      NaN  ...    Three             2135                Yes
1  4721103      NaN      NaN  ...  4751201             2135                 No
2  6204000  6201501  6319400  ...  6201501             2062                Yes
3  7911200  8230001      NaN  ...      NaN             2135                Yes
4      123      NaN      NaN  ...      789           101112                 No
5  4713002      NaN      NaN  ...  4753900             2054                 No

[6 rows x 7 columns]