I have a dataset
like below:
Customer_ID Policy_No Age Channel Status Income
123 5432 5 DD Lapse 300000
123 5431 1 Agency Active 420000
234 4352 7 DD Lapse 650000
and a master
dataset:
Group Attributes S2SFlag
Age 1 to 5 Yes
Age 6 to 10 No
Channel DD Yes
Channel Agency Yes
Status Lapse Yes
Status Active No
Income 300000 to 500000 Yes
Income 500000 to 700000 No
Here, the condition is to check the associated value for the Group
and its value from the dataset with the master
dataset and pick the S2SFlag
column value from the master dataset and create an S2SFlag column in the dataset. The original dataset is returned by querying the database using AWS Lambda.
Also, if post-checking if there is a 'No'
value for any of the Group
attributes/row
then the Flag
for that particular group would be 'No'
. The expected
output is as below:
I have tried the below code but it is giving me the error:
import pandas as pd
import re
df1 = pd.read_csv('') # Master table
def to_range(s):
m = re.match('(\d ) to (\d )$', s)
return range(int(m.group(1)), int(m.group(2)) 1) if m else s
df1 = (df1.assign(Attributes=[to_range(s) for s in df1['Attributes']])
.explode('Attributes'))
df2 = pd.read_csv('') # Assuming it as query results
mapping_Age = df1[df1['Group'] == 'Age'].set_index('Attributes')['S2SFlag'] == 'Yes'
mapping_Channel = df1[df1['Group'] == 'Channel'].set_index('Attributes')['S2SFlag'] == 'Yes'
mapping_Status = df1[df1['Group'] == 'Status'].set_index('Attributes')['S2SFlag'] == 'Yes'
mapping_Income = df1[df1['Group'] == 'Income'].set_index('Attributes')['S2SFlag'] == 'Yes'
mapped_Age = df2['Age'].map(mapping_Age)
mapped_Channel = df2['Channel'].map(mapping_Channel)
mapped_Status = df2['Status'].map(mapping_Status)
mapped_Income = df2['Income'].map(mapping_Income)
df2['S2SFlag'] = (mapped_Age & mapped_Status & mapped_Channel & mapped_Income).map({True: 'Yes', False: 'No'})
Traceback (most recent call last):
File "", line 27, in <module>
mapped_Income = df2['Income'].map(mapping_Income)
File "", line 4161, in map
new_values = super()._map_values(arg, na_action=na_action)
File "", line 842, in _map_values
indexer = mapper.index.get_indexer(values)
File "", line 3442, in get_indexer
raise InvalidIndexError(self._requires_unique_msg)
pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects
Also, I tried something like below which also gave me another error:
import pandas as pd
import re
df1 = pd.read_csv('') #Master table
df2 = pd.read_csv('') #Dataset
df1 = df1[df1['Group'] == 'Income']
for i, j in df2['Income'].iteritems():
for k, l in zip(df1['Attributes'], df1['S2SFlag']):
start = k.split(" to ")[0]
end = k.split(" to ")[1]
if j in range(int(start), int(end)):
print("J ", j, " l ", l)
df2['S2SFlag'] = l
Output:
Customer_ID Policy_No Age Channel Status Income S2SFlag
0 123 5432 5 DD Lapse 300000 No
1 123 5431 1 Agency Active 420000 No
2 234 4352 7 DD Lapse 650000 No
Where am I going wrong?
Update 1:
As per Raymond, everything worked like a charm except for one scenario where I am not getting the output as expected.
Master Data
:
Group Attributes S2SFlag
Insured Age Group 1 to 20 Yes
Insured Age Group 21 to 25 Yes
Insured Age Group 26 to 30 Yes
Insured Age Group 31 to 40 Yes
Insured Age Group 41 to 50 Yes
Insured Age Group 51 to 100 Yes
Policy Status Lapse Yes
Policy Status Active Yes
Policy Status RPU No
Policy Status Paid Up/ PPT Over Yes
Policy Status Claim settled Yes
Policy Status Claim rejected No
Income Group 0 to 300000 Yes
Income Group 300001 to 500000 Yes
Income Group 500001 to 700000 Yes
Income Group 700001 to 1000000 Yes
Income Group 1000001 to 1500000 Yes
Income Group 1500001 to 1600000 Yes
Marrital status Single Yes
Marrital status Married Yes
Marrital status Divorced Yes
Marrital status Married with a child Yes
Marrital status Single Parent Yes
Marrital status 2 children Yes
Occupation Salaried Yes
Occupation Daily wage No
Occupation Self employed with <5LPA No
Occupation Self employed with >5LPA Yes
Occupation House wife Yes
Occupation House Husband Yes
Education 1 to 10 Yes
Education 11 to 12 Yes
Education Diploma Yes
Education Graduate Yes
Education Post Graduate Yes
Education Doctoral Yes
Fund Value Positive Yes
Fund Value Negative No
Fund Value Zero No
When I tried printing the mapped values for Education
, it gave me results like below:
mapped_Education = df2['Education'].map(mapping_Education)
print(mapped_Education)
0 NaN
1 NaN
2 True
Name: Education, dtype: object
Dataset
:
Customer_ID Policy_No Age Status Income Marital_Status Occupation Education Fund Value
123 5432 5 Lapse 300000 Single Salaried 6 Positive
123 5431 1 Active 420000 Married Daily wage 11 Zero
234 4352 7 Lapse 650000 Divorced Daily wage Doctoral Zero
Due to this, I am getting output
as below:
Customer_ID Policy_No Age Status ... Occupation Education Fund Value S2SFlag
0 123 5432 5 Lapse ... Salaried 6 Positive No
1 123 5431 1 Active ... Daily wage 11 Zero No
2 234 4352 7 Lapse ... Daily wage Doctoral Zero No
Where S2SFlag
for the first record should have been 'Yes'
CodePudding user response:
The error Reindexing only valid with uniquely valued Index objects
says that map
requires your mapping_Income
to have unique index, and the index originates from df1[df1['Group'] == 'Income'].set_index('Attributes')
where the 'Attributes'
for Income
contains exploded values that are not unique.
Your code for exploding is range inclusive, so when you explode the two ranges for Income, 500000 will appear once in each explode, ending up with two 500000 and thus the error. Please try to modify either one of the ranges to exclude its 500000, e.g. make the second range 500001 to 700000.
For Update 1:
I suspect it has to do with data-type again. The exploded value could be integers, but all values of df2['Education']
are string.
One quick fix is to change everything into strings.
#1 add df1['Attributes'] = df1['Attributes'].astype(str)
after the line for explode
#2 add df2 = df2.astype(str)
before any map
to make all contents strings