I have a number of client rating flat files, one per business unit/division and it contains the rating for that specific client in that business unit. Some clients can cross multiple business units so could have one rating in one unit and a different rating in a different file or none at all.
I want a single data structure where I can look up a particular client and get all applicable ratings for that client.
example data
df1 = pandas.DataFrame({'client': ['Client_A', 'Client_B', 'Client_C'], 'bank_rating': ['A', 'A', 'B']})
df1.set_index('client', inplace=True)
df2 = pandas.DataFrame({'client': ['Client_A', 'Client_C'], 'insurance_rating': ['B', 'A']})
df2.set_index('client', inplace=True)
df3 = pandas.DataFrame({'client': ['Client_B', 'Client_D'], 'wealth_rating': ['D', 'A']})
df3.set_index('client', inplace=True)
frames = [df1, df2, df3] #in reality frames is a dynamic list built from x number of files in a folder
attempted solution
df = pandas.concat(frames)
bank_rating insurance_rating wealth_rating
client
Client_A A NaN NaN
Client_B A NaN NaN
Client_C B NaN NaN
Client_A NaN B NaN
Client_C NaN A NaN
Client_B NaN NaN D
Client_D NaN NaN A
what i want is a unique row per customer with each business unit rating as column
bank_rating insurance_rating wealth_rating
client
Client_A A B NaN
Client_B A NaN D
Client_C B A NaN
Client_D NaN NaN A
CodePudding user response:
You are almost there, it's just that concat assumes that you want to add new rows to the dataframes (the 0'th axis). So writing
df = pandas.concat(frames, axis=1)
should do the trick for you.