I have a DataFrame and a list of dictionaries. The DataFrame has 84k rows. Each row is an account for a specific client.
Each dict in the list belongs to a specific client. They can have up to 50 keys and as few as 2 keys. The dictionaries also need to be applied in the order they are listed. The first key/value in each dict shows the name of the client the dict belongs to. The second key/value is the name of the rule.
List of Dict Example:
0 {'client': 'client 1', 'Billing Code': 'TNL', 'Valuations': '0', 'Account Number': '>99999'}
1 {'client': 'client 1', 'Billing Code': 'MF', 'User': 'BP', 'Flag': 'S'}
...
13 {'client': 'client 2', 'Billing Code': 'TNL', 'Acct Desc': '*test*}
length: 427, dtype: object
DataFrame has these column names
df.columns = ['Source.Name','User Bank','Bank','Account Number','Account Description','Valuation Date',
'Preschedule','MF Flag','Load Flag','Global Flag','Money Market Flag','Days Prior to Valuation',
'Number of Holdings','Total Assets','Unit Value/NAV','MCS Field','From Date','Valuations',
'# Sweeps','NASDAQ','TLA','Account Type','Fund Group','Master Account Text','Master Feeder Flag',
'Acct Flag 2','Acct Field 4','Securities At Value','Net Assets','Acct Field 1','Acct Field 2',
'Group Account Indicator','Group Account Number','Region','Account Status','SMS Billing Code',
'Translation Date','Portfolio Manager 1','Acct Flag 1','Dual Flag','Securities At Value Base',
'Net Assets Base','Total Assets Base','Dual OEIC']
Input DataFrame Dataframe containing data directly from client files
Client Short Name | Source.Name | User Bank | Bank | Account Number | Account Description |
---|---|---|---|---|---|
Client #1 | C#1.txt | AA | 01 | 1 | Test Account |
Client #1 | C#1.txt | AC | 01 | 2 | MY ACCOUNT |
Client #1 | C#1.txt | AC | 01 | 3 | SUPER FUND |
Client #1 | C#1.txt | AY | 01 | 4 | S&P INDEX |
Client #1 | C#1.txt | AY | 01 | 5 | Test Account |
Client #1 | C#1.txt | AA | 01 | 6 | INDEX |
Client #1 | C#1.txt | AA | 01 | 7 | Test Account |
Client #1 | C#1.txt | AA | 01 | 8 | RYAN'S Account |
Client #2 | C#2.txt | BA | 01 | 1 | Test Account |
Client #2 | C#2.txt | BB | 01 | 33 | INDEX |
Client #2 | C#2.txt | BB | 01 | 92 | Test Account |
Client #2 | C#2.txt | BZ | 01 | 123123 | INDEX |
Client #3 | C#3.txt | BB | 01 | 1657 | Test Account |
Client #3 | C#3.txt | BP | 01 | 15454 | Test Account |
Client #4 | C#4.txt | GH | 01 | 100 | Test Account |
Client #4 | C#4.txt | GH | 01 | 19875 | INDEX |
Client #4 | C#4.txt | GY | 01 | 13579 | Test Account |
Client #4 | C#4.txt | GE | 01 | 2 | INDEX |
Client #4 | C#4.txt | GE | 01 | 72 | Test Account |
Client #4 | C#4.txt | GP | 01 | 96 | GREEN Account |
Desired Output Output should be the dataframe with a new column ['Billing Code'] based on the criteria from one of the 427 dictionaries.
Client Short Name | Source.Name | User Bank | Bank | Account Number | Account Description | Billing Code |
---|---|---|---|---|---|---|
Client #1 | C#1.txt | AA | 01 | 1 | Test Account | TNL |
Client #1 | C#1.txt | AC | 01 | 2 | MY ACCOUNT | MF |
Client #1 | C#1.txt | AC | 01 | 3 | SUPER FUND | HF |
Client #1 | C#1.txt | AY | 01 | 4 | S&P INDEX | Index |
Client #1 | C#1.txt | AY | 01 | 5 | Test Account | TNL |
Client #1 | C#1.txt | AA | 01 | 6 | INDEX | Index |
Client #1 | C#1.txt | AA | 01 | 7 | Test Account | TNL |
Client #1 | C#1.txt | AA | 01 | 8 | RYAN'S Account | HF |
Client #2 | C#2.txt | BA | 01 | 1 | Test Account | TNL |
Client #2 | C#2.txt | BB | 01 | 33 | INDEX | Index |
Client #2 | C#2.txt | BB | 01 | 92 | Test Account | TNL |
Client #2 | C#2.txt | BZ | 01 | 123123 | INDEX | Index |
Client #3 | C#3.txt | BB | 01 | 1657 | Test Account | TNL |
Client #3 | C#3.txt | BP | 01 | 15454 | Test Account | TNL |
Client #4 | C#4.txt | GH | 01 | 100 | Test Account | TNL |
Client #4 | C#4.txt | GH | 01 | 19875 | INDEX | Index |
Client #4 | C#4.txt | GY | 01 | 13579 | Test Account | TNL |
Client #4 | C#4.txt | GE | 01 | 2 | INDEX | Index |
Client #4 | C#4.txt | GE | 01 | 72 | Test Account | TNL |
Client #4 | C#4.txt | GP | 01 | 96 | GREEN Account | MF |
Column names match keys.
I basically need to iterate through each row of the data and determine if it meets the criteria in the first dict. If it does then df['Billing Code'] = that specific dict['Billing Code'] if that makes sense. If not then move on to the next billing code.
Iteration could take a very long time to run through all of this hence the "Not Iterate" in the title. Not sure if this is something list comprehension can do.
Thank you for any help anyone can provide!
CodePudding user response:
EDIT:
Based on your comments, I'd first create a mapping ClientID -> List of Dictionaries
:
lst = [
{
"client": "Client #1",
"Billing Code": "TNL",
"Bank": 1,
"Account Number": 1,
},
{
"client": "Client #1",
"Billing Code": "MF",
"User Bank": "AY",
"Bank": 1,
},
{
"Billing Code": "TNL",
"client": "Client #2",
"User Bank": "BB",
},
]
# create a mapping client no. -> list of dictionaries
m = {}
for d in lst:
m.setdefault(d["client"], []).append(d)
d.pop("client")
Then I'd use df.groupby
by Client ID and apply custom function:
def fn(x):
dictionaries = m.get(x.name, [])
out = []
for _, row in x.iterrows():
for d in dictionaries:
if all(row[k] == v for k, v in d.items() if k != "Billing Code"):
out.append(d["Billing Code"])
break
else:
out.append("Unclassified")
x["Billng Code"] = out
return x
df = df.groupby("Client Short Name").apply(fn)
print(df)
The result is:
Client Short Name Source.Name User Bank Bank Account Number Account Description Billng Code
0 Client #1 C#1.txt AA 1 1 Test Account TNL
1 Client #1 C#1.txt AC 1 2 MY ACCOUNT Unclassified
2 Client #1 C#1.txt AC 1 3 SUPER FUND Unclassified
3 Client #1 C#1.txt AY 1 4 S&P INDEX MF
4 Client #1 C#1.txt AY 1 5 Test Account MF
5 Client #1 C#1.txt AA 1 6 INDEX Unclassified
6 Client #1 C#1.txt AA 1 7 Test Account Unclassified
7 Client #1 C#1.txt AA 1 8 RYAN'S Account Unclassified
8 Client #2 C#2.txt BA 1 1 Test Account Unclassified
9 Client #2 C#2.txt BB 1 33 INDEX TNL
10 Client #2 C#2.txt BB 1 92 Test Account TNL
11 Client #2 C#2.txt BZ 1 123123 INDEX Unclassified
12 Client #3 C#3.txt BB 1 1657 Test Account Unclassified
13 Client #3 C#3.txt BP 1 15454 Test Account Unclassified
14 Client #4 C#4.txt GH 1 100 Test Account Unclassified
15 Client #4 C#4.txt GH 1 19875 INDEX Unclassified
16 Client #4 C#4.txt GY 1 13579 Test Account Unclassified
17 Client #4 C#4.txt GE 1 2 INDEX Unclassified
18 Client #4 C#4.txt GE 1 72 Test Account Unclassified
19 Client #4 C#4.txt GP 1 96 GREEN Account Unclassified