I create a pandas DataFrame in which every cell is initialized to an empty list, except the diagonal cells, which are set to math.inf.
The index holds the start position, and the column headers hold the end position.
For each pair of consecutive statuses, I want to take the start and end positions, compute the number of days it took to get from start to end, and append that value to the list in df.loc[start, end]. But for some reason every single cell in the df gets updated, and I don't know why.
My code is shown below:
self.status_dict = {'nvc': 'At NVC',
                    'issued': 'Issued',
                    'ready': 'Ready',
                    'ar_ds260': 'Action required: Complete Form DS-260',
                    'transit': 'In Transit',
                    'refused': 'Refused',
                    'ar_doc': 'Action required: Submit requested documents',
                    'admin_process': 'Administrative Processing',
                    'expire_soon': 'Expiring Soon',
                    'app_received': 'Application Received',
                    'ar_pay': 'Action required: Pay fees',
                    'return_nvc': 'Returned to NVC',
                    'transfer': 'Transfer in Progress',
                    'expired': 'Expired',
                    'ar_pay_miss': 'Action required: Pay missing fees',
                    'no_action': 'No action required: Review in process',
                    'no_status': 'No Status',
                    'ar_choose': 'Action required: Choose an agent'
                    }
self.status_dict_lookup = {'At NVC': 'nvc',
                           'Issued': 'issued',
                           'Ready': 'ready',
                           'Action required: Complete Form DS-260': 'ar_ds260',
                           'In Transit': 'transit',
                           'Refused': 'refused',
                           'Action required: Submit requested documents': 'ar_doc',
                           'Administrative Processing': 'admin_process',
                           'Expiring Soon': 'expire_soon',
                           'Application Received': 'app_received',
                           'Action required: Pay fees': 'ar_pay',
                           'Returned to NVC': 'return_nvc',
                           'Transfer in Progress': 'transfer',
                           'Expired': 'expired',
                           'Action required: Pay missing fees': 'ar_pay_miss',
                           'No action required: Review in process': 'no_action',
                           'No Status': 'no_status',
                           'Action required: Choose an agent': 'ar_choose'
                           }
shape = len(self.status_dict_lookup)
const_arr = [[]] * shape
keys = self.status_dict.keys()
df_dict = dict()
for key in keys:
    df_dict[key] = const_arr
df = pd.DataFrame(df_dict)
df = df.set_index(pd.Index(keys))
for key in keys:
    df.loc[key, key] = math.inf
# cases = self.cases
cases = {1044: [['Action required: Submit requested documents', '2021-12-18'],
                ['At NVC', '2022-02-03'], ['In Transit', '2022-02-14'],
                ['Ready', '2022-02-15'], ['Refused', '2022-03-10'],
                ['Administrative Processing', '2022-03-12'], ['Issued', '2022-03-14']]}
for _, val in cases.items():
    print(val[0], val[1])
    print(val[0][1], val[1][1])
    for i in range(len(val) - 1):
        temp = []
        start = val[i][0]
        end = val[i + 1][0]
        start_time = datetime.strptime(val[i][1], '%Y-%m-%d')
        end_time = datetime.strptime(val[i + 1][1], '%Y-%m-%d')
        diff = end_time - start_time
        temp = df[self.status_dict_lookup[start]][self.status_dict_lookup[end]]
        print(temp)
        temp.append(diff.days)
        df.loc[self.status_dict_lookup[start], self.status_dict_lookup[end]] = temp
Part of the output of the df is shown below:
                                 nvc                 issued  \
nvc                              inf  [47, 11, 1, 23, 2, 2]
issued         [47, 11, 1, 23, 2, 2]                    inf
ready          [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]
ar_ds260       [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]
transit        [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]
refused        [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]
ar_doc         [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]
admin_process  [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]
expire_soon    [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]
app_received   [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]
ar_pay         [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]
return_nvc     [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]
transfer       [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]
expired        [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]
ar_pay_miss    [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]
no_action      [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]
no_status      [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]
ar_choose      [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]
So for the first example:
start = Action required: Submit requested documents
end = At NVC
diff = 47
So I want it to store just [47] in the cell for start ar_doc and end nvc, i.e. df.loc['ar_doc', 'nvc']. Instead, it stores the day differences of all the transitions in every cell.
Why does this happen, and how can I fix it?
CodePudding user response:
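Every cell in your DataFrame references the same list object. The expression [[]] * shape does not create shape separate lists; it repeats a reference to one inner list, and that same const_arr is then reused for every column. An append through any cell therefore shows up in all of them. You should change how you initialize the DataFrame so that each cell gets its own new list.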
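The aliasing can be reproduced without pandas. The following is a minimal sketch; the names shared and separate are illustrative only and do not come from the question.

```python
# [[]] * 3 repeats a reference to ONE inner list three times.
shared = [[]] * 3
shared[0].append(47)
print(shared)                  # [[47], [47], [47]]
print(shared[0] is shared[1])  # True

# A comprehension evaluates [] once per iteration, so each slot
# holds its own independent list.
separate = [[] for _ in range(3)]
separate[0].append(47)
print(separate)                # [[47], [], []]
```

The same thing happens inside the DataFrame, because an object column stores references to the Python lists rather than copies of them.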
Try:
# Build a fresh list for every cell so that no two cells share an object
df = pd.DataFrame({k: [list() for _ in range(len(status_dict))] for k in status_dict},
                  index=status_dict.keys())
for key in status_dict:
    df.at[key, key] = math.inf
Separately, since you're already using pandas, you don't need to use datetime to parse dates. You can reduce your loop to the following:
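for _, val in cases.items():
    for i in range(len(val) - 1):
        # end date minus start date, so the number of days is positive
        diff = pd.to_datetime(val[i + 1][1], format='%Y-%m-%d') - pd.to_datetime(val[i][1], format='%Y-%m-%d')
        # row = start status, column = end status; append to that cell's own list
        df.at[status_dict_lookup[val[i][0]], status_dict_lookup[val[i + 1][0]]].append(diff.days)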
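Putting the pieces together, here is a self-contained sketch you can run to check the result. It assumes a shortened four-status subset of the lookup dict and only the first four entries of case 1044, purely to keep the example small; the variable name history is my own. Note that it always addresses cells as df.at[start, end] (row first, then column), whereas df[a][b] in the question selects column a and then row b.

```python
import math
import pandas as pd

# Shortened subset of the question's lookup dict (assumption, for brevity).
status_dict_lookup = {
    'Action required: Submit requested documents': 'ar_doc',
    'At NVC': 'nvc',
    'In Transit': 'transit',
    'Ready': 'ready',
}
keys = list(status_dict_lookup.values())

# The inner comprehension runs once per column, so every cell owns its list.
df = pd.DataFrame({k: [[] for _ in keys] for k in keys}, index=keys)
for key in keys:
    df.at[key, key] = math.inf

# First four entries of the case history from the question.
cases = {1044: [['Action required: Submit requested documents', '2021-12-18'],
                ['At NVC', '2022-02-03'],
                ['In Transit', '2022-02-14'],
                ['Ready', '2022-02-15']]}

for history in cases.values():
    # Pair each status with the one that follows it.
    for (start, start_date), (end, end_date) in zip(history, history[1:]):
        days = (pd.to_datetime(end_date) - pd.to_datetime(start_date)).days
        # Row = start status, column = end status.
        df.at[status_dict_lookup[start], status_dict_lookup[end]].append(days)

print(df.at['ar_doc', 'nvc'])  # [47]
```

Only the three cells that correspond to observed transitions receive a value; every other off-diagonal cell stays an empty list.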