Home > Net >  All cells getting updated in pandas df using loc
All cells getting updated in pandas df using loc

Time:04-08

So I create an empty pandas df, where I initialize all the cell values to empty lists, except the diagonals, which are set to math.inf

The indexes are the start position, and the column headers are the end position

I want to get the start and end positions, and the difference between the days to get from start to end, and put that value in df.loc[start, end] by using append. But for some reason, every single cell in the df is getting updated, and i dont know why

My code is shown below

self.status_dict = {'nvc': 'At NVC',
                            'issued': 'Issued',
                            'ready': 'Ready',
                            'ar_ds260': 'Action required: Complete Form DS-260',
                            'transit': 'In Transit',
                            'refused': 'Refused',
                            'ar_doc': 'Action required: Submit requested documents',
                            'admin_process': 'Administrative Processing', 
                            'expire_soon': 'Expiring Soon',
                            'app_received': 'Application Received',
                            'ar_pay': 'Action required: Pay fees',
                            'return_nvc': 'Returned to NVC',
                            'transfer': 'Transfer in Progress',
                            'expired': 'Expired',
                            'ar_pay_miss': 'Action required: Pay missing fees', 
                            'no_action': 'No action required: Review in process', 
                            'no_status': 'No Status',
                            'ar_choose': 'Action required: Choose an agent'
                           }
self.status_dict_lookup = {'At NVC': 'nvc', 
                                   'Issued': 'issued', 
                                   'Ready': 'ready', 
                                   'Action required: Complete Form DS-260': 'ar_ds260', 
                                   'In Transit': 'transit', 
                                   'Refused': 'refused', 
                                   'Action required: Submit requested documents': 'ar_doc', 
                                   'Administrative Processing': 'admin_process', 
                                   'Expiring Soon': 'expire_soon', 
                                   'Application Received': 'app_received', 
                                   'Action required: Pay fees': 'ar_pay', 
                                   'Returned to NVC': 'return_nvc', 
                                   'Transfer in Progress': 'transfer', 
                                   'Expired': 'expired', 
                                   'Action required: Pay missing fees': 'ar_pay_miss', 
                                   'No action required: Review in process': 'no_action', 
                                   'No Status': 'no_status', 
                                   'Action required: Choose an agent': 'ar_choose'
                                  }

shape = len(self.status_dict_lookup)
        const_arr = [[]] * shape
        keys = self.status_dict.keys()
        df_dict = dict()
        for key in keys:
            df_dict[key] = const_arr
        df = pd.DataFrame(df_dict)
        df = df.set_index(pd.Index(keys))
        for key in keys:
            df.loc[key, key] = math.inf
        
#         cases = self.cases
        cases = {1044: [['Action required: Submit requested documents', '2021-12-18'], 
                        ['At NVC', '2022-02-03'], ['In Transit', '2022-02-14'], 
                        ['Ready', '2022-02-15'], ['Refused', '2022-03-10'], 
                        ['Administrative Processing', '2022-03-12'], ['Issued', '2022-03-14']]}
        for _, val in cases.items():
            print(val[0], val[1])
            print(val[0][1], val[1][1])
            for i in range(len(val) - 1):
                temp = []
                start = val[i][0]
                end   = val[i   1][0]
                
                start_time = datetime.strptime(val[i][1], '%Y-%m-%d')
                end_time = datetime.strptime(val[i   1][1], '%Y-%m-%d')
                diff = end_time - start_time
                
                temp = df[self.status_dict_lookup[start]][self.status_dict_lookup[end]]
                print(temp)
                temp.append(diff.days)
                df.loc[self.status_dict_lookup[start], self.status_dict_lookup[end]] = temp

part of the output of the df is shown below:

nvc                 issued  \
nvc                              inf  [47, 11, 1, 23, 2, 2]   
issued         [47, 11, 1, 23, 2, 2]                    inf   
ready          [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]   
ar_ds260       [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]   
transit        [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]   
refused        [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]   
ar_doc         [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]   
admin_process  [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]   
expire_soon    [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]   
app_received   [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]   
ar_pay         [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]   
return_nvc     [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]   
transfer       [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]   
expired        [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]   
ar_pay_miss    [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]   
no_action      [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]   
no_status      [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2]   
ar_choose      [47, 11, 1, 23, 2, 2]  [47, 11, 1, 23, 2, 2] 

So for the first example,

start = Action required: Submit requested documents

end = At NVC

diff = 47

So i want it to store just 47 as a list in df[ar_doc][nvc]. But it is storing the difference of all days in all the cells

Why does this happen and how to fix it?

CodePudding user response:

All your pandas data are referencing the same list. You should change how you initialize the DataFrame. You should create a new list in each cell.

Try:

df =  pd.DataFrame({k: [list() for _ in range(len(status_dict))] for k in status_dict}, 
                   index=status_dict.keys())
for key in keys:
    df.at[key, key] = math.inf

Separately, since you're already using pandas, you don't need to use datetime to parse dates. You can reduce your loop to the following:

for _, val in cases.items():
    for i in range(len(val)-1):
        diff = pd.to_datetime(val[i][1], format='%Y-%m-%d') - pd.to_datetime(val[i   1][1], format='%Y-%m-%d')
        df.at[status_dict_lookup[val[i][0]], status_dict_lookup[val[i 1][0]]]  = [diff.days]
  • Related