I'm trying to merge a small dataframe to another large one, looping through the small dataframe

I can print the small dataframe and see that the code below generates it correctly. However, my final result contains only the result of the final merge, instead of accumulating the merge from every iteration.

MIK_Quantiles is the larger dataframe; df2_t is the smaller dataframe generated on each pass of the while loop. Both dataframes are produced correctly and the merge works, but I'm left with just the result of the very last merge. I want to merge the current df2_t with the already-merged result (df_merged) of the previous iteration. I hope this makes sense!

i = 0
while i < df_length - 1:
    cur_bound = MIK_Quantiles['bound'].iloc[i]
    cur_percentile = MIK_Quantiles['percentile'].iloc[i]
    cur_bin_low = MIK_Quantiles['auppm'].iloc[i]
    cur_bin_high = MIK_Quantiles['auppm'].iloc[i + 1]

    ### Grades/Counts within bin, along with min and max
    df2 = df_orig['auppm'].loc[(df_orig['bound'] == cur_bound) & (df_orig['auppm'] >= cur_bin_low) & (df_orig['auppm'] < cur_bin_high)].describe()

    ### Add fields of interest to the output of describe for later merging together
    df2['bound'] = cur_bound
    df2['percentile'] = cur_percentile
    df2['bin_name'] = 'bin name'
    df2['bin_lower'] = cur_bin_low
    df2['bin_upper'] = cur_bin_high
    df2['temp_merger'] = str(int(df2['bound'])) + '_' + str(df2['percentile'])

    # Write results of describe to a CSV file and transpose columns to rows
    df2.to_csv('df2.csv')
    df2_t = pd.read_csv('df2.csv').T
    df2_t.columns = ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max', 'bound', 'percentile', 'bin_name', 'bin_lower', 'bin_upper', 'temp_merger']

    # Merge the results of the describe on the selected data with the table of quantile values to produce a final output    
    df_merged = MIK_Quantiles.merge(df2_t, how = 'inner', on = ['temp_merger'])
    pd.merge(df_merged, df2_t)
    print(df_merged)


    i = i + 1

CodePudding user response：

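Your loop never accumulates anything across iterations.

On every pass you merge MIK_Quantiles with the current df2_t and assign the result to df_merged. Because that is a plain assignment, each iteration (first, i-th, and last) overwrites whatever the previous one stored, so after the loop df_merged holds only the last merge. The pd.merge(df_merged, df2_t) call on the following line has no effect either, because its return value is never assigned.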
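The overwriting can be reproduced in a few lines. This is only a minimal sketch: `quantiles` and the one-row `df2_t` are made-up stand-ins for MIK_Quantiles and the real df2_t, and the values are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for MIK_Quantiles: one row per merge key.
quantiles = pd.DataFrame({"temp_merger": ["1_25", "1_50", "1_75"],
                          "auppm": [0.2, 0.5, 0.9]})

for key in quantiles["temp_merger"]:
    # Hypothetical stand-in for df2_t: a one-row frame for the current key.
    df2_t = pd.DataFrame({"temp_merger": [key], "count": [10]})
    # Plain assignment rebinds df_merged, so the previous iteration's result is lost.
    df_merged = quantiles.merge(df2_t, how="inner", on="temp_merger")

print(df_merged)  # a single row, for the last key "1_75"
```

Each inner merge matches exactly one key, and only the match from the final pass is still bound to `df_merged` when the loop ends.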

To keep what earlier iterations produced, concatenate every df2_t into one output dataframe:

df2 = pd.concat([df2, df2_t]) appends the newly created df2_t to the output dataframe df2 on each iteration, so that after the loop df2 contains all the data. (Here df2 is a new accumulator, not the describe() result that your code calls df2.)

Then, after the loop, merge that accumulated dataframe onto MIK_Quantiles:

pd.merge(MIK_Quantiles, df2) (note: df2, not df2_t) merges against everything collected in the loop.

df2 = pd.DataFrame()  # initialize your output
for i in range(df_length - 1):  # same bound as the while loop in the question
    df2_t = ...       # read your .csv files
    df2 = pd.concat([df2, df2_t])  # append this iteration's rows to the output
df2 = ...      # do vector operations on df2 (process all of the df2_t at once)
out = pd.merge(MIK_Quantiles, df2)  # merge once, after the loop
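For a version that runs end to end, here is a rough sketch of the same accumulate-then-merge idea. The data is invented, and it assumes MIK_Quantiles already carries a temp_merger column in the form bound_percentile. Two things differ from the outline above and are my own substitutions: the per-bin results are collected in a list and concatenated once after the loop, and `describe().to_frame().T` is used in place of the CSV round trip to get a one-row dataframe.

```python
import pandas as pd

# Toy stand-ins for the question's data (all values are made up).
df_orig = pd.DataFrame({
    "bound": [1, 1, 1, 1, 1, 1],
    "auppm": [0.1, 0.3, 0.4, 0.6, 0.7, 0.8],
})
MIK_Quantiles = pd.DataFrame({
    "bound": [1, 1, 1],
    "percentile": [0, 50, 100],
    "auppm": [0.0, 0.5, 1.0],
    "temp_merger": ["1_0", "1_50", "1_100"],  # assumed key format
})

rows = []  # one-row DataFrames, one per bin
for i in range(len(MIK_Quantiles) - 1):
    cur_bound = MIK_Quantiles["bound"].iloc[i]
    cur_percentile = MIK_Quantiles["percentile"].iloc[i]
    cur_bin_low = MIK_Quantiles["auppm"].iloc[i]
    cur_bin_high = MIK_Quantiles["auppm"].iloc[i + 1]

    # Rows of df_orig that fall inside the current bin [low, high).
    in_bin = (
        (df_orig["bound"] == cur_bound)
        & (df_orig["auppm"] >= cur_bin_low)
        & (df_orig["auppm"] < cur_bin_high)
    )
    # describe() returns a Series; to_frame().T makes it a one-row DataFrame.
    stats = df_orig.loc[in_bin, "auppm"].describe().to_frame().T
    stats["bin_lower"] = cur_bin_low
    stats["bin_upper"] = cur_bin_high
    stats["temp_merger"] = f"{int(cur_bound)}_{int(cur_percentile)}"
    rows.append(stats)

# Concatenate once after the loop, then merge once.
all_stats = pd.concat(rows, ignore_index=True)
out = MIK_Quantiles.merge(all_stats, how="inner", on="temp_merger")
print(out)
```

Collecting the pieces in a list and calling pd.concat a single time avoids copying the growing dataframe on every iteration, which matters once the number of bins gets large.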