I can print the small dataframe and see that it is being generated correctly; I've written it using the code below. However, my final result contains only the output of the final merge, rather than accumulating the merge from every pass of the loop.
MIK_Quantiles is the first, larger dataframe; df2_t is the smaller dataframe generated in the while loop. Both dataframes are produced correctly and the merge works, but I'm left with just the result of the very last merge. I want to merge the current df2_t with the already merged result (df_merged) of the previous iteration. I hope this makes sense!
i = 0
while i < df_length - 1:
    cur_bound = MIK_Quantiles['bound'].iloc[i]
    cur_percentile = MIK_Quantiles['percentile'].iloc[i]
    cur_bin_low = MIK_Quantiles['auppm'].iloc[i]
    cur_bin_high = MIK_Quantiles['auppm'].iloc[i + 1]
    ### Grades/Counts within bin, along with min and max
    df2 = df_orig['auppm'].loc[(df_orig['bound'] == cur_bound) & (df_orig['auppm'] >= cur_bin_low) & (df_orig['auppm'] < cur_bin_high)].describe()
    ### Add fields of interest to the output of describe for later merging together
    df2['bound'] = cur_bound
    df2['percentile'] = cur_percentile
    df2['bin_name'] = 'bin name'
    df2['bin_lower'] = cur_bin_low
    df2['bin_upper'] = cur_bin_high
    df2['temp_merger'] = str(int(df2['bound'])) + '_' + str(df2['percentile'])
    # Write results of describe to a CSV file and transpose columns to rows
    df2.to_csv('df2.csv')
    df2_t = pd.read_csv('df2.csv').T
    df2_t.columns = ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max', 'bound', 'percentile', 'bin_name', 'bin_lower', 'bin_upper', 'temp_merger']
    # Merge the results of the describe on the selected data with the table of quantile values to produce a final output
    df_merged = MIK_Quantiles.merge(df2_t, how = 'inner', on = ['temp_merger'])
    pd.merge(df_merged, df2_t)
    print(df_merged)
    i = i + 1
Answer:
Your loop never accumulates anything: the only state that carries over from one iteration to the next is i.
On every pass you merge MIK_Quantiles with the df2_t built in that pass and assign the result to df_merged, so each iteration (first, i-th, and last) overwrites the previous output. The pd.merge(df_merged, df2_t) call on the next line has no effect either, because its return value is never assigned. After df_length - 1 iterations, df_merged holds only the last merge.
To keep what was created in earlier iterations, concatenate every df2_t into a single output dataframe:
df2 = pd.concat([df2, df2_t])
This 'appends' the newly created df2_t to the output dataframe df2 during each iteration, so that in the end df2 contains all the data.
Then, after the loop, merge that accumulated dataframe (df2, not df2_t!) onto MIK_Quantiles:
pd.merge(MIK_Quantiles, df2)
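df2 = pd.DataFrame([])  # initialize your output
for i in range(0, df_length - 1):  # same bounds as the original while loop, which reads row i + 1
    df2_t = ...  # read your .csv files
    df2 = pd.concat([df2, df2_t])  # accumulate instead of overwriting
df2 = ...  # do vector operations on df2 (process all of the df2_t at once)
out = pd.merge(MIK_Quantiles, df2)  # single merge, after the loop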
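For reference, here is a minimal runnable sketch of the same accumulate-then-merge pattern. It is not your exact pipeline: the data values are made up, the make_key helper is a hypothetical stand-in for however temp_merger is built on MIK_Quantiles (the question does not show that), and it uses to_frame().T in place of the CSV round trip, which is an assumption about what you need. It also collects the per-bin frames in a list and calls pd.concat once, rather than concatenating inside the loop.

```python
import pandas as pd

# Made-up stand-in data; only the column names come from the question.
df_orig = pd.DataFrame({
    "bound": [1, 1, 1, 1, 1, 1],
    "auppm": [0.1, 0.4, 0.6, 0.9, 1.2, 1.8],
})
MIK_Quantiles = pd.DataFrame({
    "bound": [1, 1, 1],
    "percentile": [0.0, 0.5, 1.0],
    "auppm": [0.0, 1.0, 2.0],
})


def make_key(bound, percentile):
    # Hypothetical helper: builds the merge key the same way on both sides.
    return f"{int(bound)}_{float(percentile)}"


MIK_Quantiles["temp_merger"] = [
    make_key(b, p)
    for b, p in zip(MIK_Quantiles["bound"], MIK_Quantiles["percentile"])
]

rows = []  # one single-row dataframe per bin
for i in range(len(MIK_Quantiles) - 1):
    cur_bound = MIK_Quantiles["bound"].iloc[i]
    cur_percentile = MIK_Quantiles["percentile"].iloc[i]
    bin_low = MIK_Quantiles["auppm"].iloc[i]
    bin_high = MIK_Quantiles["auppm"].iloc[i + 1]

    in_bin = (
        (df_orig["bound"] == cur_bound)
        & (df_orig["auppm"] >= bin_low)
        & (df_orig["auppm"] < bin_high)
    )
    # describe() returns a Series; to_frame().T turns it into one row.
    row = df_orig.loc[in_bin, "auppm"].describe().to_frame().T
    row["bin_lower"] = bin_low
    row["bin_upper"] = bin_high
    row["temp_merger"] = make_key(cur_bound, cur_percentile)
    rows.append(row)

# Stack every bin's row, then merge once, after the loop.
all_bins = pd.concat(rows, ignore_index=True)
out = MIK_Quantiles.merge(all_bins, how="inner", on="temp_merger")
print(out)
```

With an inner merge, the last row of MIK_Quantiles drops out because it has no upper neighbour and therefore no bin; use how="left" if you would rather keep it with empty statistics.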