I want to write a 'for' loop in Python (with a pandas dataframe) and append the loop index to the end of each dataframe's name to tell them apart. How can I do that?
For example, I have a dataframe df with a column value that takes the values 1 to 5, and I would like to split the dataset into 5 pieces, one for each of the values 1, 2, 3, 4 and 5.
I've tried the following, which raises a syntax error. How can I change it? Thanks.
for i in range(1, 5):
    df_f'{i}' = df.loc[df['value'] == i]
Note: I'd like the resulting dataframes to be named df_1, df_2, df_3, df_4, df_5.
CodePudding user response:
As @hbgoddard pointed out, generating variable names dynamically is bad practice (at least in production code). However, if you really want to do it, assign into globals() like so:
for i in range(1, 6):  # the stop value is exclusive, so this covers 1 through 5
    globals()[f'df_{i}'] = df.loc[df['value'] == i]
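To try this end to end, here is a minimal sketch on made-up sample data. Only the value column comes from the question; the label column and its contents are assumptions for illustration. It is meant to be run at module level (a script or notebook cell), where globals() is the namespace that bare names like df_1 are looked up in.

```python
import pandas as pd

# Hypothetical sample data; only the 'value' column is from the question.
df = pd.DataFrame({
    "value": [1, 2, 3, 4, 5, 1, 2],
    "label": list("abcdefg"),
})

# range(1, 6) yields 1, 2, 3, 4, 5 because the stop value is exclusive.
for i in range(1, 6):
    globals()[f"df_{i}"] = df.loc[df["value"] == i]

# Collect the names that were just created to confirm they exist.
created = sorted(name for name in globals() if name.startswith("df_"))
print(created)
print(df_1)
```

One practical downside: linters and IDEs cannot see names created this way, so df_1 will typically be flagged as undefined even though it works at runtime.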
CodePudding user response:
It is not recommended to generate variable names at runtime; use a list or dictionary instead.
df_parts = {i: df.loc[df['value'] == i] for i in range(1, 6)}
A more concise version that handles all unique values of value instead of just 1 through 5:
df_parts = dict(list(df.groupby('value')))
You can then access each part as df_parts[1], df_parts[2], etc.
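As a rough illustration of the dictionary approach, the sketch below uses made-up sample data (the label column is an assumption, not part of the question) and shows how to loop over the parts and how to look up a key that may not exist.

```python
import pandas as pd

# Hypothetical sample data; only the 'value' column is from the question.
df = pd.DataFrame({
    "value": [1, 2, 3, 4, 5, 1, 2],
    "label": list("abcdefg"),
})

# One sub-dataframe per distinct entry in 'value', keyed by that entry.
df_parts = dict(list(df.groupby("value")))

# Iterate over all parts without knowing the keys in advance.
for key, part in df_parts.items():
    print(f"df_parts[{key}] has {len(part)} row(s)")

# groupby only creates keys for values that occur in the data,
# so use .get() when a key might be absent.
missing = df_parts.get(6)
print(missing)
```

Note the difference between the two variants: the groupby version has no entry for a value that never occurs, while the dictionary comprehension over range(1, 6) stores an empty dataframe for it.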