Hope you're fine.
I'm working with some dataframes that look like this:
df:
      Col1  Col2  Col3  ...  Coln
Row1     A     7     2  ...     n
Row2     B     5    10  ...     n
Row3     C     3     5  ...     n
As you can see, it has n columns. I'm trying to build one dataframe for each of the other columns, each pairing the column "Col1" with that column, so that I can then refer to each dataframe by name or apply a function to all of them. It would look something like this:
df1:
      Col1  Col2
Row1     A     7
Row2     B     5
Row3     C     3
df2:
      Col1  Col3
Row1     A     2
Row2     B    10
Row3     C     5
...
dfn:
      Col1  Coln
Row1     A     n
Row2     B     n
Row3     C     n
I know that I can manually use .iloc[:,n] but that's not practical for n columns.
So, I have tried this way with dictionaries:
columns_list = df.columns.values.tolist()
d = {}
for name in columns_list:
    for i in range(1, len(df.columns) + 1):
        d[name] = pd.DataFrame(data = (df1["Col1"],df.iloc[:,i]), columns = ["XYZ", "ABC"])
Bad news: doesn't work.
I have also tried with a function:
df_base = pd.DataFrame(data = df.iloc[:,0])
def particion(df):
    for i in range(1, len(df.columns) + 1):
        df["df_" + str(i)] = df_base.join(df.iloc[:,i])
Bad news again: doesn't work.
I have done my research but couldn't find anyone who has had the same problem.
Does anyone have an idea of what I can do?
CodePudding user response:
Start by creating a list of your variable names, which can be done with a list comprehension. As an example, with n = 5:
n = 5
variable_names = [f"df{i}" for i in range(1, n + 1)]
print(variable_names)  # Output: ['df1', 'df2', 'df3', 'df4', 'df5']
From here you can create your list of column names and a constant for your first column name:
FIRST_COLUMN_NAME = list(df.columns)[0]
column_names = list(df.columns)[1:]
Then you can make use of globals() and zip() to iterate through and create the variables:
for variable, column_name in zip(variable_names, column_names):
    globals()[variable] = df[[FIRST_COLUMN_NAME, column_name]]
Using a test dataframe:
   col1  col2  col3  col4  col5  col6
0     1     2     3     4     5     6
1     2     3     4     5     6     7
I received the following outputs:
>>> print(df1)
   col1  col2
0     1     2
1     2     3
>>> print(df2)
   col1  col3
0     1     3
1     2     4
>>> print(df3)
   col1  col4
0     1     4
1     2     5
and so on.
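If you would rather not create variables through globals(), a dictionary keyed by column name is a possible alternative, and it is closer to the dictionary attempt in the question. The following is only a minimal sketch, not the only way to do it: the sample data is assumed for illustration, and the names frames and summarize are hypothetical.

```python
import pandas as pd

# Sample data, assumed for illustration only.
df = pd.DataFrame({
    "col1": [1, 2],
    "col2": [2, 3],
    "col3": [3, 4],
    "col4": [4, 5],
})

first_column_name = df.columns[0]

# One two-column dataframe per remaining column, keyed by that column's name.
# .copy() makes each piece independent of the original dataframe.
frames = {
    column_name: df[[first_column_name, column_name]].copy()
    for column_name in df.columns[1:]
}

def summarize(frame):
    """Hypothetical example function: total of the second column."""
    return int(frame.iloc[:, 1].sum())

# Apply the same function to every dataframe in one pass.
totals = {name: summarize(frame) for name, frame in frames.items()}

print(frames["col3"])
print(totals)
```

Keeping the pieces in a dictionary makes it straightforward to loop over all of them, which is harder when each dataframe lives in its own global variable.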