Is there a way to create n dataframes from the columns of a dataframe? (Python)

Hope you're doing well.

I'm working with some dataframes that look like this:

df:
        Col1    Col2    Col3    ...     Coln
Row1    A       7       2       ...     n
Row2    B       5       10      ...     n
Row3    C       3       5       ...     n

As you can see, it has n columns. I'm trying to create one dataframe for each of the other columns, containing "Col1" plus that column, so that I can then work with each dataframe individually or apply a function to all of them. It would look something like this:

df1:

      Col1    Col2
Row1   A       7 
Row2   B       5 
Row3   C       3 

df2:

      Col1    Col3
Row1   A       2 
Row2   B       10 
Row3   C       5 


... 

dfn:

      Col1    Coln
Row1   A       n 
Row2   B       n 
Row3   C       n

I know that I can manually use .iloc[:,n] but that's not practical for n columns.
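`.iloc` also accepts a list of positions, so `df.iloc[:, [0, i]]` keeps the first column plus column `i`, and a loop over `range(1, len(df.columns))` covers every other column without writing each one by hand. A minimal sketch, assuming a small hypothetical frame with only `Col1` to `Col3`, modeled on the example table:

```python
import pandas as pd

# Hypothetical sample data modeled on the example table above
df = pd.DataFrame(
    {"Col1": ["A", "B", "C"], "Col2": [7, 5, 3], "Col3": [2, 10, 5]},
    index=["Row1", "Row2", "Row3"],
)

# Positions 1 .. len(df.columns) - 1 are the "other" columns;
# the list [0, i] pairs the first column with column i.
pairs = [df.iloc[:, [0, i]] for i in range(1, len(df.columns))]

for sub in pairs:
    print(sub, end="\n\n")
```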

So I tried this approach with a dictionary:

columns_list = df.columns.values.tolist()
d = {}

for name in columns_list:
  for i in range(1, len(df.columns) + 1):
    d[name] = pd.DataFrame(data = (df1["Col1"],df.iloc[:,i]), columns = ["XYZ", "ABC"])

Bad news: it doesn't work.
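The dictionary idea can work with a single loop: each key is a column name and each value selects the first column plus that column by label. In the attempt above, the inner loop reassigns `d[name]` on every pass, so only the last result survives for each key, and the loop's upper bound goes one past the last valid column position, which `.iloc` rejects as out of bounds. A minimal sketch, again with hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample data modeled on the question's table
df = pd.DataFrame(
    {"Col1": ["A", "B", "C"], "Col2": [7, 5, 3], "Col3": [2, 10, 5]},
    index=["Row1", "Row2", "Row3"],
)

first = df.columns[0]  # "Col1"

# One entry per remaining column: key = column name, value = two-column frame
d = {name: df[[first, name]] for name in df.columns[1:]}

print(d["Col3"])
```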

I have also tried with a function:

df_base = pd.DataFrame(data = df.iloc[:,0])
def particion(df):
    for i in range(1, len(df.columns) + 1): 
        df["df_" + str(i)] = df_base.join(df.iloc[:,i])

Bad news again: it doesn't work.
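A function-based version has to build and return a container rather than assign into `df`: `df_base.join(...)` produces a two-column DataFrame, which pandas will not store under a single column key such as `"df_1"`, and `particion` as written never returns anything. A sketch that keeps the `join` idea and the question's function name; returning a dict keyed `df_1`, `df_2`, ... is an assumption about what the caller wants:

```python
import pandas as pd


def particion(df: pd.DataFrame) -> dict[str, pd.DataFrame]:
    """Return {"df_1": first col + 2nd col, "df_2": first col + 3rd col, ...}."""
    df_base = df.iloc[:, [0]]  # first column, kept as a one-column DataFrame
    result = {}
    # Valid positions for the other columns are 1 .. len(df.columns) - 1
    for i in range(1, len(df.columns)):
        # join aligns on the index; the named Series becomes the second column
        result["df_" + str(i)] = df_base.join(df.iloc[:, i])
    return result


# Hypothetical sample data modeled on the question's table
df = pd.DataFrame(
    {"Col1": ["A", "B", "C"], "Col2": [7, 5, 3], "Col3": [2, 10, 5]},
    index=["Row1", "Row2", "Row3"],
)
parts = particion(df)
print(parts["df_2"])
```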

I have done my research, but couldn't find anyone who has had this specific problem.

Does anyone have an idea of what I can do?

Answer:

You can start by creating a list of your variable names, which can be done with a list comprehension. For example, with n = 5:

n = 5
variable_names = [f"df{i}" for i in range(1, n + 1)]
print(variable_names) # Output: ['df1', 'df2', 'df3', 'df4', 'df5']

From here, create a list of the remaining column names and a constant for the first column's name:

FIRST_COLUMN_NAME = list(df.columns)[0]
column_names = list(df.columns)[1:]

Then you can use globals() and zip() to iterate over both lists and create the variables:

for variable, column_name in zip(variable_names, column_names):
    globals()[variable] = df[[FIRST_COLUMN_NAME, column_name]]
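Writing into `globals()` works, but the frames then exist only as separate module-level variables, which is awkward if you later want to apply one function to all of them, as the question mentions. A dictionary keyed by the same generated names keeps them together. A sketch, assuming the same six-column test data the answer uses, with column sums as a stand-in for whatever function you actually want to apply:

```python
import pandas as pd

# Same six-column test data as in the answer: col1..col6, two rows
df = pd.DataFrame({f"col{i}": [i, i + 1] for i in range(1, 7)})

FIRST_COLUMN_NAME = df.columns[0]
column_names = list(df.columns)[1:]
variable_names = [f"df{i}" for i in range(1, len(column_names) + 1)]

# Store each two-column frame under its generated name instead of in globals()
frames = {
    variable: df[[FIRST_COLUMN_NAME, column_name]]
    for variable, column_name in zip(variable_names, column_names)
}

# Apply one function to every frame; column sums stand in for a real function
totals = {name: frame.sum() for name, frame in frames.items()}
print(totals["df1"])
```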

Using a test dataframe:

   col1  col2  col3  col4  col5  col6
0     1     2     3     4     5     6
1     2     3     4     5     6     7

I received the following outputs:

>>> print(df1)
   col1  col2
0     1     2
1     2     3
>>> print(df2)
   col1  col3
0     1     3
1     2     4
>>> print(df3)
   col1  col4
0     1     4
1     2     5

and so on.
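Put together, the answer's steps form one runnable script. This sketch rebuilds the six-column test dataframe and derives `n` from the number of remaining columns instead of hard-coding 5; that derivation is a small adjustment, not part of the original answer:

```python
import pandas as pd

# Six-column test dataframe from the answer
df = pd.DataFrame({f"col{i}": [i, i + 1] for i in range(1, 7)})

FIRST_COLUMN_NAME = list(df.columns)[0]
column_names = list(df.columns)[1:]

# One variable name per non-first column: df1 .. df5 here
n = len(column_names)
variable_names = [f"df{i}" for i in range(1, n + 1)]

# Create module-level variables df1, df2, ... via globals()
for variable, column_name in zip(variable_names, column_names):
    globals()[variable] = df[[FIRST_COLUMN_NAME, column_name]]

print(df3)
```

Because `n` follows the column count, `zip()` never silently drops a column when the dataframe grows.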