Pythonic way to create a new dataframe from each row of an existing dataframe

Time:09-30

Please recommend a pythonic way to create a new data frame from each row of an existing data frame.

The solution MUST work for any number of rows, because the number of rows in the existing data frame is random. (The example below uses 3 rows, but the actual number of rows in the original data frame will vary.) The columns of the original data frame will remain unchanged.

Original dataframe:

import pandas as pd
from numpy.random import randn
 
df = pd.DataFrame(randn(3,3), columns=['column 1', 'column 2', 'column 3'], index = ['row 1', 'row 2', 'row 3'])
print(df)

output:

       column 1  column 2  column 3
row 1  0.972855 -0.179018  0.177614
row 2 -2.146628 -1.639054 -0.708013
row 3 -1.295298 -0.313462 -0.229140

Desired output AFTER the solution has been implemented (three new data frames are created as below, preserving the original columns):

dataframe 1:

       column 1  column 2  column 3
row 1  0.972855 -0.179018  0.177614

dataframe 2:

       column 1  column 2  column 3
row 2 -2.146628 -1.639054 -0.708013

dataframe 3:

       column 1  column 2  column 3
row 3 -1.295298 -0.313462 -0.229140

I also would like to retain the ability to address the new data frames created, and manipulate the data within them.

I have tried to implement my own solution by using the .iterrows function and using dynamically created variables, but I would like to know what would be the recommended, most simple, and elegant way of solving the problem.

CodePudding user response:

Ok, I think the solution I found for the problem is the best one out of everything that was suggested, so I am going to share it here:

First, we iterate through the rows of the original dataframe using a "for" loop and the ".itertuples()" function. Within the loop, the data returned by ".itertuples()" is used to construct a new pandas dataframe, which is then stored in a dictionary. The dictionary key for each newly created dataframe is the first element returned by ".itertuples()", which is the row's index label.

import pandas as pd
from numpy.random import randn
 
df = pd.DataFrame(randn(3,3), columns=['column 1', 'column 2', 'column 3'], index = ['row 1', 'row 2', 'row 3'])

my_dict = {}

# itertuples() yields (index_label, value_1, value_2, ...) for each row
for row in df.itertuples():
    # row[0] is the index label; the remaining elements are the row's values
    my_dict[row[0]] = pd.DataFrame([list(row)[1:]], columns=df.columns,
                                   index=[row[0]])

print(my_dict)

Output:

{'row 1':        column 1  column 2  column 3
row 1  2.083922  1.513993  0.861644, 'row 2':        column 1  column 2  column 3
row 2  0.988185 -0.685701  0.252542, 'row 3':        column 1  column 2  column 3
row 3 -0.526314 -1.481147 -1.789547}

This is the most straightforward solution I was able to find. Any opinion on the above, please? (If there is a better solution, I will change the accepted answer.)
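
For comparison, here is a shorter sketch of the same dictionary idea. It assumes the index labels are unique, and the names `frames` and `total` are only illustrative. Selecting with a list of labels, `df.loc[[label]]`, returns a one-row dataframe rather than a Series, so the columns and the index label carry over without being rebuilt by hand:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.random.randn(3, 3),
    columns=["column 1", "column 2", "column 3"],
    index=["row 1", "row 2", "row 3"],
)

# Double brackets: df.loc[[label]] is a one-row DataFrame, not a Series.
# Assumes index labels are unique; a duplicated label would return several rows.
frames = {label: df.loc[[label]] for label in df.index}

# Each frame can be addressed by its row label and changed independently.
# "total" is a hypothetical extra column, added only to show manipulation.
frames["row 2"] = frames["row 2"].assign(total=lambda d: d.sum(axis=1))
print(frames["row 2"])
```

Because the dictionary is built from `df.index`, this works for any number of rows.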

CodePudding user response:

You could use groupby for this, together with globals() to set the name of each new dataframe. Something like this:

import pandas as pd
from numpy.random import randn
 
df = pd.DataFrame(randn(3,3), columns=['column 1', 'column 2', 'column 3'], index = ['row 1', 'row 2', 'row 3'])

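count = 0
# Grouping by the index yields one single-row group per unique index label
for label, group in df.groupby(df.index):
    count += 1
    # Bind each group to a module-level name: df1, df2, ...
    globals()['df' + str(count)] = group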

After the loop, df1 through dfn exist, one for each of the n rows in the original dataframe.
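
If writing into globals() feels too implicit, the same groupby loop can fill a dictionary instead. This is a minimal sketch, not the only way to do it: the name `frames_by_name` and the `df1`, `df2`, ... key scheme are assumptions carried over from the loop above, and `sort=False` is passed so the groups come out in the original row order rather than sorted by label.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.random.randn(3, 3),
    columns=["column 1", "column 2", "column 3"],
    index=["row 1", "row 2", "row 3"],
)

# groupby(level=0) groups on the index; each group is a DataFrame holding
# the row(s) that share one index label. Keys df1, df2, ... are illustrative.
frames_by_name = {
    f"df{i}": group
    for i, (_, group) in enumerate(df.groupby(level=0, sort=False), start=1)
}

# Look up a frame by name instead of relying on a dynamically created variable.
print(frames_by_name["df2"])
```

A dictionary keeps the generated frames in one place, so they can be looped over or counted with len(), which is harder to do with variables injected into the module namespace.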
