How to iterate through column x row combinations in a pandas dataframe

Time:12-09

I created a pandas dataframe, simply called df, that includes 3 rows and 3 columns. Each "row x column combination" in df contains a list of many numeric values, such as [-1.10037122 -1.12865588 -0.70395085 ... ].

I include an image of the dataframe to make it easier to see what the dataframe looks like:

[image of the dataframe]

My aim: Starting with the first column, ple, I would like to iterate through the combination formed by the first value in each of the three rows. More precisely, I would like to access the values -1.03065079 (interoception), -1.20001054 (exteroception), and -1.32780861 (cognitive).

Taking these three values, I would like to compute a linear regression across them. Then I would like to repeat this step with the second value in each of the three rows of the column ple, and so on. I am thus trying to set up code that iterates through the dataframe df by extracting the value at the same position within each row.

df = pd.DataFrame(data=dict_data) # the dataframe
roi_range = np.array([0, 1, 2]) # three positions for the linear regression
rows = ["interoception", "exteroception", "cognitive"] # the row names

for i in df["ple"][rows]:
    print(i) # currently only prints the values of the first row
    slope, intercept = sp.stats.linregress(roi_range, i)

Currently the code fails because it takes all values from the first row and tries to compute the linear regression against the three positions specified in roi_range. What is the cleanest way to iterate through all value combinations as explained above?

Of course, I would finally like to run the code for all three columns. To solve the problem step by step, I am first focusing only on the first column ple.

CodePudding user response:

If each cell is stored as a space-separated string, you can use pd.Series.str.split to split it on spaces and then pick the value at a given position with .str[index].

df['ple'].str.split(" ").str[0]

0    -1.03065079
1    -1.20001054
2    -1.32780861
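
To get every position at once rather than one index at a time, the split can be expanded into columns. This is only a sketch: the toy strings below are made up, and it assumes each cell really is a single space-separated string without brackets. If the cells hold lists or NumPy arrays instead, .str.split will not work on them.

```python
import pandas as pd

# Hypothetical toy data: each cell is ONE space-separated string (an assumption).
df = pd.DataFrame(
    {"ple": ["-1.03 -1.10 -0.70", "-1.20 -1.12 -0.65", "-1.32 -1.05 -0.60"]},
    index=["interoception", "exteroception", "cognitive"],
)

# expand=True spreads the split pieces over columns 0, 1, 2, ...
values = df["ple"].str.split(" ", expand=True).astype(float)

# Column i now holds the i-th value of every row.
first = values[0]
print(first)
```

Each column of values can then be passed to the regression as the three y-values.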

CodePudding user response:

This seems to do the trick:

import pandas as pd
import numpy as np
from scipy.stats import linregress

# toy data
df = pd.DataFrame(
    [{"ple": [1, 2, 3, 4, 5]}, {"ple": [3, 4, 5, 6, 7]}, {"ple": [4, 5, 6, 7, 8]}],
    index=["intero", "extero", "cogn"],
)
# create a grouping variable: the position of each value within its list
# (assumes all lists have equal length)
df["frame"] = [np.arange(len(df.loc["intero", "ple"]))] * len(df)
# explode the lists into separate rows (multi-column explode needs pandas >= 1.3)
df = df.explode(["ple", "frame"])
# this is a bit of a hack -- I'm sure there's a cleaner way...
df2 = df.groupby("frame").apply(lambda x: x["ple"])
# do the linear regression on each row (one row per position);
# explode leaves object dtype, so cast to float first
df2["model"] = df2.apply(lambda x: linregress([0, 1, 2], x.values.astype(float)), axis=1)
print(df2)  # or display(df2) in a Jupyter notebook
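
A possibly cleaner alternative to the explode/groupby route is to stack the lists into a 2-D NumPy array and regress over its columns. The following is a minimal sketch, not a tested drop-in: the toy values and the long row names are placeholders, and it assumes every list in the column has the same length.

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress

# Toy data (hypothetical values): one equal-length list per row.
df = pd.DataFrame(
    {"ple": [[1, 2, 3, 4, 5], [3, 4, 5, 6, 7], [4, 5, 6, 7, 8]]},
    index=["interoception", "exteroception", "cognitive"],
)

roi_range = np.array([0, 1, 2])  # x positions, one per row

# Build a 2-D array of shape (n_rows, n_values) from the lists.
values = np.array(df["ple"].tolist(), dtype=float)

# Each column of `values` holds the i-th value of every row,
# so iterating over the transpose yields one regression per position.
fits = [linregress(roi_range, column) for column in values.T]
results = pd.DataFrame(
    {
        "slope": [fit.slope for fit in fits],
        "intercept": [fit.intercept for fit in fits],
    }
)
print(results)
```

To cover the other columns as well, the same steps could be wrapped in a loop over df.columns, again assuming equal-length lists in each column.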