I created a pandas dataframe, simply called df, that has 3 rows and 3 columns. Each cell (row/column combination) in df contains a list of many numeric values, such as [-1.10037122 -1.12865588 -0.70395085 ... ].
I include an image of the dataframe to make it easier to see what the dataframe looks like:
My aim: Starting with the first column, ple, I would like to iterate over the combination of values found at the same position in each of the three rows. More precisely, I would first like to take the values -1.03065079 (interoception), -1.20001054 (exteroception), and -1.32780861 (cognitive).
Taking these three values, I would like to compute a linear regression across them. Then I would like to repeat this step with the second value in each of the three rows of the column ple, and so on. I am thus trying to set up code that iterates through the dataframe df by extracting the value at the same position within each row.
import numpy as np
import pandas as pd
import scipy as sp
import scipy.stats  # makes sp.stats available

df = pd.DataFrame(data=dict_data) # the dataframe
roi_range = np.array([0, 1, 2]) # three positions for the linear regression
rows = ["interoception", "exteroception", "cognitive"] # the row names
for i in df["ple"][rows]:
    print(i) # currently only prints the values of the first row
    slope, intercept = sp.stats.linregress(roi_range, i)
Currently the code fails because it takes all values from the first row and tries to compute the linear regression against the three positions specified in roi_range. What is the cleanest way to iterate through all value combinations as explained above?
Of course, I would eventually like to run the code for all three columns. To solve the problem step by step, I am first focusing only on the first column, ple.
CodePudding user response:
If each cell holds a space-separated string, you can use pd.Series.str.split to split the values on spaces and then pick a value by index.
df['ple'].str.split(" ").str[0]
0 -1.03065079
1 -1.20001054
2 -1.32780861
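If the cells hold actual lists or arrays rather than strings, a similar approach may work without the split, since .str[...] also indexes into list-like elements. A minimal sketch with hypothetical toy data (the first value in each list is taken from the question, the second is only illustrative):

```python
import pandas as pd

# Hypothetical toy data: each cell holds a list, not a string
df = pd.DataFrame(
    {
        "ple": [
            [-1.03065079, -1.10037122],
            [-1.20001054, -1.12865588],
            [-1.32780861, -0.70395085],
        ]
    },
    index=["interoception", "exteroception", "cognitive"],
)

# .str[i] picks element i from every list-like cell
first_values = df["ple"].str[0].astype(float)
print(first_values)
```

Changing the index in .str[0] to 1, 2, and so on would give the values at the following positions.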
CodePudding user response:
This seems to do the trick:
import pandas as pd
import numpy as np
from scipy.stats import linregress
# toy data
df = pd.DataFrame([{"ple":[1,2,3,4,5]},{"ple":[3,4,5,6,7]},{"ple":[4,5,6,7,8]}],index=['intero','extero','cogn'])
# create a grouping variable (assumes all lists equal length)
df['frame'] = [np.arange(0,len(df.loc['intero','ple']))] * len(df)
# explode the lists into separate rows
df = df.explode(['ple','frame'])
# this is a bit of a hack -- I'm sure there's a cleaner way...
df2 = df.groupby('frame').apply(lambda x: x['ple'])
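# do the linear regression on each row (cast to float, since explode leaves object dtype)
df2['model'] = df2.apply(lambda x: linregress([0,1,2],x.values.astype(float)),axis=1)
print(df2)  # in a Jupyter notebook, display(df2) also works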
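A possibly simpler alternative is to skip the explode/groupby step and stack the lists into a 2D NumPy array, then run one regression per position. This is only a sketch: it assumes all lists in the column have the same length, and the toy data and the names values, results, slopes, and intercepts are made up for illustration.

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress

# Toy data standing in for the real dataframe (assumes equal-length lists)
df = pd.DataFrame(
    {"ple": [[1, 2, 3, 4, 5], [3, 4, 5, 6, 7], [4, 5, 6, 7, 8]]},
    index=["interoception", "exteroception", "cognitive"],
)

roi_range = np.array([0, 1, 2])  # x positions, one per row

# Build a (3, n_values) float array: one row per dataframe row
values = np.array(df["ple"].tolist(), dtype=float)

# Each column of `values` holds the three values at one list position,
# so iterate over the transpose and fit one regression per position
results = [linregress(roi_range, col) for col in values.T]
slopes = np.array([r.slope for r in results])
intercepts = np.array([r.intercept for r in results])

print(slopes)
print(intercepts)
```

To cover all three columns, the same steps could be wrapped in a loop over df.columns, collecting the slopes for each column name in a dict.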