I need to iterate through each subset of the dataframe based on multiple indexes ('Treatment', 'individual', 'regime'). I want to apply curve_fit to x and y for each combination of Treatment, individual and regime. Currently I am only able to use one index.
This is the dataframe
df_tot
Treatment y x individual regime
0 White 21.982733 800 Data20210608 Ctrl
1 White 21.973003 800 Data20210508 Ctrl
2 White 21.968242 800 Data20210408 Ctrl
3 White 21.982733 600 Data20210608 Ctrl
4 White 21.973003 600 Data20210508 Ctrl
5 White 21.968242 600 Data20210408 Ctrl
6 White 21.982733 500 Data20210608 Ctrl
7 White 21.973003 500 Data20210508 Ctrl
5 White 21.968242 500 Data20210408 Ctrl
15 White_FR 22.139293 800 Data20210608 Ctrl
16 White_FR 22.159840 800 Data20210508 Ctrl
17 White_FR 22.162254 800 Data20210408 Ctrl
18 White_FR 22.139293 600 Data20210608 Ctrl
19 White_FR 22.159840 600 Data20210508 Ctrl
20 White_FR 22.162254 600 Data20210408 Ctrl
21 White_FR 22.139293 500 Data20210608 Ctrl
22 White_FR 22.159840 500 Data20210508 Ctrl
23 White_FR 22.162254 500 Data20210408 Ctrl
2500 White 1.864671 800 Data20210708 T
2501 White 1.871709 800 Data20210608 T
2502 White 1.884706 800 Data20210508 T
2503 White 1.872854 600 Data20210708 T
2504 White 1.872233 600 Data20210608 T
2505 White 1.872344 600 Data20210508 T
2506 White 1.872854 500 Data20210708 T
2507 White 1.872233 500 Data20210608 T
2508 White 1.872344 500 Data20210508 T
2519 White_FR 1.882861 800 Data20210708 T
2520 White_FR 1.917002 800 Data20210608 T
2521 White_FR 1.903067 800 Data20210508 T
2519 White_FR 1.882861 600 Data20210708 T
2520 White_FR 1.917002 600 Data20210608 T
2521 White_FR 1.903067 600 Data20210508 T
2519 White_FR 1.882861 500 Data20210708 T
2520 White_FR 1.917002 500 Data20210608 T
2521 White_FR 1.903067 500 Data20210508 T
This is the code:
variables = {'Spectrum': Spectrum, 'date': date, 'regime': regime,
             'slope': float}
results = pd.DataFrame(variables, index=[])
group_df = df_tot.groupby(["Spectrum", "date", "regime", "PPFD",
                           "start"])
def model(x, slope):
    return (slope*x) + start
group_df.apply(lambda x: curve_fit(model, x.loc[:, 'PPFD'],
                                   x.loc[:, 'Photo']))
new_row = {'Spectrum': Spectrum, 'date': date, 'regime': regime,
           'slope': popt[0]}  ## adding Spectrum gives an error
# name 'Spectrum' is not defined
results = results.append(new_row, ignore_index=True)
Now I get
results
date regime slope
0 Data20210608 Ctrl 0.05
CodePudding user response:
You can absolutely iterate through a dataframe with more than 1 index.
First of all, there are some major issues with your code:
- Add some toy data that reproduces your problem, so we can play with it to find a solution (rather than only an output of your data).
- Don't ever use del to delete columns in a dataframe; use drop, or select all but one using loc or iloc.
- Don't write all = [df_Ctrl, df_FR]. all has a specified meaning in Python (it is a built-in function), so you should pick another name.
- for g in all: #if I put for key, g in all: all is here a list of two elements, so there is nothing to unpack.
- Your dataframe is not multi-indexed; you have to modify it if you want it to be.
- I strongly encourage you not to use [[]] to select a sub-dataframe of a dataframe, but to use loc or iloc instead.
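As a small illustration of the points about del, all and [[]], here is a sketch on a made-up dataframe. The column name unused and the list name frames are hypothetical and only serve the example.

```python
import pandas as pd

# Made-up data; the "unused" column is hypothetical.
df = pd.DataFrame({
    "Treatment": ["White", "White_FR"],
    "x": [800, 600],
    "y": [21.98, 22.14],
    "unused": [0, 1],
})

# drop returns a new dataframe and leaves df untouched (unlike del df["unused"]).
trimmed = df.drop(columns=["unused"])

# loc selects a sub-dataframe by label: all rows, only the x and y columns.
subset = df.loc[:, ["x", "y"]]

# A list of dataframes under a name that does not shadow the built-in all().
frames = [trimmed, subset]
```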
If I understand your problem correctly, you want to group the rows of your dataframe by three columns ('Treatment', 'individual', 'regime'), and then, for each group, perform a specified operation on x and y. You can adapt this:
group_df = df_tot.groupby(["Treatment", "individual", "regime"])
curved_df = group_df.apply(lambda x : curve_fit(model, x.loc[:, 'x'], x.loc[:, 'y']))
Obviously since you didn't provide model nor curve_fit, I can't test if it's correct or not. But the main idea is here and you can work from it.
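For reference, here is a minimal sketch of how the per-group fits could be collected into a results table. It assumes curve_fit is scipy.optimize.curve_fit and that the model is a straight line with a fitted intercept (both are assumptions, not taken from the question), and it uses made-up toy values in place of the real data.

```python
import pandas as pd
from scipy.optimize import curve_fit


def model(x, slope, intercept):
    # Assumed linear model; replace with the model you actually need.
    return slope * x + intercept


# Toy data (made up): y is an exact line in x for each group.
rows = []
for treatment, regime, base in [("White", "Ctrl", 20.0), ("White_FR", "T", 2.0)]:
    for individual, true_slope in [("Data20210608", 0.01), ("Data20210508", 0.02)]:
        for x in (500, 600, 800):
            rows.append({
                "Treatment": treatment,
                "individual": individual,
                "regime": regime,
                "x": x,
                "y": base + true_slope * x,
            })
df_tot = pd.DataFrame(rows)

# Iterating over the groupby yields (key tuple, sub-dataframe) pairs,
# so the three group labels are available next to the fitted parameters.
records = []
for (treatment, individual, regime), group in df_tot.groupby(
    ["Treatment", "individual", "regime"]
):
    popt, _ = curve_fit(
        model,
        group["x"].to_numpy(dtype=float),
        group["y"].to_numpy(dtype=float),
    )
    records.append({
        "Treatment": treatment,
        "individual": individual,
        "regime": regime,
        "slope": popt[0],
        "intercept": popt[1],
    })

# Build the results table once, from a list of dicts.
results = pd.DataFrame(records)
```

Collecting plain dicts and building the dataframe once at the end avoids DataFrame.append, which was removed in pandas 2.0.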