How can I iterate in dataframe and get output for each group? Now I get only one line and one group

Time:03-17

I need to iterate through each subset of the dataframe defined by several columns ('Treatment', 'individual', 'regime'). I want to apply a curve fit to x and y for each combination of Treatment, individual and regime. Currently I can only group by one column.

This is the dataframe

df_tot

       Treatment        y        x      individual   regime
0       White       21.982733   800   Data20210608  Ctrl
1       White       21.973003   800   Data20210508  Ctrl
2       White       21.968242   800   Data20210408  Ctrl
3       White       21.982733   600   Data20210608  Ctrl
4       White       21.973003   600   Data20210508  Ctrl
5       White       21.968242   600   Data20210408  Ctrl
6       White       21.982733   500   Data20210608  Ctrl
7       White       21.973003   500   Data20210508  Ctrl
5       White       21.968242   500   Data20210408  Ctrl
15      White_FR    22.139293   800   Data20210608  Ctrl
16      White_FR    22.159840   800   Data20210508  Ctrl
17      White_FR    22.162254   800   Data20210408  Ctrl
18      White_FR    22.139293   600   Data20210608  Ctrl
19      White_FR    22.159840   600   Data20210508  Ctrl
20      White_FR    22.162254   600   Data20210408  Ctrl
21      White_FR    22.139293   500   Data20210608  Ctrl
22      White_FR    22.159840   500   Data20210508  Ctrl
23      White_FR    22.162254   500   Data20210408  Ctrl
2500    White       1.864671    800   Data20210708  T
2501    White       1.871709    800   Data20210608  T
2502    White       1.884706    800   Data20210508  T
2503    White       1.872854    600   Data20210708  T
2504    White       1.872233    600   Data20210608  T
2505    White       1.872344    600   Data20210508  T
2506    White       1.872854    500   Data20210708  T
2507    White       1.872233    500   Data20210608  T
2508    White       1.872344    500   Data20210508  T
2519    White_FR    1.882861    800   Data20210708  T
2520    White_FR    1.917002    800   Data20210608  T
2521    White_FR    1.903067    800   Data20210508  T
2519    White_FR    1.882861    600   Data20210708  T
2520    White_FR    1.917002    600   Data20210608  T
2521    White_FR    1.903067    600   Data20210508  T
2519    White_FR    1.882861    500   Data20210708  T
2520    White_FR    1.917002    500   Data20210608  T
2521    White_FR    1.903067    500   Data20210508  T

This is the code:

 variables={'Spectrum':Spectrum, 'date':date, 'regime':regime,
             'slope':float} 
 results = pd.DataFrame(variables, index=[])


 group_df = df_tot.groupby(["Spectrum", "date", "regime", "PPFD", 
              "start"])

 def model(x, slope):
    return (slope*x) + start


 group_df.apply(lambda x : curve_fit(model, x.loc[:, 'PPFD'], 
                x.loc[:, 'Photo']))

 new_row = {'Spectrum': Spectrum, 'date':date, 'regime':regime, 'slope': 
             popt[0]}  ## adding Spectrum gives an error
                        #name 'Spectrum' is not defined
 results=results.append(new_row, ignore_index=True)

Now I get

 results
        date       regime  slope
 0    Data20210608 Ctrl 0.05

User response:

You can absolutely iterate through a dataframe with more than 1 index.
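As a minimal sketch of what that iteration looks like: grouping by a list of columns gives you one tuple key per group, which you can unpack in the for statement. The toy values below are made up for illustration; only the column names come from your df_tot.

```python
import pandas as pd

# Toy data (hypothetical values, shaped like df_tot in the question).
df_tot = pd.DataFrame({
    "Treatment": ["White", "White", "White_FR", "White_FR"],
    "individual": ["Data20210608"] * 4,
    "regime": ["Ctrl", "Ctrl", "T", "T"],
    "x": [800, 600, 800, 600],
    "y": [21.98, 21.97, 1.92, 1.91],
})

# Grouping by several columns yields a (Treatment, individual, regime)
# tuple as the key of each group, plus the sub-dataframe for that group.
group_keys = []
for (treatment, individual, regime), group in df_tot.groupby(
        ["Treatment", "individual", "regime"]):
    group_keys.append((treatment, individual, regime))
    print(treatment, individual, regime, len(group))
```

Each pass through the loop sees only the rows of one group, so anything you compute inside it is per group.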

First of all, there are some major issues with your code:

  1. Include some toy data that reproduces your problem, so we can play with it to find a solution (rather than only the printed output of your data).
  2. Avoid del for deleting columns from a dataframe; use drop, or select all but one column using loc or iloc.
  3. Don't write all = [df_Ctrl, df_FR]. all is a built-in function in Python, and this assignment shadows it; pick another name.
  4. for g in all: #if I put for key, g in all. Here all is a list of two elements, so there is nothing to unpack.
  5. Your dataframe does not have a MultiIndex; you would have to set one (for example with set_index) if that is what you want.
  6. I strongly encourage you not to use [[]] to select a sub-dataframe of a dataframe, but to use loc or iloc instead.
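To make points 2, 3, 4 and 6 concrete, here is a small sketch. The two frames and the column named extra are hypothetical stand-ins, since I don't have your df_Ctrl and df_FR.

```python
import pandas as pd

# Hypothetical toy frames standing in for df_Ctrl and df_FR.
df_Ctrl = pd.DataFrame({"x": [500, 600], "y": [1.0, 1.2], "extra": [0, 0]})
df_FR = pd.DataFrame({"x": [500, 600], "y": [2.0, 2.4], "extra": [0, 0]})

# Point 2: drop returns a new frame without the column;
# the original frame is left untouched.
trimmed = df_Ctrl.drop(columns=["extra"])

# Point 6: loc selects rows and columns by label in one explicit call.
xy = df_FR.loc[:, ["x", "y"]]

# Points 3 and 4: use a name that does not shadow the built-in all(),
# and iterate one item at a time, since a plain list has no keys to unpack.
frames = [df_Ctrl, df_FR]
for frame in frames:
    print(frame.shape)
```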

If I understand your problem correctly, you want to group the rows of your dataframe by three columns ('Treatment', 'individual', 'regime') and then, for each group, perform a given operation on x and y. You can adapt this:

group_df = df_tot.groupby(["Treatment", "individual", "regime"])
curved_df = group_df.apply(lambda x : curve_fit(model, x.loc[:, 'x'], x.loc[:, 'y']))

Since your post doesn't include a runnable model or the curve_fit import, I can't test whether this is correct. But the main idea is here and you can work from it.
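For reference, here is a self-contained sketch of the same idea that returns one row per group. It rests on several assumptions: curve_fit is taken to be scipy.optimize.curve_fit, start is treated as a second fitted parameter (the intercept), and the data is synthetic so that the expected slopes are known. The helper name fit_group is mine. Your version ends up with a single row because new_row is built and appended once, after apply has finished. Also, DataFrame.append was removed in pandas 2.0, so the sketch builds the results table from the groupby output instead.

```python
import pandas as pd
from scipy.optimize import curve_fit


def model(x, slope, start):
    # Straight line; treating `start` as a fitted intercept is an assumption.
    return slope * x + start


# Synthetic toy data: every group lies exactly on a known line.
rows = []
for treatment, regime, slope, start in [
    ("White", "Ctrl", 0.05, 2.0),
    ("White_FR", "Ctrl", 0.03, 1.0),
    ("White", "T", 0.01, 0.5),
]:
    for x in (500.0, 600.0, 800.0):
        rows.append({"Treatment": treatment, "individual": "Data20210608",
                     "regime": regime, "x": x, "y": slope * x + start})
df_tot = pd.DataFrame(rows)


def fit_group(group):
    # Fit one group and return the fitted parameters as a labelled Series.
    popt, _ = curve_fit(model, group["x"].to_numpy(), group["y"].to_numpy())
    return pd.Series({"slope": popt[0], "start": popt[1]})


keys = ["Treatment", "individual", "regime"]
# Selecting only x and y keeps the grouping columns out of fit_group;
# reset_index turns the group keys back into ordinary columns.
results = df_tot.groupby(keys)[["x", "y"]].apply(fit_group).reset_index()
print(results)
```

Returning a Series from the function is what gives you one labelled row per (Treatment, individual, regime) combination instead of a column of raw curve_fit tuples.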
