Attribute change with variable number of time steps-CodePudding

I would like to simulate individual changes in growth and mortality for a variable number of days. My dataframe is formatted as follows...

    import pandas as pd

    data = {'unique_id':  ['2', '4', '5', '13'],
            'length': ['27.7', '30.2', '25.4', '29.1'],
            'no_fish': ['3195', '1894', '8', '2774'],
            'days_left': ['253', '253', '254', '256'],
            'growth': ['0.3898', '0.3414', '0.4080', '0.3839']
           }

    df = pd.DataFrame(data)

    print(df)

      unique_id length no_fish days_left  growth
    0         2   27.7    3195       253  0.3898
    1         4   30.2    1894       253  0.3414
    2         5   25.4       8       254  0.4080
    3        13   29.1    2774       256  0.3839

Ideally, I would like the initial length (i.e., length) to increase by the daily growth rate (i.e., growth) for each of the days remaining in the year (i.e., days_left).

    df['final'] = df['length']   (df['days_left'] * df['growth']

However, I would also like to update the number of fish that each individual represents (i.e., no_fish) on a daily basis using a size-specific equation. I'm fairly new to python so I initially thought to use a for-loop (I'm not sure if there is another, more efficient way). My code is as follows:

# keep track of run time - START
start_time = time.perf_counter()

df['z'] = 0.0
for indx in range(len(df)): 
    count = 1
    while count <= int(df.days_to_forecast[indx]):
   
        # (1) update individual length
        df.lgth[indx] = df.lgth[indx]   df.linearGR[indx]
    
        # (2) estimate daily size-specific mortality 
        if df.lgth[indx] > 50.0:
            df.z[indx] = 0.01
        else:
            if df.lgth[indx] <= 50.0:
                df.z[indx] = 0.052857-((0.03/35)*df.lgth[indx])
            elif df.lgth[indx] < 15.0:
                df.z[indx] = 0.728*math.exp(-0.1892*df.lgth[indx])
    
        df['no_fish'].round(decimals = 0)
        if df.no_fish[indx] < 1.0:
            df.no_fish[indx] = 0.0
        elif df.no_fish[indx] >= 1.0:
            df.no_fish[indx] = df.no_fish[indx]*math.exp(-(df.z[indx]))
    
        # (3) reduce no. of days left in forecast by 1
        count = count   1

# keep track of run time - END
total_elapsed_time = round(time.perf_counter() - start_time, 2)
print("Forecast iteration completed in {} seconds".format(total_elapsed_time))

The above code now works correctly, but it is still far to inefficient to run for 40,000 individuals each for 200 days.

I would really appreciate any advice on how to modify the following code to make it pythonic.

Thanks

CodePudding user response：

Another option that was suggested to me is to use the pd.dataframe.apply function. This dramatically reduced the overall the run time and could be useful to someone else in the future.

### === RUN SIMULATION === ###
start_time = time.perf_counter()          # keep track of run time -- START

#-------------------------------------------------------------------------#
def function_to_apply( df ):

    df['z_instantMort'] = ''           
 
    for indx in range(int(df['days_left'])):
    
        # (1) update individual length
        df['length'] = df['length']   df['growth']
      
        # (2) estimate daily size-specific mortality 
        if df['length'] > 50.0:
            df['z_instantMort'] = 0.01
        else:
            if df['length'] <= 50.0:
                df['z_instantMort'] = 0.052857-((0.03/35)*df['length'])
            elif df['length'] < 15.0:
                df['z_instantMort'] = 0.728*np.exp(-0.1892*df['length'])
  
        whole_fish = round(df['no_fish'], 0)
        if whole_fish < 1.0:
            df['no_fish'] = 0.0
        elif whole_fish >= 1.0:
            df['no_fish'] = df['no_fish']*np.exp(-(df['z_instantMort']))
          
return df
#-------------------------------------------------------------------------#

sim_results = df.apply(function_to_apply, axis=1)

total_elapsed_time = round(time.perf_counter() - start_time, 2)       # END
print("Forecast iteration completed in {} seconds".format(total_elapsed_time)) 
print(sim_results)
### ====================== ###

output being...

Forecast iteration completed in 0.05 seconds
   unique_id    length     no_fish  days_left  growth  z_instantMort
0        2.0  126.3194  148.729190      253.0  0.3898           0.01
1        4.0  116.5742   93.018465      253.0  0.3414           0.01
2        5.0  129.0320    0.000000      254.0  0.4080           0.01
3       13.0  127.3784  132.864757      256.0  0.3839           0.01

CodePudding user response：

As I said in my comment, a preferable alternative to for loops in this setting is using vector operations. For instance, running your code:

import pandas as pd
import time
import math
import numpy as np

data = {'unique_id':  [2, 4, 5, 13],
        'length': [27.7, 30.2, 25.4, 29.1],
        'no_fish': [3195, 1894, 8, 2774],
        'days_left': [253, 253, 254, 256],
        'growth': [0.3898, 0.3414, 0.4080, 0.3839]
       }

df = pd.DataFrame(data)

print(df)

# keep track of run time - START
start_time = time.perf_counter()

df['z'] = 0.0
for indx in range(len(df)): 
    count = 1
    while count <= int(df.days_left[indx]):
   
        # (1) update individual length
        df.length[indx] = df.length[indx]   df.growth[indx]
    
        # (2) estimate daily size-specific mortality 
        if df.length[indx] > 50.0:
            df.z[indx] = 0.01
        else:
            if df.length[indx] <= 50.0:
                df.z[indx] = 0.052857-((0.03/35)*df.length[indx])
            elif df.length[indx] < 15.0:
                df.z[indx] = 0.728*math.exp(-0.1892*df.length[indx])
    
        df['no_fish'].round(decimals = 0)
        if df.no_fish[indx] < 1.0:
            df.no_fish[indx] = 0.0
        elif df.no_fish[indx] >= 1.0:
            df.no_fish[indx] = df.no_fish[indx]*math.exp(-(df.z[indx]))
    
        # (3) reduce no. of days left in forecast by 1
        count = count   1

# keep track of run time - END
total_elapsed_time = round(time.perf_counter() - start_time, 2)
print("Forecast iteration completed in {} seconds".format(total_elapsed_time))
print(df)

with output:

   unique_id  length  no_fish  days_left  growth
0          2    27.7     3195        253  0.3898
1          4    30.2     1894        253  0.3414
2          5    25.4        8        254  0.4080
3         13    29.1     2774        256  0.3839
Forecast iteration completed in 31.75 seconds
   unique_id    length     no_fish  days_left  growth     z
0          2  126.3194  148.729190        253  0.3898  0.01
1          4  116.5742   93.018465        253  0.3414  0.01
2          5  129.0320    0.000000        254  0.4080  0.01
3         13  127.3784  132.864757        256  0.3839  0.01

Now with vector operations, you could do something like:

# keep track of run time - START
start_time = time.perf_counter()
df['z'] = 0.0
for day in range(1, df.days_left.max()   1):
    update = day <= df['days_left']
    # (1) update individual length
    df[update]['length'] = df[update]['length']   df[update]['growth']

    # (2) estimate daily size-specific mortality
    df[update]['z'] = np.where( df[update]['length'] > 50.0, 0.01, 0.052857-( ( 0.03 / 35)*df[update]['length'] ) )
    df[update]['z'] = np.where( df[update]['length'] < 15.0, 0.728 * np.exp(-0.1892*df[update]['length'] ), df[update]['z'] )
                        
                    
                            
    df[update]['no_fish'].round(decimals = 0)
    df[update]['no_fish'] = np.where(df[update]['no_fish'] < 1.0, 0.0, df[update]['no_fish'] * np.exp(-(df[update]['z'])))                          
# keep track of run time - END
total_elapsed_time = round(time.perf_counter() - start_time, 2)
print("Forecast iteration completed in {} seconds".format(total_elapsed_time))
print(df)

with output

Forecast iteration completed in 1.32 seconds
   unique_id    length     no_fish  days_left  growth    z
0          2  126.3194  148.729190        253  0.3898  0.0
1          4  116.5742   93.018465        253  0.3414  0.0
2          5  129.0320    0.000000        254  0.4080  0.0
3         13  127.3784  132.864757        256  0.3839  0.0