pandas Optimizing many loops into one

Time:10-08

I have multiple DataFrames with the same columns. Here is the list of all of them:

dfs = [df_jan, df_feb, df_mar, df_apr, df_may, df_jun, df_jul, df_aug, df_sep, df_oct, df_nov, df_dec]

Every DataFrame looks like this. For example, df_jan:

Id Days
001 0
004 56
013 95
015 33

Next, df_feb:

Id Days
001 0
023 18
459 19
811 35

The task is to build, for each month, one list with the Ids of clients that have 30-60 days and a separate list with the Ids of clients that have 90-120 days. It should look like this:

jan_30 = [] # the result is ['004', '015']
jan_90 = [] # the result is ['013']
feb_30 = [] # the result is ['811']
feb_90 = [] # the result is []
---------------------
# I have defined 24 empty lists for the 12 months
nov_90 = [] # the result is ['013']
dec_30 = [] # the result is ['811']
dec_90 = [] # the result is []

I have written 12 loops like this:

for row, x in enumerate(df_jan['Days']):
    if x in range(30, 61):
        jan_30.append(df_jan['Id'][row])
    elif x in range(90, 121):
        jan_90.append(df_jan['Id'][row])
    else:
        pass
------------------------------------------------------------
for row, x in enumerate(df_apr['Days']):
    if x in range(30, 61) and df_apr['Id'][row] not in set(chain(jan_30, feb_30, mar_30)):
        apr_30.append(df_apr['Id'][row])
    elif x in range(90, 121) and df_apr['Id'][row] not in set(chain(jan_90, feb_90, mar_90)):
        apr_90.append(df_apr['Id'][row])
    else:
        pass
------------------------------------------------------------

for row, x in enumerate(df_dec['Days']):
    if x in range(30, 61) and df_dec['Id'][row] not in set(chain(jan_30, feb_30, mar_30, apr_30, may_30, jun_30, jul_30, aug_30, sep_30, oct_30, nov_30)):
        dec_30.append(df_dec['Id'][row])
    elif x in range(90, 121) and df_dec['Id'][row] not in set(chain(jan_90, feb_90, mar_90, apr_90, may_90, jun_90, jul_90, aug_90, sep_90, oct_90, nov_90)):
        dec_90.append(df_dec['Id'][row])
    else:
        pass

How can I combine these 12 loops into one? I am stuck on this. I tried to use f-strings, something like:

abbreviations = ['jan', 'feb', 'mar', 'apr', ... 'dec']
c = ['_30', '_90']
# Initializing loop that builds the list names
m_list = []
for a in abbreviations:
    for cp in c:
        m_list.append(a + cp)
The idea is to use the abbreviations inside the loops with an f-string or format, but I don't know how to do it. Or can you suggest another approach?
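The name-building idea above can be sketched with a dictionary instead of 24 separate variables, so that each list is reachable by its generated name. This is only a sketch: the `results` dictionary and the written-out month list are assumptions for illustration, not part of the original code.

```python
# Month abbreviations and suffixes, written out in full for this sketch
abbreviations = ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
                 'jul', 'aug', 'sep', 'oct', 'nov', 'dec']
c = ['_30', '_90']

# results is a hypothetical name: one empty list per generated key
results = {f'{a}{cp}': [] for a in abbreviations for cp in c}

# A list is then looked up by its name instead of by a variable
results['jan_30'].append('004')
```

Keeping the lists in one dictionary avoids creating variables from strings, and a loop over the months can build the key with the same f-string.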

CodePudding user response:

# First, create a list containing all the dataframes

all_df = [df_jan, df_feb, df_mar, df_apr, df_may, df_jun, df_jul, df_aug, df_sep, df_oct, df_nov, df_dec]

# Create two lists for storing the Id values of the 30-60 range and the 90-120 range

list_30, list_90 = [], []

# One nested for loop handles all the dataframes

for cur_df in all_df:
    for client_id, days in zip(cur_df['Id'], cur_df['Days']):
        if 30 <= days <= 60:
            list_30.append(client_id)
        elif 90 <= days <= 120:
            list_90.append(client_id)

# Now list_30 and list_90 contain the Id values in the corresponding ranges
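The nested loop above can also be written with boolean masks, which lets pandas do the range checks. A minimal sketch, assuming the two sample frames from the question; like the loop, it pools all months into two lists and does not split them per month. The name `combined` is introduced here for illustration.

```python
import pandas as pd

# Sample frames copied from the question
df_jan = pd.DataFrame({'Id': ['001', '004', '013', '015'], 'Days': [0, 56, 95, 33]})
df_feb = pd.DataFrame({'Id': ['001', '023', '459', '811'], 'Days': [0, 18, 19, 35]})
all_df = [df_jan, df_feb]

# Stack all months into one frame (combined is a hypothetical name)
combined = pd.concat(all_df, ignore_index=True)

# Series.between is inclusive on both ends by default
list_30 = combined.loc[combined['Days'].between(30, 60), 'Id'].tolist()
list_90 = combined.loc[combined['Days'].between(90, 120), 'Id'].tolist()
```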

Hope the answer helps :)

CodePudding user response:

Since you didn't provide data, I made a basic example, and it worked for me. Here is a single for loop as you described:

import pandas as pd
dfs = [df_jan, df_feb, df_mar, df_apr, df_may, df_jun, df_jul, df_aug, df_sep, df_oct, df_nov, df_dec]
df30 = []  # one list of Ids per month (30-60 days), in the same order as dfs
df90 = []  # one list of Ids per month (90-120 days), in the same order as dfs
dfsChained30 = set()  # Ids collected for 30-60 days in earlier months
dfsChained90 = set()  # Ids collected for 90-120 days in earlier months
for monthDf in dfs:
    month30 = []
    month90 = []
    # For January both sets are still empty, so it needs no special case
    for clientId, dayN in zip(monthDf['Id'], monthDf['Days']):
        if 30 <= dayN <= 60 and clientId not in dfsChained30:
            month30.append(clientId)
        elif 90 <= dayN <= 120 and clientId not in dfsChained90:
            month90.append(clientId)
    df30.append(month30)
    df90.append(month90)
    # Extend the chains only after the month is done, so they hold earlier months only
    dfsChained30.update(month30)
    dfsChained90.update(month90)
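The per-month split with the "not seen in an earlier month" rule from the question can also be sketched with pandas masks and a dictionary keyed by month. This is a sketch under assumptions, not the answerer's method: `split_by_month`, `BOUNDS`, and the `df_mar` frame are invented for illustration, while `df_jan` and `df_feb` reuse the sample data from the question.

```python
import pandas as pd

# Inclusive day ranges for the two buckets
BOUNDS = {'30': (30, 60), '90': (90, 120)}

def split_by_month(frames):
    """frames: dict of month abbreviation -> DataFrame with 'Id' and 'Days'."""
    results = {}
    seen = {label: set() for label in BOUNDS}  # Ids kept in earlier months
    for month, df in frames.items():
        for label, (low, high) in BOUNDS.items():
            in_range = df['Days'].between(low, high)  # inclusive on both ends
            is_new = ~df['Id'].isin(seen[label])      # not kept in an earlier month
            ids = df.loc[in_range & is_new, 'Id'].tolist()
            results[f'{month}_{label}'] = ids
            seen[label].update(ids)
    return results

# Sample data from the question; df_mar is invented for illustration
df_jan = pd.DataFrame({'Id': ['001', '004', '013', '015'], 'Days': [0, 56, 95, 33]})
df_feb = pd.DataFrame({'Id': ['001', '023', '459', '811'], 'Days': [0, 18, 19, 35]})
df_mar = pd.DataFrame({'Id': ['004', '013', '900'], 'Days': [40, 100, 45]})
results = split_by_month({'jan': df_jan, 'feb': df_feb, 'mar': df_mar})
```

The dictionary must be passed in calendar order, because each month is only checked against the months that come before it in the iteration.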