I have multiple DataFrames with the same columns. Here is the list of all of them:
dfs = [df_jan, df_feb, df_mar, df_apr, df_may, df_jun, df_jul, df_aug, df_sep, df_oct, df_nov, df_dec]
Every DataFrame looks like this. For example, df_jan:
Id | Days |
---|---|
001 | 0 |
004 | 56 |
013 | 95 |
015 | 33 |
Next, df_feb:
Id | Days |
---|---|
001 | 0 |
023 | 18 |
459 | 19 |
811 | 35 |
The task is to fill two lists for each month: one with the ids of clients that have 30-60 days, and a separate one with the ids of clients that have 90-120 days. It should look like this:
jan_30 = [] # the result is ['004', '015']
jan_90 = [] # the result is ['013']
feb_30 = [] # the result is ['811']
feb_90 = [] # the result is []
---------------------
# I have defined 24 empty lists for the 12 months
nov_90 = [] # the result is ['013']
dec_30 = [] # the result is ['811']
dec_90 = [] # the result is []
I have written 12 loops like this:
for row, x in enumerate(df_jan['Days']):
    if x in range(30, 61):
        jan_30.append(df_jan['Id'][row])
    elif x in range(90, 121):
        jan_90.append(df_jan['Id'][row])
    else:
        pass
------------------------------------------------------------
from itertools import chain  # needed for chain() below
for row, x in enumerate(df_apr['Days']):
    if x in range(30, 61) and df_apr['Id'][row] not in set(chain(jan_30, feb_30, mar_30)):
        apr_30.append(df_apr['Id'][row])
    elif x in range(90, 121) and df_apr['Id'][row] not in set(chain(jan_90, feb_90, mar_90)):
        apr_90.append(df_apr['Id'][row])
    else:
        pass
------------------------------------------------------------
for row, x in enumerate(df_dec['Days']):
    if x in range(30, 61) and df_dec['Id'][row] not in set(chain(jan_30, feb_30, mar_30, apr_30, may_30, jun_30, jul_30, aug_30, sep_30, oct_30, nov_30)):
        dec_30.append(df_dec['Id'][row])
    elif x in range(90, 121) and df_dec['Id'][row] not in set(chain(jan_90, feb_90, mar_90, apr_90, may_90, jun_90, jul_90, aug_90, sep_90, oct_90, nov_90)):
        dec_90.append(df_dec['Id'][row])
    else:
        pass
How can I collapse these 12 loops into one? I got stuck on this. I tried to use f-strings for it, something like:
abbreviations = ['jan', 'feb', 'mar', 'apr', ... 'dec']
c = ['_30', '_90']
# I have written an initializing loop like this
m_list = []
for a in abbreviations:
    for cp in c:
        m_list.append(a + cp)
The idea is to use the abbreviations in the loops with an f-string or format(), but I don't know how to do it. Or can you offer other ideas?
CodePudding user response:
# First, create a list containing all the DataFrames
all_df = [df_jan, df_feb, df_mar, df_apr, df_may, df_jun, df_jul, df_aug, df_sep, df_oct, df_nov, df_dec]
# Create two lists for storing the Id values in the 30-60 and 90-120 ranges
list_30, list_90 = [], []
# One nested for loop handles all the DataFrames
for cur_df in all_df:
    for client_id, days in zip(cur_df['Id'], cur_df['Days']):
        if 30 <= days <= 60:
            list_30.append(client_id)
        elif 90 <= days <= 120:
            list_90.append(client_id)
# Now list_30 and list_90 contain the Id values in the corresponding ranges
Hope the answer helps :)
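If you also need the results per month, with ids from earlier months excluded as in the question's loops, one option is a dictionary keyed by names such as 'jan_30' instead of 24 separate variables. Below is a minimal sketch, not a definitive implementation. It assumes Id is stored as a string and uses only the two sample months from the question. The function name collect_ids and the months list are hypothetical names chosen for illustration.

```python
import pandas as pd

# Sample data copied from the question (two months only, for illustration)
df_jan = pd.DataFrame({'Id': ['001', '004', '013', '015'], 'Days': [0, 56, 95, 33]})
df_feb = pd.DataFrame({'Id': ['001', '023', '459', '811'], 'Days': [0, 18, 19, 35]})


def collect_ids(dfs, months):
    """Return a dict such as {'jan_30': [...], 'jan_90': [...], ...}.

    Hypothetical helper: an id is skipped for a range if an earlier
    month already recorded it for that same range.
    """
    results = {}
    seen_30, seen_90 = set(), set()  # ids recorded in earlier months
    for month, df in zip(months, dfs):
        ids_30, ids_90 = [], []
        for client_id, days in zip(df['Id'], df['Days']):
            if 30 <= days <= 60 and client_id not in seen_30:
                ids_30.append(client_id)
            elif 90 <= days <= 120 and client_id not in seen_90:
                ids_90.append(client_id)
        # f-strings build the dictionary keys, not variable names
        results[f'{month}_30'] = ids_30
        results[f'{month}_90'] = ids_90
        # Only update after the month is done, so the check covers earlier months only
        seen_30.update(ids_30)
        seen_90.update(ids_90)
    return results


results = collect_ids([df_jan, df_feb], ['jan', 'feb'])
print(results)
```

With the sample data, results['jan_30'] is ['004', '015'], results['jan_90'] is ['013'], results['feb_30'] is ['811'] and results['feb_90'] is []. Looking a list up by key, for example results['feb_30'], replaces the need to generate variable names with f-strings.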
CodePudding user response:
Since you didn't provide data, I made a basic example, and it worked for me. Here is a single for loop, as you described:
import pandas as pd

dfs = [df_jan, df_feb, df_mar, df_apr, df_may, df_jun, df_jul, df_aug, df_sep, df_oct, df_nov, df_dec]
# Per-month results, in the same order as dfs
df30 = []
df90 = []
# Ids already recorded in an earlier month (the "chain" from the question)
dfsChained30 = set()
dfsChained90 = set()
for xForMonths in dfs:
    month30 = []
    month90 = []
    for clientId, dayN in zip(xForMonths['Id'], xForMonths['Days']):
        # For January the chained sets are still empty, so no special case is needed
        if 30 <= dayN <= 60 and clientId not in dfsChained30:
            month30.append(clientId)
        elif 90 <= dayN <= 120 and clientId not in dfsChained90:
            month90.append(clientId)
    df30.append(month30)
    df90.append(month90)
    # Update the chained sets only after the month is finished
    dfsChained30.update(month30)
    dfsChained90.update(month90)
# df30[0] and df90[0] hold the January ids, df30[1] and df90[1] the February ids, and so on
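The same "first month in which an id falls in the range" idea can also be expressed without an explicit row loop. This is only a sketch of an alternative, under two assumptions that are not stated in the question: each Id appears at most once per month, and Id is stored as a string. The names combined, months and ids_by_first_month are hypothetical, and only the two sample months from the question are used.

```python
import pandas as pd

# Sample data copied from the question (two months only, for illustration)
df_jan = pd.DataFrame({'Id': ['001', '004', '013', '015'], 'Days': [0, 56, 95, 33]})
df_feb = pd.DataFrame({'Id': ['001', '023', '459', '811'], 'Days': [0, 18, 19, 35]})
months = ['jan', 'feb']
dfs = [df_jan, df_feb]

# Stack all months into one frame, tagging every row with its month.
# Row order follows the order of dfs, so earlier months come first.
combined = pd.concat(
    [df.assign(month=m) for m, df in zip(months, dfs)],
    ignore_index=True,
)


def ids_by_first_month(combined, low, high, months):
    """Hypothetical helper: map each month to the ids first seen in [low, high] that month."""
    # Series.between is inclusive on both ends by default
    in_range = combined[combined['Days'].between(low, high)]
    # Keep only the earliest in-range row for each id
    first_hit = in_range.drop_duplicates(subset='Id', keep='first')
    return {m: first_hit.loc[first_hit['month'] == m, 'Id'].tolist() for m in months}


ids_30 = ids_by_first_month(combined, 30, 60, months)
ids_90 = ids_by_first_month(combined, 90, 120, months)
print(ids_30)
print(ids_90)
```

With the sample data, ids_30 is {'jan': ['004', '015'], 'feb': ['811']} and ids_90 is {'jan': ['013'], 'feb': []}. If an Id can repeat within a single month, drop_duplicates would also remove those repeats, which differs from the loops in the question.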