Home > Blockchain >  Import and format data inputs with for loop
Import and format data inputs with for loop

Time:10-21

I need to import 3 .xlsx files and merge them into 1 DataFrame. I would like to avoid repeating the code by using for loop.

Original code:

filepath_1 = input('Enter Revenue Month M1 File Path: ')
revenue_month_1 = pd.read_excel(filepath_1)
revenue_month_1 = revenue_month_1.apply(pd.to_numeric, errors='ignore')
revenue_month_1['Month'] = pd.to_datetime(revenue_month_1['Month'], format='%Y%m', errors='coerce').dropna()

filepath_2 = input('Enter Revenue Month M2 File Path: ')
revenue_month_2 = pd.read_excel(filepath_2)
revenue_month_2 = revenue_month_2.apply(pd.to_numeric, errors='ignore')
revenue_month_2['Month'] = pd.to_datetime(revenue_month_2['Month'], format='%Y%m', errors='coerce').dropna()

filepath_3 = input('Enter Revenue Month M3 File Path: ')
revenue_month_3 = pd.read_excel(filepath_3)
revenue_month_3 = revenue_month_3.apply(pd.to_numeric, errors='ignore')
revenue_month_3['Month'] = pd.to_datetime(revenue_month_3['Month'], format='%Y%m', errors='coerce').dropna()

Code with for loop:

revenue_reports = [
    input('Enter Revenue Month M1 File Path: '),
    input('Enter Revenue Month M2 File Path: '),
    input('Enter Revenue Month M3 File Path: '),
    ]

revenue = []

for revenue_report in revenue_reports:
    revenue = pd.read_excel(revenue_report)
    revenue = revenue.apply(pd.to_numeric, errors='ignore')
    revenue['Month'] = pd.to_datetime(revenue['Month'], format='%Y%m', errors='coerce').dropna()
    revenue = revenue.append(revenue)

Based on this for loop, I get only last month data (M3) from 3 months data imported. Could you help please?

Update: Problem was solved. Thanks for the idea from comments below. I've modified a bit. It looks like this:

revenue_reports = [
    input('Enter Revenue Month M1 File Path: '),
    input('Enter Revenue Month M2 File Path: '),
    input('Enter Revenue Month M3 File Path: '),
]

revenue = []

x = 1
for revenue_report in revenue_reports:
    revenue_monthly = pd.read_excel(revenue_report)
    revenue_monthly = revenue_monthly.apply(pd.to_numeric, errors='ignore')
    revenue_monthly["M" str(x)] = pd.to_datetime(revenue_monthly['Month'], format='%Y%m', errors='coerce').dropna()
    x  = 1
    revenue.append(revenue_monthly)
    
revenue = pd.concat(revenue)

CodePudding user response:

Your code was fine but the problem was that you were replacing the new value with the previous one that's why you only get the last value of M3
Try this one

revenue_reports = [
    input('Enter Revenue Month M1 File Path: '),
    input('Enter Revenue Month M2 File Path: '),
    input('Enter Revenue Month M3 File Path: '),
]

revenue = {}
x = 1
for revenue_report in revenue_reports:
    revenue_val = pd.read_excel(revenue_report)
    revenue_val = revenue_val.apply(pd.to_numeric, errors='ignore')
    revenue["M" x] = pd.to_datetime(revenue_val['Month'], format='%Y%m', errors='coerce').dropna()
    x  = 1

CodePudding user response:

May only use a new variable to store All revenue data.

revenue_reports = [
    input('Enter Revenue Month M1 File Path: '),
    input('Enter Revenue Month M2 File Path: '),
    input('Enter Revenue Month M3 File Path: '),
    ]

All_revenue =[]

revenue = []

for revenue_report in revenue_reports:
    revenue = pd.read_excel(revenue_report)
    revenue = revenue.apply(pd.to_numeric, errors='ignore')
    revenue['Month'] = pd.to_datetime(revenue['Month'], format='%Y%m', errors='coerce').dropna()
    All_revenue = All_revenue.append(revenue)
  • Related