How to aggregate multiple tasks into a single Python function?

I'm working on a DataFrame that I have been able to clean by running the following code in separate cells of a Jupyter notebook. However, I need to run these same tasks on several DataFrames that are organized exactly the same way. How can I write a function that executes tasks 2 through 4 below?

For reference, the data I'm working with is located here.

[1]: df1 = pd.read_csv('202110-divvy-tripdata.csv')

[2]: df1.drop(columns=['start_station_name','start_station_id','end_station_name','end_station_id','start_lat','start_lng','end_lat','end_lng'],inplace=True)

[3]: df1['ride_length'] = pd.to_datetime(df1.ended_at) - pd.to_datetime(df1.started_at)

[4]: df1['day_of_week'] = pd.to_datetime(df1.started_at).dt.day_name()

Answer:

You can define a function in one Jupyter cell, run that cell, and then call the function:

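import pandas as pd

def process_df(df):
    df.drop(columns=['start_station_name','start_station_id','end_station_name','end_station_id','start_lat','start_lng','end_lat','end_lng'], inplace=True)  # task 2
    df['ride_length'] = pd.to_datetime(df.ended_at) - pd.to_datetime(df.started_at)  # task 3
    df['day_of_week'] = pd.to_datetime(df.started_at).dt.day_name()  # task 4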
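To try the approach without downloading the CSV files, here is a minimal sketch that runs tasks 2 through 4 on a small hand-built DataFrame. Only the column names come from the question; the two rows of timestamps (and the ride_id column) are made up for illustration, and DROP_COLS is just a helper name introduced here.

```python
import pandas as pd

# Columns removed in task 2 (names taken from the question).
DROP_COLS = ['start_station_name', 'start_station_id', 'end_station_name', 'end_station_id',
             'start_lat', 'start_lng', 'end_lat', 'end_lng']

def process_df(df):
    """Apply tasks 2-4 to df in place; nothing is returned."""
    df.drop(columns=DROP_COLS, inplace=True)                                          # task 2
    df['ride_length'] = pd.to_datetime(df.ended_at) - pd.to_datetime(df.started_at)  # task 3
    df['day_of_week'] = pd.to_datetime(df.started_at).dt.day_name()                  # task 4

# Hypothetical two-row sample with the same layout as the trip data.
sample = pd.DataFrame({
    'ride_id': ['a1', 'b2'],
    'started_at': ['2021-10-01 08:00:00', '2021-10-02 12:30:00'],
    'ended_at': ['2021-10-01 08:15:00', '2021-10-02 13:00:00'],
    **{col: [None, None] for col in DROP_COLS},
})

process_df(sample)
print(sample)
```

After the call, sample keeps ride_id, started_at and ended_at, gains ride_length (15 and 30 minutes) and day_of_week (Friday and Saturday), and no longer has the station or coordinate columns.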

Call the function with each DataFrame:

df1 = pd.read_csv('data1.csv')
df2 = pd.read_csv('data2.csv')

process_df(df1)
process_df(df2)
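If you would rather leave the original DataFrames untouched, a variant can return a cleaned copy instead. The sketch below is one possible design, not the answer's method: it uses DataFrame.drop without inplace plus DataFrame.assign, and parses started_at only once. The function name clean_trips is hypothetical.

```python
import pandas as pd

# Columns removed in task 2 (names taken from the question).
DROP_COLS = ['start_station_name', 'start_station_id', 'end_station_name', 'end_station_id',
             'start_lat', 'start_lng', 'end_lat', 'end_lng']

def clean_trips(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of df (tasks 2-4); the input DataFrame is not modified."""
    started = pd.to_datetime(df['started_at'])  # parse once, reuse for tasks 3 and 4
    ended = pd.to_datetime(df['ended_at'])
    return df.drop(columns=DROP_COLS).assign(
        ride_length=ended - started,           # task 3
        day_of_week=started.dt.day_name(),     # task 4
    )
```

Because it returns a new object, you assign the result, e.g. df1 = clean_trips(pd.read_csv('data1.csv')). With a list of file names (files is a placeholder here), [clean_trips(pd.read_csv(f)) for f in files] cleans them all, and pd.concat(..., ignore_index=True) can combine them into one DataFrame if that is useful.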

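According to this answer, both DataFrames are modified in place: the function changes the object that was passed to it (for example by adding columns with df['...'] = ...) rather than creating a new one, so there's no need to return anything from the function.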
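This in-place behaviour only holds when the function mutates the object it receives. Rebinding the local name inside the function, e.g. df = df.drop(...), creates a new DataFrame that the caller never sees. A minimal sketch with two hypothetical helper functions illustrates the difference:

```python
import pandas as pd

def add_in_place(df):
    # Mutates the caller's object: the new column is visible outside the function.
    df['flag'] = True

def rebind_locally(df):
    # Rebinds the local name only: the caller's DataFrame still has column 'a'.
    df = df.drop(columns=['a'])

frame = pd.DataFrame({'a': [1, 2]})
add_in_place(frame)
rebind_locally(frame)
print(list(frame.columns))  # ['a', 'flag']
```

If a cleaning step has to produce a new object (as drop does without inplace=True), either return it from the function and assign the result, or use the in-place form.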