I'm working on a dataframe that I have been able to clean by running the following code in separate cells in a Jupyter notebook. However, I need to run these same tasks on several dataframes that are organized exactly the same way. How can I write a function that executes tasks 2 through 4 below?
For reference, the data I'm working with is located here.
[1]: df1 = pd.read_csv('202110-divvy-tripdata.csv')
[2]: df1.drop(columns=['start_station_name','start_station_id','end_station_name','end_station_id','start_lat','start_lng','end_lat','end_lng'],inplace=True)
[3]: df1['ride_length'] = pd.to_datetime(df1.ended_at) - pd.to_datetime(df1.started_at)
[4]: df1['day_of_week'] = pd.to_datetime(df1.started_at).dt.day_name()
CodePudding user response:
You can define a function in a Jupyter cell, run that cell, and then call the function:
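def process_df(df):
    # Task 2: drop the station and coordinate columns
    df.drop(columns=['start_station_name','start_station_id','end_station_name','end_station_id','start_lat','start_lng','end_lat','end_lng'],inplace=True)
    # Tasks 3 and 4: add the ride length and the weekday name
    df['ride_length'] = pd.to_datetime(df.ended_at) - pd.to_datetime(df.started_at)
    df['day_of_week'] = pd.to_datetime(df.started_at).dt.day_name()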
Call the function with each DataFrame:
df1 = pd.read_csv('data1.csv')
df2 = pd.read_csv('data2.csv')
process_df(df1)
process_df(df2)
According to this answer, both DataFrames will be altered in place, because the function modifies the object that was passed in, so there's no need to return a new object from the function.
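If you would rather leave the original DataFrames untouched, one alternative is to have the function return a cleaned copy instead. This is only a sketch: the names clean_trips and STATION_COLUMNS are made up for illustration, and the small in-memory sample stands in for one of the CSV files so the snippet runs without any data on disk.

```python
import pandas as pd

# Columns dropped in step [2] of the question.
# STATION_COLUMNS and clean_trips are hypothetical names used for this sketch.
STATION_COLUMNS = [
    'start_station_name', 'start_station_id',
    'end_station_name', 'end_station_id',
    'start_lat', 'start_lng', 'end_lat', 'end_lng',
]

def clean_trips(df):
    """Return a cleaned copy of df; the DataFrame passed in is not modified."""
    # drop() without inplace=True returns a new DataFrame.
    # errors='ignore' skips any listed column that is not present.
    out = df.drop(columns=STATION_COLUMNS, errors='ignore')
    # Parse the timestamps once and reuse them for both new columns.
    started = pd.to_datetime(out['started_at'])
    ended = pd.to_datetime(out['ended_at'])
    out['ride_length'] = ended - started
    out['day_of_week'] = started.dt.day_name()
    return out

# Tiny in-memory sample standing in for one of the CSV files.
sample = pd.DataFrame({
    'started_at': ['2021-10-01 08:00:00', '2021-10-02 09:30:00'],
    'ended_at': ['2021-10-01 08:15:00', '2021-10-02 10:00:00'],
    'start_lat': [41.88, 41.90],
})
cleaned = clean_trips(sample)
```

With real files you would call it as df1 = clean_trips(pd.read_csv('data1.csv')), or apply it in a loop over a list of file names. Returning a copy means rerunning the cell is safe, whereas the in-place version would fail on a second run because the dropped columns no longer exist.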