I'm doing a study on some flight data. It is supposed to be an explanatory analysis that uses statistical methods such as binning. I'm stuck trying to format the departure and arrival times. Here is my code so far:
#Calling Libraries
import os # File management
import pandas as pd # Data frame manipulation
import numpy as np # Data frame operations
import datetime as dt # Date operations
import seaborn as sns # Data Viz
#Reading the file:
flight_df=pd.read_csv(r'C:\Users\pc\Desktop\Work\flights.csv')
#Checking the DataFrame:
flight_df.head()
flight_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2500 entries, 0 to 2499
Data columns (total 38 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 O_AIRPORT_IATA_CODE 2500 non-null object
1 O_AIRPORT 2288 non-null object
2 O_CITY 2288 non-null object
3 O_STATE 2288 non-null object
4 O_COUNTRY 2288 non-null object
5 O_LATITUDE 2287 non-null float64
6 O_LONGITUDE 2287 non-null float64
7 D_AIRPORT_IATA_CODE 2500 non-null object
8 D_AIRPORT 2288 non-null object
9 D_CITY 2288 non-null object
10 D_STATE 2288 non-null object
11 D_COUNTRY 2288 non-null object
12 D_LATITUDE 2288 non-null float64
13 D_LONGITUDE 2288 non-null float64
14 SCHEDULED_DEPARTURE 2500 non-null int64
15 DEPARTURE_TIME 2467 non-null float64
16 DEPARTURE_DELAY 2467 non-null float64
17 TAXI_OUT 2467 non-null float64
18 WHEELS_OFF 2467 non-null float64
19 SCHEDULED_TIME 2500 non-null int64
20 ELAPSED_TIME 2464 non-null float64
21 AIR_TIME 2464 non-null float64
22 DISTANCE 2500 non-null int64
23 WHEELS_ON 2467 non-null float64
24 TAXI_IN 2467 non-null float64
25 SCHEDULED_ARRIVAL 2500 non-null int64
26 ARRIVAL_TIME 2467 non-null float64
27 ARRIVAL_DELAY 2464 non-null float64
28 DIVERTED 2500 non-null int64
29 CANCELLED 2500 non-null int64
30 CANCELLATION_REASON 33 non-null object
31 AIR_SYSTEM_DELAY 386 non-null float64
32 SECURITY_DELAY 386 non-null float64
33 AIRLINE_DELAY 386 non-null float64
34 LATE_AIRCRAFT_DELAY 386 non-null float64
35 WEATHER_DELAY 386 non-null float64
36 DATE 2500 non-null object
37 AIRLINE_NAME 2500 non-null object
dtypes: float64(19), int64(6), object(13)
memory usage: 742.3 KB
# dropping redundant columns
# drop() returns a new DataFrame; with inplace=True it returns None, so newdf would end up as None
newdf = flight_df.drop(['O_COUNTRY','O_LATITUDE','O_LONGITUDE','D_COUNTRY','D_LATITUDE','D_LONGITUDE','SCHEDULED_DEPARTURE','DIVERTED','CANCELLED','CANCELLATION_REASON','TAXI_OUT','TAXI_IN','WHEELS_OFF','WHEELS_ON','SCHEDULED_ARRIVAL'], axis=1)
I need to change the format of the departure and arrival times so that, instead of appearing like this:
12 1746.0
14 1849.0
19 1514.0
20 1555.0
22 2017.0
Name: DEPARTURE_TIME, dtype: float64
they appear like this:
12 17:46
14 18:49
19 15:14
20 15:55
22 20:17
I need this so that I can do further binning and analysis.
Thanks!
CodePudding user response:
You can obtain the desired format by using pd.to_datetime to parse the column to the datetime data type, then formatting it back to a string with dt.strftime:
import pandas as pd
df = pd.DataFrame({'DEPARTURE_TIME': [1746.0, 1849.0, 1514.0, 1555.0, 2017.0]})
df['DEPARTURE_TIME'] = pd.to_datetime(df['DEPARTURE_TIME'], format="%H%M").dt.strftime("%H:%M")
df['DEPARTURE_TIME']
0 17:46
1 18:49
2 15:14
3 15:55
4 20:17
Name: DEPARTURE_TIME, dtype: object
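The real column may need a bit more care than the small example above. According to the info() output in the question, DEPARTURE_TIME has only 2467 non-null values out of 2500, and HHMM times stored as numbers lose their leading zeros (00:05 would be stored as 5.0). Whether pd.to_datetime accepts the float column directly may also depend on the pandas version. The following is a minimal sketch under those assumptions, not a definitive implementation: the helper names hhmm_to_string and hour_bins, the treatment of 2400 as midnight, and the four part-of-day labels are all hypothetical choices made for illustration.

```python
import pandas as pd


def hhmm_to_string(col: pd.Series) -> pd.Series:
    """Turn HHMM-encoded numbers (e.g. 1746.0) into 'HH:MM' strings.

    Missing values stay missing. Assumption: 2400 means midnight.
    """
    # Fold 2400 to 0 first, because "%H" only accepts hours 00-23.
    cleaned = col.replace(2400, 0)
    # Nullable integers drop the trailing ".0" and keep NaN as <NA>.
    as_int = cleaned.round().astype("Int64")
    # Restore the leading zeros lost by numeric storage: 5 -> "0005".
    padded = as_int.astype("string").str.zfill(4)
    # Invalid values (e.g. 1275) become NaT instead of raising.
    parsed = pd.to_datetime(padded, format="%H%M", errors="coerce")
    return parsed.dt.strftime("%H:%M")


def hour_bins(col: pd.Series) -> pd.Series:
    """Bin HHMM-encoded numbers into four illustrative parts of the day."""
    # Integer division by 100 extracts the hour: 1746 -> 17.
    hours = col.replace(2400, 0) // 100
    return pd.cut(
        hours,
        bins=[0, 6, 12, 18, 24],
        right=False,  # intervals are [0, 6), [6, 12), [12, 18), [18, 24)
        labels=["night", "morning", "afternoon", "evening"],
    )


# Made-up sample values, including a missing time and a time before 10:00.
df = pd.DataFrame({"DEPARTURE_TIME": [1746.0, 5.0, float("nan"), 2017.0]})
df["DEPARTURE_STR"] = hhmm_to_string(df["DEPARTURE_TIME"])
df["DEPARTURE_BIN"] = hour_bins(df["DEPARTURE_TIME"])
print(df)
```

For binning, it is probably easier to work from the numeric hour (as hour_bins does) than from the formatted strings, since pd.cut needs numeric input; the 'HH:MM' strings are then only needed for display.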