Date | Train Number | Station 1 | Station 2 | Equipment Available? |
---|---|---|---|---|
2022-06-16 | 1111 | North | Central | Y |
2022-06-20 | 1111 | North | Central | Y |
2022-06-01 | 2222 | North | South | Y |
2022-06-02 | 2222 | North | South | Y |
2022-06-03 | 2222 | North | South | Y |
2022-06-04 | 2222 | North | South | Y |
2022-06-05 | 2222 | North | South | Y |
2022-06-06 | 2222 | North | South | Y |
2022-06-07 | 2222 | North | South | Y |
2022-06-08 | 2222 | North | South | Y |
I have a pandas DataFrame like the one above, sorted by Train Number and then by Date. I would like to print a simple statement from Python that says:
"For Train Number 1111 North to Central, we have equipment available on June 16th and June 20th.
For Train Number 2222 North to South, we have equipment available from June 1st to June 8th."
How can I do this?
CodePudding user response:
Here is one way to do it. Group by train number, station 1 and station 2, taking both the min and the max of the dates.
Finally, print the rows of the DataFrame that the groupby returns.
df2 = df.groupby(['Train Number', 'Station 1', 'Station 2'])['Date'].agg(['min', 'max']).reset_index()
for idx, row in df2.iterrows():
    # look values up by column name; inside an f-string, {0} would just print the literal 0
    print(f"For Train Number {row['Train Number']} {row['Station 1']} to {row['Station 2']}, we have equipment available on {row['min']} and {row['max']}")
For Train Number 1111 North to Central, we have equipment available on 2022-06-16 and 2022-06-20
For Train Number 2222 North to South, we have equipment available on 2022-06-01 and 2022-06-08
CodePudding user response:
You could get the min and max values for each Train's Date with a groupby, dedupe the DataFrame to get the other columns (as they are repeated), and then print the results with some datetime formatting.
import pandas as pd
df['Date'] = pd.to_datetime(df['Date'])
g = df.groupby(['Train Number']).agg(date_min=pd.NamedAgg(column='Date', aggfunc='min'), date_max=pd.NamedAgg(column='Date', aggfunc='max'))
# build the deduplicated frame first, since the join below needs it
df_deduped = df.loc[:, 'Train Number':].drop_duplicates().set_index('Train Number')
g = g.join(df_deduped, how='inner')
for index, values in g.reset_index().iterrows():
    print(f'For Train Number {values["Train Number"]}, {values["Station 1"]} to {values["Station 2"]}, we have equipment available from {values["date_min"].strftime("%b %d")} to {values["date_max"].strftime("%b %d")}')
The output is:
For Train Number 1111, North to Central, we have equipment available from Jun 16 to Jun 20
For Train Number 2222, North to South, we have equipment available from Jun 01 to Jun 08
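The question asks for dates written like "June 16th" rather than "Jun 16". As far as I know, strftime has no directive for the ordinal suffix, so it has to be added by hand. Here is a minimal sketch; the helper name ordinal_date is made up for illustration, and the month name comes from %B, so it is only "June" when the locale is English (the default in most Python setups).

```python
from datetime import date


def ordinal_date(d):
    """Format a date or Timestamp as e.g. 'June 16th' (hypothetical helper)."""
    day = d.day
    # 11, 12 and 13 always take 'th'; otherwise the last digit decides
    if 11 <= day % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")
    return f"{d.strftime('%B')} {day}{suffix}"


print(ordinal_date(date(2022, 6, 16)))
print(ordinal_date(date(2022, 6, 1)))
```

Since pd.Timestamp is a subclass of datetime, the same helper should work on the date_min and date_max values in place of strftime("%b %d").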
CodePudding user response:
I've made a little function which you can call on whatever df you want.
I find this solution more readable and flexible for further requests.
def equip_avail(df):
    for i in df['Train Number'].unique():
        date_start = df.Date.loc[(df['Train Number']==i)].min()
        date_end = df.Date.loc[(df['Train Number']==i)].max()
        # the station columns contain a space, so use bracket access rather than df.Station1
        from_start = df['Station 1'].loc[(df['Train Number']==i)].values[0]
        to_end = df['Station 2'].loc[(df['Train Number']==i)].values[0]
        print(f'For Train Number {i} {from_start} to {to_end}, we have equipment available from {date_start} to {date_end}.')
Then you call it like this:
equip_avail(df)
Result:
For Train Number 1111 North to Central, we have equipment available from 2022-06-16 to 2022-06-20.
For Train Number 2222 North to South, we have equipment available from 2022-06-01 to 2022-06-08.
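Note that taking only the min and max prints train 1111 as a range, while the question wants "on June 16th and June 20th" when the dates are not consecutive. One possible way to tell the two cases apart is to check whether every gap between sorted dates is exactly one day. This is only a sketch under assumptions: the names availability_phrase and availability_lines are made up, the DataFrame is rebuilt from the table in the question, and a train with a mix of runs and isolated days is simply listed day by day.

```python
import pandas as pd


def availability_phrase(dates):
    """Return 'from A to B' for one unbroken daily run, else 'on A, B and C'."""
    dates = pd.Series(pd.to_datetime(dates)).sort_values()
    labels = [d.strftime("%b %d") for d in dates]
    gaps = dates.diff().dropna()  # differences between neighbouring dates
    if len(labels) > 1 and (gaps == pd.Timedelta(days=1)).all():
        return f"from {labels[0]} to {labels[-1]}"
    if len(labels) == 1:
        return f"on {labels[0]}"
    return "on " + ", ".join(labels[:-1]) + " and " + labels[-1]


def availability_lines(df):
    """Build one sentence per (train, station 1, station 2) group."""
    lines = []
    keys = ["Train Number", "Station 1", "Station 2"]
    for (train, start, end), group in df.groupby(keys):
        phrase = availability_phrase(group["Date"])
        lines.append(
            f"For Train Number {train} {start} to {end}, "
            f"we have equipment available {phrase}."
        )
    return lines


# Sample data copied from the table in the question
df = pd.DataFrame({
    "Date": ["2022-06-16", "2022-06-20"] + [f"2022-06-0{d}" for d in range(1, 9)],
    "Train Number": [1111] * 2 + [2222] * 8,
    "Station 1": ["North"] * 10,
    "Station 2": ["Central"] * 2 + ["South"] * 8,
})

for line in availability_lines(df):
    print(line)
```

With the sample data this prints "on Jun 16 and Jun 20" for train 1111 and "from Jun 01 to Jun 08" for train 2222.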