How do I print a simple Python statement based on a Pandas DataFrame?

Date	Train Number	Station 1	Station 2	Equipment Available?
2022-06-16	1111	North	Central	Y
2022-06-20	1111	North	Central	Y
2022-06-01	2222	North	South	Y
2022-06-02	2222	North	South	Y
2022-06-03	2222	North	South	Y
2022-06-04	2222	North	South	Y
2022-06-05	2222	North	South	Y
2022-06-06	2222	North	South	Y
2022-06-07	2222	North	South	Y
2022-06-08	2222	North	South	Y

I have a Pandas DataFrame like the one above, sorted by Train Number and then by Date. I would like to print a simple Python statement that says:

"For Train Number 1111 North to Central, we have equipment available on June 16th and June 20th.

For Train Number 2222 North to South, we have equipment available from June 1st to June 8th."

How can I do this?

CodePudding user response：

Here is one way to do it: group by train number, station 1, and station 2, and take both the min and max of the dates.

Then print one line per group from the DataFrame that the groupby returns.

# Column names match the table in the question; Date is assumed to hold strings like '2022-06-16'
df2 = df.groupby(['Train Number', 'Station 1', 'Station 2'])['Date'].agg(['min', 'max']).reset_index()
for _, row in df2.iterrows():
    # Reference the row values inside the f-string so they are actually substituted
    print(f"For Train Number {row['Train Number']} {row['Station 1']} to {row['Station 2']}, we have equipment available on {row['min']} and {row['max']}")

For Train Number 1111 North to Central, we have equipment available on 2022-06-16 and 2022-06-20
For Train Number 2222 North to South, we have equipment available on 2022-06-01 and 2022-06-08

CodePudding user response：

You could get the min and max Date for each train with a groupby, deduplicate the DataFrame to get the other columns (as they are repeated for every row of a train), and then print the results with some datetime formatting.

import pandas as pd

# Parse the dates so that min/max and strftime work on real timestamps
df['Date'] = pd.to_datetime(df['Date'])

# Earliest and latest date per train
g = df.groupby(['Train Number']).agg(date_min=pd.NamedAgg(column='Date', aggfunc='min'), date_max=pd.NamedAgg(column='Date', aggfunc='max'))

# One row per train with the repeated columns; must be defined before the join
df_deduped = df.loc[:, 'Train Number':].drop_duplicates().set_index('Train Number')
g = g.join(df_deduped, how='inner')

for index, values in g.reset_index().iterrows():
    print(f'For Train Number {values["Train Number"]}, {values["Station 1"]} to {values["Station 2"]}, we have equipment available from {values["date_min"].strftime("%b %d")} to {values["date_max"].strftime("%b %d")}')

The output is:

For Train Number 1111, North to Central, we have equipment available from Jun 16 to Jun 20
For Train Number 2222, North to South, we have equipment available from Jun 01 to Jun 08
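The question asks for dates written like "June 16th" rather than "Jun 16". A minimal sketch of one way to get there, using a hypothetical helper called ordinal_date (it is not part of pandas or the standard library). It relies only on .day and .strftime, so it should also accept the pandas Timestamp values produced above, and it assumes an English locale for the month name:

```python
from datetime import date

def ordinal_date(d):
    """Format a date as e.g. 'June 1st' (hypothetical helper, not a pandas function)."""
    day = d.day
    # 11th, 12th and 13th are exceptions to the st/nd/rd rule
    if 11 <= day % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")
    # %B is the full month name in the current locale
    return f"{d.strftime('%B')} {day}{suffix}"

print(ordinal_date(date(2022, 6, 16)))
print(ordinal_date(date(2022, 6, 1)))
```

In the loop above, ordinal_date(values["date_min"]) could then stand in for the strftime("%b %d") call.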

CodePudding user response：

I've made a little function which you can call on whatever df you want.

I find this solution more readable and flexible for further requests.

def equip_avail(df):
    for i in df['Train Number'].unique():
        # All rows for this train
        rows = df.loc[df['Train Number'] == i]
        date_start = rows['Date'].min()
        date_end = rows['Date'].max()
        # Column names contain spaces, so use bracket access rather than df.Station1
        from_start = rows['Station 1'].values[0]
        to_end = rows['Station 2'].values[0]
        print(f'For Train Number {i} {from_start} to {to_end}, we have equipment available from {date_start} to {date_end}.')

Then you call it like this:

equip_avail(df)

Result:

For Train Number 1111 North to Central, we have equipment available from 2022-06-16 to 2022-06-20.
For Train Number 2222 North to South, we have equipment available from 2022-06-01 to 2022-06-08.
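
The question also wants different wording for separate dates ("on June 16th and June 20th") and for an unbroken run ("from June 1st to June 8th"), which a plain min/max cannot tell apart. A rough sketch of one way to make that distinction, assuming an "unbroken run" means every date is exactly one day after the previous one. describe_dates is a hypothetical helper name, and it takes a plain list of dates so it can be tried without a DataFrame:

```python
from datetime import date, timedelta

def describe_dates(dates):
    """Return 'from A to B' for an unbroken daily run, else 'on A, B and C'.

    Hypothetical helper; assumes at least one date is passed in.
    """
    days = sorted(set(dates))
    labels = [d.strftime("%B ") + str(d.day) for d in days]  # e.g. 'June 1'
    if len(days) == 1:
        return f"on {labels[0]}"
    # Unbroken run: each date is exactly one day after the previous one
    contiguous = all(b - a == timedelta(days=1) for a, b in zip(days, days[1:]))
    # Design choice: two adjacent days are still listed as 'on A and B'
    if contiguous and len(days) > 2:
        return f"from {labels[0]} to {labels[-1]}"
    return "on " + ", ".join(labels[:-1]) + " and " + labels[-1]

print(describe_dates([date(2022, 6, 16), date(2022, 6, 20)]))
print(describe_dates([date(2022, 6, d) for d in range(1, 9)]))
```

With the DataFrame from the question, something like df.groupby(['Train Number', 'Station 1', 'Station 2'])['Date'].apply(list), run after converting Date with pd.to_datetime, should give one list of dates per train to pass to the helper.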