Home > OS >  I want to add date range where value is True in pandas
I want to add date range where value is True in pandas

Time:06-08

Dt 1/2/21 2/2/21 3/2/21 4/2/21 5/2/21 6/2/21 7/2/21 Attendance(Expected output in python)
san TRUE TRUE TRUE TRUE TRUE TRUE TRUE 1/2/21 - 7/2/21
don TRUE TRUE FALSE TRUE TRUE TRUE TRUE 1/2/21 -2/2/21,4/2/21-7/2/21
sam FALSE TRUE TRUE FALSE TRUE TRUE TRUE 2/2/21 - 3/2/21,5/2/21-7/2/21
den FALSE FALSE TRUE FALSE TRUE TRUE FALSE 3/2/21,5/2/21 - 6/2/21

I want to add Attendance column using pandas

CodePudding user response:

Welcome and thank you for this nice "exercice"! ;)

This is the best I can do.

I iterate through each rows with list comprehension using .iloc[], and check which .columns is/are True, then a ''.join() the result, make them a list with .tolist() to be able to assign to the new column.

df['Attendance'] = [','.join(df.columns[(df.iloc[i] == True)].tolist()) for i in range(len(df.Dt))]
df
index Dt 1/2/21 2/2/21 3/2/21 4/2/21 5/2/21 6/2/21 7/2/21 Attendance
0 san true true true true true true true 1/2/21,2/2/21,3/2/21,4/2/21,5/2/21,6/2/21,7/2/21
1 don true true false true true true true 1/2/21,2/2/21,4/2/21,5/2/21,6/2/21,7/2/21
2 sam false true true false true true true 2/2/21,3/2/21,5/2/21,6/2/21,7/2/21
3 den false false true false true true false 3/2/21,5/2/21,6/2/21

CodePudding user response:

I'm starting from a previous answer to get a list of dates in the column "Attendance".

df['Attendance'] = [df.columns[(df.iloc[i] == True)].tolist()) for i in range(len(df.Dt)]

Now, I use pandas deltas (differences in time) to find consecutive days, extract only the first and last ones of each range of consecutives, and print it with the right format:

def get_consecutives(dates):
    consecutives = []
    while len(dates) > 1:
        if dates[1] - dates[0] == delta_1day:
            consecutives.append(dates.pop(0))
        else:
            break
    consecutives.append(dates.pop(0))
    return consecutives

def all_days(dates):
  total = []
  while dates:
      total.append(get_consecutives(dates))

  formatted_total = ', '.join(['-'.join((range[0].strftime('%d/%m/%y'), range[-1].strftime('%d/%m/%y'))) if len(range) > 1 
                              else range[0].strftime('%d/%m/%y') for range in total])
  return formatted_total

Finally, I'll make a function to map all the previous to the "Attendance" column:

def mapping(list_of_dates):
    dates = pd.to_datetime(list_of_dates, dayfirst=True,).to_list()
    return all_days(dates)

df['Attendance'] = df['Attendance'].map(mapping)
  • Related