I want to add date range where value is True in pandas-CodePudding

Dt	1/2/21	2/2/21	3/2/21	4/2/21	5/2/21	6/2/21	7/2/21	Attendance(Expected output in python)
san	TRUE	TRUE	TRUE	TRUE	TRUE	TRUE	TRUE	1/2/21 - 7/2/21
don	TRUE	TRUE	FALSE	TRUE	TRUE	TRUE	TRUE	1/2/21 -2/2/21,4/2/21-7/2/21
sam	FALSE	TRUE	TRUE	FALSE	TRUE	TRUE	TRUE	2/2/21 - 3/2/21,5/2/21-7/2/21
den	FALSE	FALSE	TRUE	FALSE	TRUE	TRUE	FALSE	3/2/21,5/2/21 - 6/2/21

I want to add Attendance column using pandas

CodePudding user response：

Welcome and thank you for this nice "exercice"! ;)

This is the best I can do.

I iterate through each rows with list comprehension using .iloc[], and check which .columns is/are True, then a ''.join() the result, make them a list with .tolist() to be able to assign to the new column.

df['Attendance'] = [','.join(df.columns[(df.iloc[i] == True)].tolist()) for i in range(len(df.Dt))]
df

index	Dt	1/2/21	2/2/21	3/2/21	4/2/21	5/2/21	6/2/21	7/2/21	Attendance
0	san	true	true	true	true	true	true	true	1/2/21,2/2/21,3/2/21,4/2/21,5/2/21,6/2/21,7/2/21
1	don	true	true	false	true	true	true	true	1/2/21,2/2/21,4/2/21,5/2/21,6/2/21,7/2/21
2	sam	false	true	true	false	true	true	true	2/2/21,3/2/21,5/2/21,6/2/21,7/2/21
3	den	false	false	true	false	true	true	false	3/2/21,5/2/21,6/2/21

CodePudding user response：

I'm starting from a previous answer to get a list of dates in the column "Attendance".

df['Attendance'] = [df.columns[(df.iloc[i] == True)].tolist()) for i in range(len(df.Dt)]

Now, I use pandas deltas (differences in time) to find consecutive days, extract only the first and last ones of each range of consecutives, and print it with the right format:

def get_consecutives(dates):
    consecutives = []
    while len(dates) > 1:
        if dates[1] - dates[0] == delta_1day:
            consecutives.append(dates.pop(0))
        else:
            break
    consecutives.append(dates.pop(0))
    return consecutives

def all_days(dates):
  total = []
  while dates:
      total.append(get_consecutives(dates))

  formatted_total = ', '.join(['-'.join((range[0].strftime('%d/%m/%y'), range[-1].strftime('%d/%m/%y'))) if len(range) > 1 
                              else range[0].strftime('%d/%m/%y') for range in total])
  return formatted_total

Finally, I'll make a function to map all the previous to the "Attendance" column:

def mapping(list_of_dates):
    dates = pd.to_datetime(list_of_dates, dayfirst=True,).to_list()
    return all_days(dates)

df['Attendance'] = df['Attendance'].map(mapping)