Dt | 1/2/21 | 2/2/21 | 3/2/21 | 4/2/21 | 5/2/21 | 6/2/21 | 7/2/21 | Attendance(Expected output in python) |
---|---|---|---|---|---|---|---|---|
san | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | 1/2/21 - 7/2/21 |
don | TRUE | TRUE | FALSE | TRUE | TRUE | TRUE | TRUE | 1/2/21 -2/2/21,4/2/21-7/2/21 |
sam | FALSE | TRUE | TRUE | FALSE | TRUE | TRUE | TRUE | 2/2/21 - 3/2/21,5/2/21-7/2/21 |
den | FALSE | FALSE | TRUE | FALSE | TRUE | TRUE | FALSE | 3/2/21,5/2/21 - 6/2/21 |
I want to add Attendance column using pandas
CodePudding user response:
Welcome and thank you for this nice "exercice"! ;)
This is the best I can do.
I iterate through each rows with list comprehension using .iloc[]
, and check which .columns
is/are True
, then a ''.join()
the result, make them a list with .tolist()
to be able to assign to the new column.
df['Attendance'] = [','.join(df.columns[(df.iloc[i] == True)].tolist()) for i in range(len(df.Dt))]
df
index | Dt | 1/2/21 | 2/2/21 | 3/2/21 | 4/2/21 | 5/2/21 | 6/2/21 | 7/2/21 | Attendance |
---|---|---|---|---|---|---|---|---|---|
0 | san | true | true | true | true | true | true | true | 1/2/21,2/2/21,3/2/21,4/2/21,5/2/21,6/2/21,7/2/21 |
1 | don | true | true | false | true | true | true | true | 1/2/21,2/2/21,4/2/21,5/2/21,6/2/21,7/2/21 |
2 | sam | false | true | true | false | true | true | true | 2/2/21,3/2/21,5/2/21,6/2/21,7/2/21 |
3 | den | false | false | true | false | true | true | false | 3/2/21,5/2/21,6/2/21 |
CodePudding user response:
I'm starting from a previous answer to get a list of dates in the column "Attendance".
df['Attendance'] = [df.columns[(df.iloc[i] == True)].tolist()) for i in range(len(df.Dt)]
Now, I use pandas deltas (differences in time) to find consecutive days, extract only the first and last ones of each range of consecutives, and print it with the right format:
def get_consecutives(dates):
consecutives = []
while len(dates) > 1:
if dates[1] - dates[0] == delta_1day:
consecutives.append(dates.pop(0))
else:
break
consecutives.append(dates.pop(0))
return consecutives
def all_days(dates):
total = []
while dates:
total.append(get_consecutives(dates))
formatted_total = ', '.join(['-'.join((range[0].strftime('%d/%m/%y'), range[-1].strftime('%d/%m/%y'))) if len(range) > 1
else range[0].strftime('%d/%m/%y') for range in total])
return formatted_total
Finally, I'll make a function to map all the previous to the "Attendance" column:
def mapping(list_of_dates):
dates = pd.to_datetime(list_of_dates, dayfirst=True,).to_list()
return all_days(dates)
df['Attendance'] = df['Attendance'].map(mapping)