Nama | No.ID | Tgl/Waktu | No.PIN | Kode Verifikasi |
---|---|---|---|---|
Alif | 100061 | 17/12/2022 07:53:26 | Sidik Jari | |
Alif | 100061 | 17/12/2022 13:00:25 | Sidik Jari | |
Alif | 100061 | 19/12/2022 07:54:59 | Sidik Jari | |
Alif | 100061 | 19/12/2022 16:18:14 | Sidik Jari | |
Alif | 100061 | 20/12/2022 07:55:54 | Sidik Jari | |
Alif | 100061 | 20/12/2022 16:16:16 | Sidik Jari | |
Alif | 100061 | 21/12/2022 07:54:46 | Sidik Jari | |
Alif | 100061 | 21/12/2022 16:15:41 | Sidik Jari | |
Alif | 100061 | 22/12/2022 07:55:54 | Sidik Jari | |
Alif | 100061 | 22/12/2022 16:15:59 | Sidik Jari | |
Alif | 100061 | 23/12/2022 07:56:26 | Sidik Jari | |
Alif | 100061 | 23/12/2022 16:16:56 | Sidik Jari | |
budi | 100063 | 17/12/2022 07:45:28 | Sidik Jari | |
budi | 100063 | 17/12/2022 13:00:23 | Sidik Jari | |
budi | 100063 | 19/12/2022 07:39:29 | Sidik Jari | |
budi | 100063 | 19/12/2022 16:17:37 | Sidik Jari | |
budi | 100063 | 20/12/2022 13:13:06 | Sidik Jari | |
budi | 100063 | 20/12/2022 16:16:14 | Sidik Jari | |
budi | 100063 | 21/12/2022 07:39:54 | Sidik Jari | |
budi | 100063 | 21/12/2022 16:15:38 | Sidik Jari | |
budi | 100063 | 22/12/2022 07:39:02 | Sidik Jari | |
budi | 100063 | 22/12/2022 16:15:55 | Sidik Jari | |
budi | 100063 | 23/12/2022 07:41:13 | Sidik Jari | |
budi | 100063 | 23/12/2022 16:16:25 | Sidik Jari |
so from that raw excel file i want to make a app to output the waging system
!pip install xlrd
import pandas as pd
from datetime import time, timedelta
import openpyxl
from google.colab import drive
drive.mount('/content/drive')
# Read the Excel file
path = '/content/drive/MyDrive/Colab Notebooks/Book1.xlsx'
df = pd.read_excel(path)
# Convert the 'Tgl/Waktu' column to datetime format
df['Tgl/Waktu'] = pd.to_datetime(df['Tgl/Waktu'])
# Extract the date and time from the 'Tgl/Waktu' column
df['Date'] = df['Tgl/Waktu'].dt.date
df['Time'] = df['Tgl/Waktu'].dt.time
# Group the data by employee name and date
grouped_df = df.groupby(['Nama', 'Date'])
# Set the overtime threshold to 16:30:00
overtime_threshold = time(hour=16, minute=30)
# Iterate over the grouped data
for (name, date), group in grouped_df:
# Calculate the total work hours and overtime hours for each employee on each day
start_time = group['Time'].min()
end_time = group['Time'].max()
total_hours = (timedelta(hours=end_time.hour, minutes=end_time.minute, seconds=end_time.second) -
timedelta(hours=start_time.hour, minutes=start_time.minute, seconds=start_time.second)).total_seconds() / 3600
if total_hours > 8:
hours_worked = 8
if end_time > overtime_threshold:
overtime_hours = (end_time - overtime_threshold).total_seconds() / 3600
else:
hours_worked = total_hours
overtime_hours = 0
if end_time > overtime_threshold:
overtime_hours = (end_time - overtime_threshold).total_seconds() / 3600
# Calculate the payment for each employee on each day
payment_each_date = 75000 * hours_worked 50000 * overtime_hours
# Add the total work hours, overtime hours, and payment as new columns to the dataframe
df.loc[(df['Nama'] == name) & (df['Date'] == date), 'Hours Worked'] = hours_worked
df.loc[(df['Nama'] == name) & (df['Date'] == date), 'Overtime Hours'] = overtime_hours
df.loc[(df['Nama'] == name) & (df['Date'] == date), 'Payment Each Date'] = payment_each_date
# Print the resulting dataframe
print(df)
# write DataFrame to excel
df.to_excel(excel_writer=r'/content/drive/MyDrive/Colab Notebooks/test.xlsx')
from that code i want to add another column that is total payment, i have an idea to add payment each date thats grouped by name and with different date
how do i do that?
i already try used groupby nama then sum the payment each day but its still error
CodePudding user response:
Updated Answer
If you want to remove duplicate date
rows prior to calculating the total payment for each Nama
value, then first group by Nama
and date
, take the max value (min is the same as the two values are equal), group that result by Nama
and sum the Payment Each Date
values. This produces a new dataframe with the total payment for each Nama
, df_total
, which can then be joined to the original dataframe to obtain the desired Total Payment
column:
df_total = df.groupby(['Nama', 'Date'])['Payment Each Date'].max().groupby('Nama').sum().rename('Total Payment')
df = df.merge(df_total, how='left', on='Nama')
Left join is safer as that will always keep all rows of df
just in case.
Updated Output df
:
Original Answer
If you just want to sum all the Payment Each Date
values in a new column, then add the following line of code after the for
loop code block and before the line that prints the df:
df['Total Payment'] = df.groupby('Nama')['Payment Each Date'].transform('sum')
The transform
takes care of aligning the indices of the series containing the sum for each Nama
value with the original df.
Please take a look at this SO post which has some examples similar to what you are asking about: How do I create a new column from the output of pandas groupby().sum()?