I have a dataframe like this:
Jobs Total_Duration_(seconds) #_of_items Job_divisible
Signature 17000 100 1
Verification 4000 20 0
Train 6000 50 1
Test 3500 12 1
If a job is divisible, I would like to split any job that takes more than 1 hour into sub-jobs of at most 1 hour each. The desired output is:
Jobs Total_Duration_(seconds) #_of_items
Signature_1 3570 21
Signature_2 3570 21
Signature_3 3570 21
Signature_4 3570 21
Signature_5 2720 16
Verification 4000 20
Train_1 3600 30
Train_2 2400 20
Test 3500 12
Could you please help me with this?
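The desired output appears to follow a rule the question does not state explicitly, so treat this as an inference: each sub-job holds as many whole items as fit into 3600 s at the job's average time per item (Signature: 17000 / 100 = 170 s per item, so 21 items = 3570 s; Train: 6000 / 50 = 120 s per item, so 30 items = 3600 s). A quick check of that arithmetic, where `chunk_plan` is a hypothetical helper:

```python
def chunk_plan(duration, items, limit=3600):
    """Hypothetical helper. Returns (items per full sub-job, number of full sub-jobs,
    leftover items, seconds per full sub-job, seconds for the leftover sub-job)."""
    per_item = duration / items             # average seconds per item
    per_chunk = limit * items // duration   # whole items that fit in `limit` seconds
    n_full, rest = divmod(items, per_chunk)
    return per_chunk, n_full, rest, per_chunk * per_item, rest * per_item


print(chunk_plan(17000, 100))  # (21, 4, 16, 3570.0, 2720.0)
print(chunk_plan(6000, 50))    # (30, 1, 20, 3600.0, 2400.0)
```

Both results match the Signature and Train rows of the desired output.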
Answer:
Assuming each job's duration and item count are divided equally among its sub-jobs, you can use:
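import numpy as np
import pandas as pd

# number of sub-jobs per row: ceil(duration / 3600) if divisible, else 1
n = (np.ceil(df['Total_Duration_(seconds)'].div(3600)).astype(int)
       .where(df['Job_divisible'].eq(1), 1)
    )

# divide duration and items equally, repeat each row n times,
# then append _1, _2, ... only to the jobs that were split (requires Python 3.8+ for :=)
out = (df
   .assign(**{'Total_Duration_(seconds)': df['Total_Duration_(seconds)'].div(n),
              '#_of_items': df['#_of_items'].div(n),
             })
   .loc[df.index.repeat(n)]
   .assign(Jobs=lambda d: d['Jobs'].mask(
       (g := d.groupby('Jobs'))['Jobs'].transform('size').gt(1),
       d['Jobs'] + '_' + g.cumcount().add(1).astype(str)
   ))
)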
Output:
Jobs Total_Duration_(seconds) #_of_items Job_divisible
0 Signature_1 3400.0 20.0 1
0 Signature_2 3400.0 20.0 1
0 Signature_3 3400.0 20.0 1
0 Signature_4 3400.0 20.0 1
0 Signature_5 3400.0 20.0 1
1 Verification 4000.0 20.0 0
2 Train_1 3000.0 25.0 1
2 Train_2 3000.0 25.0 1
3 Test 3500.0 12.0 1
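This equal split gives 3400 s per Signature sub-job, whereas the question's desired output uses 3570 s. To reproduce the desired output exactly, one option is to pack whole items into each sub-job: a job with duration D and N items is assumed to take D / N seconds per item, so each sub-job gets floor(3600 · N / D) items and a final sub-job takes the remainder. That rule is inferred from the desired output rather than stated in the question, and `split_job` and `LIMIT` are illustrative names. A minimal row-by-row sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'Jobs': ['Signature', 'Verification', 'Train', 'Test'],
    'Total_Duration_(seconds)': [17000, 4000, 6000, 3500],
    '#_of_items': [100, 20, 50, 12],
    'Job_divisible': [1, 0, 1, 1],
})

LIMIT = 3600  # illustrative: maximum sub-job duration in seconds (1 hour)


def split_job(row):
    """Illustrative helper: return (name, seconds, items) tuples for one job,
    packing as many whole items as fit into LIMIT seconds per sub-job."""
    name = row['Jobs']
    duration = int(row['Total_Duration_(seconds)'])
    items = int(row['#_of_items'])
    # Non-divisible jobs and jobs that already fit in one hour stay as they are.
    if row['Job_divisible'] != 1 or duration <= LIMIT:
        return [(name, duration, items)]
    # Whole items per sub-job; at least 1 so we never divide by zero
    # (a single item longer than LIMIT then becomes its own over-long sub-job).
    per_chunk = max(1, LIMIT * items // duration)
    sizes = [per_chunk] * (items // per_chunk)
    if items % per_chunk:
        sizes.append(items % per_chunk)  # leftover items form a last, shorter sub-job
    return [(f'{name}_{i}', duration * k / items, k)
            for i, k in enumerate(sizes, start=1)]


rows = [sub for _, row in df.iterrows() for sub in split_job(row)]
out = pd.DataFrame(rows, columns=['Jobs', 'Total_Duration_(seconds)', '#_of_items'])
print(out)
```

Iterating over rows is fine for a handful of jobs. The durations come out as floats because each sub-job's duration is computed as D · k / N.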
Answer:
import pandas as pd
# Create a sample dataframe
df = pd.DataFrame({
'Jobs': ['Signature', 'Verification', 'Train', 'Test'],
'Total_Duration_(seconds)': [17000, 4000, 6000, 3500],
'#_of_items': [100, 20, 50, 12],
'Job_divisible': [1, 0, 1, 1]
})
# Define a custom function that divides a job into multiple sub-jobs
# based on the total duration and the job divisible flag
def divide_jobs(group):
    # Check if the job is divisible
    if group['Job_divisible'].iloc[0] == 1:
        # Calculate the total duration of the job
        total_duration = group['Total_Duration_(seconds)'].iloc[0]
        # Divide the job into multiple sub-jobs of duration 3600 seconds (1 hour)
        sub_jobs = []
        for i in range(total_duration // 3600):
            sub_jobs.append({
                'Jobs': group['Jobs'].iloc[0] + f'_{i + 1}',
                'Total_Duration_(seconds)': 3600,
                '#_of_items': group['#_of_items'].iloc[0]
            })
        # If the remaining duration is not zero, create a final