Pandas Expanding Rows According to the Condition

Time:12-06

I have a dataframe like this:

Jobs           Total_Duration_(seconds)   #_of_items    Job_divisible
Signature      17000                      100           1
Verification   4000                       20            0
Train          6000                       50            1
Test           3500                       12            1

If a job is divisible and takes longer than 1 hour, I would like to split it into sub-jobs of at most 1 hour each. The desired output is:

Jobs           Total_Duration_(seconds) #_of_items
Signature_1    3570                     21
Signature_2    3570                     21
Signature_3    3570                     21
Signature_4    3570                     21
Signature_5    2720                     16
Verification   4000                     20
Train_1        3600                     30
Train_2        2400                     20
Test           3500                     12
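
For reference, the figures above are consistent with packing whole items into each sub-job rather than cutting at exactly 3600 seconds: Signature takes 17000 / 100 = 170 seconds per item, so 21 items (3570 seconds) fit into one hour and 16 items (2720 seconds) remain. A quick check of that arithmetic, assuming every item in a job takes the same time (the helper name `items_per_hour` is hypothetical):

```python
LIMIT = 3600  # one hour, in seconds

def items_per_hour(total_seconds, n_items, limit=LIMIT):
    """Hypothetical helper: whole items that fit in one sub-job, plus seconds per item."""
    per_item = total_seconds / n_items  # assumes a uniform duration per item
    return int(limit // per_item), per_item

for name, total, items in [('Signature', 17000, 100), ('Train', 6000, 50)]:
    chunk, per_item = items_per_hour(total, items)
    full, rest = divmod(items, chunk)  # full sub-jobs, items left for the last one
    print(name, per_item, chunk, full, rest)
```

This prints 21 items per sub-job with 4 full sub-jobs and 16 items left over for Signature, and 30 items per sub-job with 1 full sub-job and 20 items left over for Train.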

Could you please help me with this?

CodePudding user response:

Assuming equal division of the tasks, you can use:

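import numpy as np

# number of sub-jobs per row: ceil(duration / 1 hour) if divisible, else 1
n = (np.ceil(df['Total_Duration_(seconds)'].div(3600)).astype(int)
       .where(df['Job_divisible'].eq(1), 1)
    )

out = (df
   # split duration and items evenly across the n sub-jobs
   .assign(**{'Total_Duration_(seconds)': df['Total_Duration_(seconds)'].div(n),
              '#_of_items': df['#_of_items'].div(n),
              })
   # repeat each row n times
   .loc[df.index.repeat(n)]
   # add a _1, _2, ... suffix only to jobs that were actually split
   .assign(Jobs=lambda d: d['Jobs'].mask((g:=d.groupby('Jobs'))['Jobs'].transform('size').gt(1),
                                         d['Jobs'].add('_' + g.cumcount().add(1).astype(str))
                                         )
          )
)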

Output:

           Jobs  Total_Duration_(seconds)  #_of_items  Job_divisible
0   Signature_1                    3400.0        20.0              1
0   Signature_2                    3400.0        20.0              1
0   Signature_3                    3400.0        20.0              1
0   Signature_4                    3400.0        20.0              1
0   Signature_5                    3400.0        20.0              1
1  Verification                    4000.0        20.0              0
2       Train_1                    3000.0        25.0              1
2       Train_2                    3000.0        25.0              1
3          Test                    3500.0        12.0              1
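
Note that the equal split above gives 3400 seconds and 20 items per Signature sub-job, whereas the question asks for 3570 seconds and 21 items. Below is a minimal sketch that instead fills each sub-job with as many whole items as fit in one hour. It assumes a uniform duration per item, which is inferred from the desired output rather than stated in the question; `split_by_items` and `LIMIT` are hypothetical names:

```python
import pandas as pd

LIMIT = 3600  # one hour, in seconds

df = pd.DataFrame({
    'Jobs': ['Signature', 'Verification', 'Train', 'Test'],
    'Total_Duration_(seconds)': [17000, 4000, 6000, 3500],
    '#_of_items': [100, 20, 50, 12],
    'Job_divisible': [1, 0, 1, 1],
})

def split_by_items(rec, limit=LIMIT):
    """Hypothetical helper: split one job (a dict) into sub-jobs of whole items."""
    name = rec['Jobs']
    total = rec['Total_Duration_(seconds)']
    items = rec['#_of_items']
    # Indivisible jobs and jobs that already fit in one hour stay as they are.
    if rec['Job_divisible'] != 1 or total <= limit:
        return [(name, total, items)]
    per_item = total / items                # assumed uniform seconds per item
    chunk = max(int(limit // per_item), 1)  # whole items that fit in one hour
    parts = []
    for k, start in enumerate(range(0, items, chunk), start=1):
        n_items = min(chunk, items - start)  # the last sub-job may be smaller
        parts.append((f'{name}_{k}', round(n_items * per_item), n_items))
    return parts

rows = [part for rec in df.to_dict('records') for part in split_by_items(rec)]
out = pd.DataFrame(rows, columns=['Jobs', 'Total_Duration_(seconds)', '#_of_items'])
print(out)
```

On the sample data this yields Signature_1 to Signature_4 with 3570 seconds and 21 items each, Signature_5 with 2720 seconds and 16 items, Train_1 with 3600 seconds and 30 items, and Train_2 with 2400 seconds and 20 items. Verification and Test are left unchanged.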

CodePudding user response:

import pandas as pd

# Create a sample dataframe
df = pd.DataFrame({
    'Jobs': ['Signature', 'Verification', 'Train', 'Test'],
    'Total_Duration_(seconds)': [17000, 4000, 6000, 3500],
    '#_of_items': [100, 20, 50, 12],
    'Job_divisible': [1, 0, 1, 1]
})

# Define a custom function that divides a job into multiple sub-jobs
# based on the total duration and the job divisible flag
def divide_jobs(group):
    # Check if the job is divisible
    if group['Job_divisible'].iloc[0] == 1:
        # Calculate the total duration of the job
        total_duration = group['Total_Duration_(seconds)'].iloc[0]

        # Divide the job into multiple sub-jobs of duration 3600 seconds (1 hour)
        sub_jobs = []
        for i in range(total_duration // 3600):
            sub_jobs.append({
                'Jobs': group['Jobs'].iloc[0] + f'_{i + 1}',
                'Total_Duration_(seconds)': 3600,
                '#_of_items': group['#_of_items'].iloc[0]
            })

        # If the remaining duration is not zero, create a final