Fill dataframe with duplicate data until a certain condition is met

Time:09-23

I have a dataframe df like this:

id name age duration
1  ABC  20   12
2  sd   50   150
3  df   54   40

I want to duplicate rows of this data within the same df until the sum of the duration column is greater than or equal to 300,

so the df could look like this:

id name age duration
1  ABC  20   12
2  sd   50   150
3  df   54   40
2  sd   50   150

So far I have tried the code below, but it sometimes runs in an infinite loop. Please help.

def fillPlaylist(df,duration):
    print("inside fill playlist fn.")
    if(len(df)==0):
        print("df len is 0, cannot fill.")
        return df;

    receivedDf= df
    print("receivedDf",receivedDf,flush=True)
    print("Received df len = ",len(receivedDf),flush=True)
    print("duration to fill ",duration,flush=True)
    while df['duration'].sum() < duration:
        # random 5% sample of data.
        print("filling")
        ramdomSampleDuplicates = receivedDf.sample(frac=0.05).reset_index(drop=True)
        df = pd.concat([ramdomSampleDuplicates,df])
        print("df['duration'].sum() ",df['duration'].sum())
    print("after filling df len = ",len(df))
    return df;

CodePudding user response:

Try using n instead of frac.

The n parameter randomly samples exactly n rows from your dataframe.

sample_df = df.sample(n=1).reset_index(drop=True)
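
As background on why the original loop can hang: sample(frac=...) rounds frac * len(df) to a whole number of rows, so on a frame with only a few rows frac=0.05 yields an empty sample and the duration sum never grows. A minimal sketch, assuming the three-row frame from the question (the variable names are illustrative only):

```python
import pandas as pd

# The example frame from the question.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["ABC", "sd", "df"],
    "age": [20, 50, 54],
    "duration": [12, 150, 40],
})

empty_sample = df.sample(frac=0.05)  # 0.05 * 3 rows rounds to 0 rows
one_row = df.sample(n=1)             # always exactly one row

print(len(empty_sample), len(one_row))
```

With an empty sample, pd.concat adds nothing, so the while condition stays true forever.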

To keep using frac, choose a fraction large enough that each sample contains at least one row. You can rewrite your code this way:

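import pandas as pd

def fillPlaylist(df, duration):
    # Keep appending a random half of the current rows until the target is reached.
    while df["duration"].sum() < duration:
        sample_df = df.sample(frac=0.5).reset_index(drop=True)
        df = pd.concat([df, sample_df])
    return df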
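
If the loop has to be guaranteed to finish, one possible variant is sketched below. It is only an illustration under a few assumptions: the name fill_playlist and the seed parameter are hypothetical, rows are drawn one at a time from the original frame, and only rows with a positive duration are considered so that every draw moves the sum toward the target.

```python
import numpy as np
import pandas as pd


def fill_playlist(df, target_duration, seed=None):
    """Append randomly chosen rows of df until the duration sum reaches the target."""
    # Assumption: only rows with a positive duration are candidates, because
    # a zero-duration row could be drawn forever without changing the sum.
    candidates = df[df["duration"] > 0]
    if candidates.empty:
        return df  # nothing usable to add, so return the frame unchanged

    rng = np.random.RandomState(seed)  # one shared generator, so draws differ
    parts = [df]
    total = df["duration"].sum()
    while total < target_duration:
        extra = candidates.sample(n=1, random_state=rng)
        parts.append(extra)
        total += extra["duration"].iloc[0]

    # Original rows first, duplicates after them, with a fresh 0..n-1 index.
    return pd.concat(parts, ignore_index=True)


df = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["ABC", "sd", "df"],
    "age": [20, 50, 54],
    "duration": [12, 150, 40],
})

filled = fill_playlist(df, 300, seed=0)
print(filled)
print("total duration:", filled["duration"].sum())
```

Collecting the pieces in a list and calling pd.concat once at the end avoids copying the growing frame on every iteration. Which rows get duplicated depends on the random draws.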