I have a data frame df like this:
id name age duration
1 ABC 20 12
2 sd 50 150
3 df 54 40
I want to duplicate rows within the same df until the sum of duration is greater than or equal to 300,
so the df could look like this:
id name age duration
1 ABC 20 12
2 sd 50 150
3 df 54 40
2 sd 50 150
So far I have tried the code below, but it sometimes runs in an infinite loop. Please help.
def fillPlaylist(df,duration):
    print("inside fill playlist fn.")
    if(len(df)==0):
        print("df len is 0, cannot fill.")
        return df
    receivedDf= df
    print("receivedDf",receivedDf,flush=True)
    print("Received df len = ",len(receivedDf),flush=True)
    print("duration to fill ",duration,flush=True)
    while df['duration'].sum() < duration:
        # random 5% sample of data.
        print("filling")
        ramdomSampleDuplicates = receivedDf.sample(frac=0.05).reset_index(drop=True)
        df = pd.concat([ramdomSampleDuplicates,df])
        print("df['duration'].sum() ",df['duration'].sum())
    print("after filling df len = ",len(df))
    return df
Answer:
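Try using n instead of frac. n randomly samples n rows from your dataframe. With frac=0.05 and only three rows, pandas rounds 0.05 * 3 = 0.15 to 0 rows, so the sample is empty, the duration sum never grows, and the loop never ends.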
sample_df = df.sample(n=1).reset_index(drop=True)
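To see the difference on the three-row frame from the question, here is a small check. It is only a sketch and assumes that sample converts frac to a row count by rounding frac * len(df), which is how current pandas releases behave.

```python
import pandas as pd

# The example frame from the question.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["ABC", "sd", "df"],
    "age": [20, 50, 54],
    "duration": [12, 150, 40],
})

# 0.05 * 3 = 0.15, which rounds to 0 rows, so nothing is ever appended.
empty_sample = df.sample(frac=0.05)

# n=1 always returns exactly one row as long as the frame is not empty.
one_row = df.sample(n=1)

print(len(empty_sample), len(one_row))
```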
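To keep using frac, you can rewrite your code this way. Note that frac has to be large enough that frac * len(df) rounds to at least 1, otherwise the sample is still empty.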
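import pandas as pd

def fillPlaylist(df,duration):
    # Keep appending a random half of the current rows until the target is reached.
    while df.duration.sum() < duration:
        sample_df = df.sample(frac=0.5).reset_index(drop=True)
        df = pd.concat([df,sample_df])
    return df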
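If you want the loop to be guaranteed to finish, one option is to force at least one row per sample and to draw only from rows that can actually increase the total. The version below is a sketch, not a drop-in replacement: the name fill_playlist and the frac and seed parameters are made up for illustration, and it assumes durations are never negative.

```python
import numpy as np
import pandas as pd


def fill_playlist(df, duration, frac=0.5, seed=None):
    """Append random copies of the original rows until the duration sum reaches `duration`."""
    # Only rows with a positive duration can move the total forward.
    source = df[df["duration"] > 0]
    if source.empty:
        return df  # nothing usable to duplicate, so return the input unchanged

    rng = np.random.RandomState(seed)
    # Never sample zero rows, and never more rows than are available.
    n = min(len(source), max(1, round(frac * len(source))))

    pieces = [df]
    total = df["duration"].sum()
    while total < duration:
        sample = source.sample(n=n, random_state=rng)
        pieces.append(sample)
        total += sample["duration"].sum()

    # Concatenate once at the end instead of growing the frame inside the loop.
    return pd.concat(pieces, ignore_index=True)


playlist = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["ABC", "sd", "df"],
    "age": [20, 50, 54],
    "duration": [12, 150, 40],
})
filled = fill_playlist(playlist, 300, seed=0)
print(filled["duration"].sum())
```

Because every sampled row has a positive duration, each pass adds something to the total, so the loop cannot stall the way the frac=0.05 version does on a small frame.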