Home > Software engineering >  pandas: create another data frame using existing data frame from the given index position
pandas: create another data frame using existing data frame from the given index position

Time:10-31

I have a following dataframe where I have one another list of index position based on some condition so just want to create the new dataframe based on the index position and check some condition on that.

df = pd.DataFrame()
df['index'] = [ 0, 28, 35, 49, 85, 105, 208, 386, 419, 512, 816, 888, 914, 989]
df['diff_in_min'] = [ 5, 35, 42, 46, 345, 85, 96, 107, 119, 325, 8, 56, 55, 216]
df['val_1'] = [5, 25, 2, 4, 2, 5, 69, 6, 8, 7, 55, 85, 8, 67]
df['val_2'] = [8, 89, 8, 5, 7, 57, 8, 57, 4, 8, 74, 65, 55, 74]
re_ind = list(np.where(df['diff_in_min'] >= 300))
re_ind = [np.array([85, 512], dtype='int64')]

Just I want to create another dataframe based on re_ind position, ex:

first_df = df[0:85] 
another_df = [85:512] 
last_df =  [512:] 

and each dataframe I want to check one condition

count = 0
temp_df = df[:re_ind[0]]
if temp_df['diff_in_min'].sum() > 500:
    count  = 1
temp_df = df[re_ind[0]:re_ind[1]]
if temp_df['diff_in_min'].sum() > 500:
    count  = 1
if temp_df = df[re_ind[1]:]
if temp_df['diff_in_min'].sum() > 500:
    count  = 1

How can I do that using for loop with creating new data frame using existing dataframe?

CodePudding user response:

From sample data for groups created by df['diff_in_min'] >= 300) add cumulative sum, then aggregate sum, compare for another condition and count Trues by sum:

s = (df['diff_in_min'] >= 300).cumsum()

out = (df['diff_in_min'].groupby(s).sum() > 500).sum()
print (out)
2

CodePudding user response:

jezrael's answer is much better and more succinct. However, in keeping with your style of programming, here is another way you could tackle it:

import pandas as pd

df = pd.DataFrame()
df['index'] = [ 0, 28, 35, 49, 85, 105, 208, 386, 419, 512, 816, 888, 914, 989]
df['diff_in_min'] = [ 5, 35, 42, 46, 345, 85, 96, 107, 119, 325, 8, 56, 55, 216]
df['val_1'] = [5, 25, 2, 4, 2, 5, 69, 6, 8, 7, 55, 85, 8, 67]
df['val_2'] = [8, 89, 8, 5, 7, 57, 8, 57, 4, 8, 74, 65, 55, 74]

df_list = []
df_list.append(df[df['index']<85])
df_list.append(df[(df['index']>=85) & (df['index'] <512)])
df_list.append(df[df['index']>=512])

count = 0
for temp_df in df_list:
    if temp_df['diff_in_min'].sum() > 500:
        count  = 1

print(f"Count = {count}")

OUTPUT:

Count = 2

Which is exactly what jezrael got, and why my vote goes to them.

  • Related