Home > Net >  How to summarize a column based on repeated values using a for loop?
How to summarize a column based on repeated values using a for loop?

Time:07-21

I'm trying to sum the duration of the activities using a for loop with conditions. My knowledge of for loops is lacking, and I'm unsure how to search the right keywords to get my answer.

A slice of my data frame looks like this : 
mydf= {'Duration' : [14, 8, 6, 36, 12, 5, 3, 2, 4, 5, 8, 3, 14, 1, 27, 25, 117, 2, 962, 2, 2, 1], 'Activity': ['Groom', 'Pause', 'Groom', 'Sleep', 'Sleep', 'Sleep', 'Sleep', 'Sleep', 'Sleep', 'Sleep', 'Sleep', 'Sleep', 'Sleep', 'Awaken', 'Groom', 'Pause', 'Groom', 'Eat', 'Cuddle', 'Come down', 'Dig', 'Forage']}
df = pd.DataFrame(mydf)

I want to sum all the sleep duration together. Instead of Sleep appearing multiple times, just one "Sleep" with the summed duration. However, the multiplied "Sleep" patterns occur several times in my dataframe, so I thought I should do a for loop, but I am unsure of my logic.

I tried:

Sleep_sum = [sum(df['Duration'] for i in df['Activity'] if [i 1]=='Sleep')] 

but this gives me an error : TypeError: can only concatenate str (not "int") to str.

I also tried this, but this looks very wrong.

(if df['Activity'] == 'Sleep'   df['Activity'] == 'Twitch').df['Duration'].sum()

Essentially, I need to sum the duration of 'Sleep" if it's followed by another row == "Sleep".

Thank you for your time!

CodePudding user response:

You can create virtual groups to group consecutive activities (Sleep and other):

out = (df.groupby(df['Activity'].ne(df['Activity'].shift()).cumsum(), as_index=False)
         .agg({'Duration': 'sum', 'Activity': 'first'}))
print(out)

# Output
    Duration   Activity
0         14      Groom
1          8      Pause
2          6      Groom
3         92      Sleep
4          1     Awaken
5         27      Groom
6         25      Pause
7        117      Groom
8          2        Eat
9        962     Cuddle
10         2  Come down
11         2        Dig
12         1     Forage

Update

Just for Sleep activity:

m = df['Activity'] == 'Sleep'
sleep = (df.reset_index(drop=False)[m]
           .groupby(df['Activity'].ne(df['Activity'].shift()).cumsum())
           .agg({'index': 'first', 'Duration': 'sum', 'Activity': 'first'})
           .set_index('index'))

out = pd.concat([df[~m], sleep]).sort_index()
print(out)

# Output
    Duration   Activity
0         14      Groom
1          8      Pause
2          6      Groom
3         92      Sleep
13         1     Awaken
14        27      Groom
15        25      Pause
16       117      Groom
17         2        Eat
18       962     Cuddle
19         2  Come down
20         2        Dig
21         1     Forage
  • Related