Dataframe groupby or filter

Time:02-23

Here is my simplified example dataframe:

 timestamp   A  B
1422404668   1  1
1422404670   2  2
1422404672  -3  3
1422404674  -4  4
1422404676   5  5
1422404678  -6  6
1422404680  -7  7
1422404680   8  8

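Is there a way to group or filter consecutive rows by the sign of column A (runs of positive or negative values), and for each group get the first value of column A and the aggregated value of column B (sums, judging by the expected output), as in the output below?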

Expected output:

 timestamp   A   B
1422404668   1   3
1422404672  -3   7
1422404676   5   5
1422404678  -6  13
1422404680   8   8

Data:

{'timestamp': [1422404668, 1422404670, 1422404672, 1422404674,
  1422404676, 1422404678, 1422404680, 1422404680],
 'A': [1, 2, -3, -4, 5, -6, -7, 8], 'B': [1, 2, 3, 4, 5, 6, 7, 8]}

CodePudding user response:

IIUC, you could drop every row whose "A" value has the same sign as the row immediately before it (for example, the row with 2 in column "A" is dropped because it has the same sign as 1, the previous value in column "A"):

out = df[df['A'].ge(0).astype(int).diff()!=0]

It turns out you don't need to convert to int (thanks @Corralien):

out = df[df['A'].ge(0).diff()!=0]

Output:

    timestamp  A
0  1422404668  1
2  1422404672 -3
4  1422404676  5
5  1422404678 -6
7  1422404680  8
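
The filter above keeps the first row of each sign run but does not aggregate column B. As a rough sketch (not part of the original answer), the same sign-change test can be turned into a group key with cumsum and passed to groupby. The names positive and run_id are only illustrative, and B is summed here because the expected output in the question shows sums rather than means:

```python
import pandas as pd

df = pd.DataFrame({
    'timestamp': [1422404668, 1422404670, 1422404672, 1422404674,
                  1422404676, 1422404678, 1422404680, 1422404680],
    'A': [1, 2, -3, -4, 5, -6, -7, 8],
    'B': [1, 2, 3, 4, 5, 6, 7, 8],
})

# True where A is non-negative (zero counts as positive, as with ge(0) above)
positive = df['A'].ge(0)

# A new run starts wherever the sign flag differs from the previous row;
# the cumulative sum of those change points labels each run 1, 2, 3, ...
run_id = positive.ne(positive.shift()).cumsum().rename('run')

# First timestamp and first A of each run, sum of B within the run
out = (
    df.groupby(run_id)
      .agg(timestamp=('timestamp', 'first'),
           A=('A', 'first'),
           B=('B', 'sum'))
      .reset_index(drop=True)
)
print(out)
```

If a mean is really what is wanted, ('B', 'mean') can be swapped in for ('B', 'sum').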

CodePudding user response:

Something like this?

I made two frames, one with the positive values from column A and one with the negative values.

Then I find the first occurrence in each frame and concat the results into out.

df_positive = df[df['A'] > 0]
df_negative = df[df['A'] < 0]

df_positive = df_positive.groupby('A').first().reset_index()
df_negative = df_negative.groupby('A').first().reset_index()

out = pd.concat([df_positive, df_negative])[['timestamp', 'A']]
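
One caveat, as far as I can tell: groupby('A') makes one group per distinct value of A, not per sign, so with the sample data every row ends up in its own group. A small check, with illustrative variable names that are not from the answer:

```python
import pandas as pd

df = pd.DataFrame({
    'timestamp': [1422404668, 1422404670, 1422404672, 1422404674,
                  1422404676, 1422404678, 1422404680, 1422404680],
    'A': [1, 2, -3, -4, 5, -6, -7, 8],
    'B': [1, 2, 3, 4, 5, 6, 7, 8],
})

# Grouping by the column itself: one group per distinct value of A
groups_by_value = df.groupby('A').ngroups

# Grouping by a boolean key: one group per sign, regardless of row order
groups_by_sign = df.groupby(df['A'].gt(0)).ngroups

print(groups_by_value, groups_by_sign)
```

Neither key separates consecutive runs of the same sign, which is what the expected output in the question seems to need.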