Here is my simplified example dataframe:
timestamp A B
1422404668 1 1
1422404670 2 2
1422404672 -3 3
1422404674 -4 4
1422404676 5 5
1422404678 -6 6
1422404680 -7 7
1422404680 8 8
Is there a way to group consecutive runs of positive and negative values in column A and, for each run, get the first timestamp, the first value in column A and the sum of column B, as in the output below?
Expected output:
timestamp A B
1422404668 1 3
1422404672 -3 7
1422404676 5 5
1422404678 -6 13
1422404680 8 8
Data:
{'timestamp': [1422404668, 1422404670, 1422404672, 1422404674,
1422404676, 1422404678, 1422404680, 1422404680],
'A': [1, 2, -3, -4, 5, -6, -7, 8], 'B': [1, 2, 3, 4, 5, 6, 7, 8]}
Answer:
IIUC, you could drop every row whose value in "A" has the same sign as the value immediately before it, keeping only the first row of each run (for example, the row with 2 in column "A" is dropped because the previous value, 1, has the same sign):
out = df[df['A'].ge(0).astype(int).diff()!=0]
It turns out you don't need to convert to int (thanks @Corralien):
out = df[df['A'].ge(0).diff()!=0]
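A quick check, reusing the question's data, that the int and boolean versions of the mask select the same rows. It relies on `Series.diff` comparing neighbouring boolean values with XOR, which is how recent pandas versions handle boolean input; older versions may behave differently.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'timestamp': [1422404668, 1422404670, 1422404672, 1422404674,
                  1422404676, 1422404678, 1422404680, 1422404680],
    'A': [1, 2, -3, -4, 5, -6, -7, 8],
    'B': [1, 2, 3, 4, 5, 6, 7, 8],
})

signs = df['A'].ge(0)

# Original version: cast to int so diff() yields -1/0/1 (NaN in the first row)
mask_int = signs.astype(int).diff() != 0

# Shorter version: diff() on booleans yields True where the sign flips,
# False where it does not, and NaN in the first row (NaN != 0 is True)
mask_bool = signs.diff() != 0

print(df[mask_bool].index.tolist())
```

Both masks keep the first row because its `diff` is NaN, so each run is represented by its first row.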
Output:
timestamp A B
0 1422404668 1 1
2 1422404672 -3 3
4 1422404676 5 5
5 1422404678 -6 6
7 1422404680 8 8
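Building on the same sign-change idea, a minimal sketch that reproduces the full expected output, including column B. It assumes that 0 counts as positive (as `ge(0)` does above), that each run keeps its first timestamp, and that B is summed per run, since the expected values 3, 7 and 13 are sums (1 + 2, 3 + 4, 6 + 7) rather than means. The label name `run` is illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'timestamp': [1422404668, 1422404670, 1422404672, 1422404674,
                  1422404676, 1422404678, 1422404680, 1422404680],
    'A': [1, 2, -3, -4, 5, -6, -7, 8],
    'B': [1, 2, 3, 4, 5, 6, 7, 8],
})

# True for values >= 0 (zero is treated as positive here, an assumption)
positive = df['A'].ge(0)

# A new run starts wherever the sign differs from the previous row;
# cumsum turns those start markers into run labels 1, 1, 2, 2, 3, 4, 4, 5
run = positive.ne(positive.shift()).cumsum().rename('run')

# Keep the first timestamp and A of each run and add up B within the run
out = (
    df.groupby(run)
      .agg(timestamp=('timestamp', 'first'), A=('A', 'first'), B=('B', 'sum'))
      .reset_index(drop=True)
)
print(out)
```

Named aggregation keeps the output column names explicit, and `reset_index(drop=True)` discards the run labels once they are no longer needed.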
Answer:
Something like this?
I split the data into one frame with the negative values of column A and one with the positive values.
Then I take the first row of each run of consecutive positive or negative values and concatenate the two frames into out.
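import pandas as pd
# label each run of consecutive same-sign values in A (1, 1, 2, 2, 3, 4, 4, 5 here)
run = df['A'].gt(0).ne(df['A'].gt(0).shift()).cumsum().rename('run')
df_positive = df[df['A'] > 0]
df_negative = df[df['A'] < 0]
# first row of each positive / negative run; groupby aligns `run` on the row index
df_positive = df_positive.groupby(run).first()
df_negative = df_negative.groupby(run).first()
# restore the original run order before selecting the output columns
out = pd.concat([df_positive, df_negative]).sort_index()[['timestamp', 'A']]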