I have a pandas DataFrame with four columns, shown below. How can I trim the dataset based on the values in the fourth column? The fourth column's header is "isValid".
Input:
X Y I isValid
-60.3 -15.63 25 1
-60.2 -15.63 10 1
-60.1 -15.63 0 0
-60.0 -28.23 0 0
-59.8 -28.23 25 1
-59.7 -28.23 15 1
-59.7 -28.23 0 1
Output 1:
X Y I
-60.3 -15.63 25
-60.2 -15.63 10
-59.8 -28.23 25
-59.7 -28.23 15
-59.7 -28.23 0
Edit: I was able to achieve Output 1 using the following:
df = df.loc[df['isValid'] == 1]
Output 2:
For each value in the second column (Y), average the corresponding values in the third column (I).
Y I
-15.63 (25 + 10)/2
-28.23 (25 + 15)/2
I am presently converting everything into NumPy arrays and working with loops. I am hoping there is a much simpler way.
Answer:
Try:
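import pandas as pd
df = pd.read_clipboard()  # copy the input table to the clipboard first
dfm = df[df['isValid'] == 1]  # keep only the rows flagged as valid
df_out = dfm.groupby('Y', as_index=False)['I'].mean()  # mean of I for each Y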
Output:
Y I
0 -28.23 20.0
1 -15.63 17.5
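With the input exactly as posted, the row with I = 0 and isValid = 1 also falls in the Y = -28.23 group, so the mean over valid rows there is 40/3 (about 13.33), not 20.0. The sketch below is self-contained (it builds the table inline instead of reading the clipboard) and assumes, which the question does not state explicitly, that rows with I equal to 0 should be left out of the average as well. The names out1, mean_valid and mean_nonzero are only illustrative.

```python
import pandas as pd

# Input table from the question, built inline so the example runs on its own.
df = pd.DataFrame({
    "X": [-60.3, -60.2, -60.1, -60.0, -59.8, -59.7, -59.7],
    "Y": [-15.63, -15.63, -15.63, -28.23, -28.23, -28.23, -28.23],
    "I": [25, 10, 0, 0, 25, 15, 0],
    "isValid": [1, 1, 0, 0, 1, 1, 1],
})

# Output 1: keep valid rows, then drop the flag column.
valid = df[df["isValid"] == 1]
out1 = valid.drop(columns="isValid")

# Mean of I per Y over all valid rows (includes the I == 0 row).
mean_valid = valid.groupby("Y", as_index=False)["I"].mean()

# Assumption: zero intensities should also be excluded from the average.
nonzero = valid[valid["I"] != 0]
mean_nonzero = nonzero.groupby("Y", as_index=False)["I"].mean()

print(out1)
print(mean_valid)
print(mean_nonzero)
```

If zeros are meant to count, mean_valid is the result to use; if not, mean_nonzero matches the Output 2 described in the question.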