Manipulate pandas dataframe based on values in column

Time:09-23

I have a pandas DataFrame with four columns, shown below. How can I trim the dataset based on the values in the fourth column, whose header is "isValid"?

Input:

        X       Y   I  isValid
    -60.3  -15.63  25        1
    -60.2  -15.63  10        1
    -60.1  -15.63   0        0
    -60.0  -28.23   0        0
    -59.8  -28.23  25        1
    -59.7  -28.23  15        1
    -59.7  -28.23   0        1
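If the clipboard is not available, the same input can be built directly with the `pd.DataFrame` constructor. This is a minimal sketch; the values are copied from the input table above, and the variable name `df` simply mirrors the one used in the question.

```python
import pandas as pd

# Input table from the question, rebuilt without pd.read_clipboard()
df = pd.DataFrame({
    "X": [-60.3, -60.2, -60.1, -60.0, -59.8, -59.7, -59.7],
    "Y": [-15.63, -15.63, -15.63, -28.23, -28.23, -28.23, -28.23],
    "I": [25, 10, 0, 0, 25, 15, 0],
    "isValid": [1, 1, 0, 0, 1, 1, 1],
})
print(df)
```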

Output 1:

        X       Y   I
    -60.3  -15.63  25
    -60.2  -15.63  10
    -59.8  -28.23  25
    -59.7  -28.23  15
    -59.7  -28.23   0

Edit: I was able to get Output 1 with the following:

df = df.loc[df['isValid'] == 1]
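Note that `df.loc[df['isValid'] == 1]` keeps the `isValid` column, while Output 1 lists only X, Y and I. A minimal sketch, assuming the DataFrame is built from the input table in the question, that filters the rows and also drops the flag column (two equivalent ways):

```python
import pandas as pd

# Input table from the question
df = pd.DataFrame({
    "X": [-60.3, -60.2, -60.1, -60.0, -59.8, -59.7, -59.7],
    "Y": [-15.63, -15.63, -15.63, -28.23, -28.23, -28.23, -28.23],
    "I": [25, 10, 0, 0, 25, 15, 0],
    "isValid": [1, 1, 0, 0, 1, 1, 1],
})

# Keep the valid rows, then drop the flag column so only X, Y and I remain
out1 = df.loc[df["isValid"] == 1].drop(columns="isValid")

# Same result in one step: select rows and columns together in .loc
out1_alt = df.loc[df["isValid"] == 1, ["X", "Y", "I"]]

print(out1)
```

The boolean mask keeps the original row labels (0, 1, 4, 5, 6); call `.reset_index(drop=True)` afterwards if a fresh 0-based index is preferred.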

Output 2:

For each value in the second column (Y), average the corresponding values in the third column (I):

         Y  I
    -15.63  (25 + 10)/2
    -28.23  (25 + 15)/2

I am currently converting everything into NumPy arrays and using loops. I'm hoping there is a much simpler way.

Answer:

Try:

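import pandas as pd

# Read the table shown in the question from the clipboard (copy it first)
df = pd.read_clipboard()
# Keep only the valid rows
dfm = df[df['isValid'] == 1]
# Average I for each distinct Y value
df_out = dfm.groupby('Y', as_index=False)['I'].mean()
print(df_out)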

Output:

       Y     I
0 -28.23  20.0
1 -15.63  17.5
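On the seven input rows as printed in the question, the valid rows with Y = -28.23 have I values 25, 15 and 0, so a plain group mean gives 40/3 ≈ 13.33 for that group, not the 20.0 shown in the output above. The value 20.0 (the question's (25 + 15)/2) is obtained only if rows with I = 0 are excluded as well; whether that is the intended rule, or the answer was run on a version of the data without the last row, is an assumption. The sketch below computes both variants; `mean_nonzero` is a hypothetical extra filter, not part of the original answer.

```python
import pandas as pd

# Input table from the question
df = pd.DataFrame({
    "X": [-60.3, -60.2, -60.1, -60.0, -59.8, -59.7, -59.7],
    "Y": [-15.63, -15.63, -15.63, -28.23, -28.23, -28.23, -28.23],
    "I": [25, 10, 0, 0, 25, 15, 0],
    "isValid": [1, 1, 0, 0, 1, 1, 1],
})

valid = df[df["isValid"] == 1]

# Average I per Y over all valid rows (what the answer's groupby computes)
mean_valid = valid.groupby("Y", as_index=False)["I"].mean()

# Hypothetical variant: also drop rows where I == 0 before averaging
mean_nonzero = valid[valid["I"] != 0].groupby("Y", as_index=False)["I"].mean()

print(mean_valid)
print(mean_nonzero)
```

`groupby` sorts the keys by default, which is why -28.23 is listed before -15.63 in both results.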