Filter data based on multiple columns with a condition


I'm trying to filter data based on two columns, dropping the values that are less than 1 or NaN.

import pandas as pd

df = pd.DataFrame({'col_1': [0, 4, 6, 0, 0, 7, 9, 3, 1],
                   'col_2': [0, 0, 0, 2, 1, 14, 3, 4, 5]})

This is what I have tried, but it is not giving the result I want:

df.sort_values(by = ['col_1', 'col_2'], ascending = False)
cond = df[(df['col_1']>1.0) & (df['col_2']>1.0)].index
df.drop(cond, inplace= True)

I'm not sure if I should split the columns first, filter each one, and then join the table back together.

I want the output to look like this:

df = pd.DataFrame({'col_1': [4, 6, 7, 9, 3, 1], 'col_2': [2, 1, 14, 3, 4, 5]})

CodePudding user response:

Use concat with the filtered columns, restoring a default index with Series.reset_index:

# Filter each column for values >= 1, drop NaN, and reset the index
# so the two Series align when concatenated side by side.
df = pd.concat([df.loc[df['col_1'] >= 1.0, 'col_1'].dropna().reset_index(drop=True),
                df.loc[df['col_2'] >= 1.0, 'col_2'].dropna().reset_index(drop=True)], axis=1)
print(df)
   col_1  col_2
0      4      2
1      6      1
2      7     14
3      9      3
4      3      4
5      1      5
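
The same idea generalizes to any number of columns. As a minimal sketch of my own (not part of the answer above), a dict comprehension can apply the same filter to every column before the concat:

import pandas as pd

df = pd.DataFrame({'col_1': [0, 4, 6, 0, 0, 7, 9, 3, 1],
                   'col_2': [0, 0, 0, 2, 1, 14, 3, 4, 5]})

# Filter each column independently (a NaN compares False against >= 1),
# then realign the surviving values on a fresh 0..n-1 index.
out = pd.concat({c: df.loc[df[c] >= 1, c].reset_index(drop=True)
                 for c in df.columns}, axis=1)
print(out)

Because each column can keep a different number of values, any shorter column is padded with NaN after the concat.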

CodePudding user response:

You will need to split the two columns first, drop the NaN values and the values less than 1, and then merge the columns back together. You can do it using the code below.

import numpy as np
import pandas as pd

df = pd.DataFrame({'col_1': [0, 4, 6, 0, 0, 7, 9, 3, 1, np.nan],
                   'col_2': [0, 0, 0, 2, 1, 14, 3, 4, 5, np.nan]})
df1 = df[['col_1']]
df2 = df[['col_2']]
# Keep values >= 1 and drop NaN in each column, then reset the index
# so the two pieces line up when concatenated back together.
df11 = df1.loc[(df1.col_1 >= 1) & (df1.col_1.notna())].reset_index(drop=True)
df21 = df2.loc[(df2.col_2 >= 1) & (df2.col_2.notna())].reset_index(drop=True)
df_fin = pd.concat([df11, df21], axis=1)
df_fin.head()

This is how the output will look (df_fin.head() shows only the first five rows):

   col_1  col_2
0    4.0    2.0
1    6.0    1.0
2    7.0   14.0
3    9.0    3.0
4    3.0    4.0
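
One side note on the notna() check, with a small sketch of my own: it is technically redundant here, because comparisons against NaN evaluate to False, so the >= 1 condition alone already excludes the NaN rows:

import numpy as np
import pandas as pd

s = pd.Series([0, 4, np.nan])
print(s >= 1)     # NaN compares False: [False, True, False]
print(s[s >= 1])  # both the 0 and the NaN are filtered out

Keeping the explicit notna() just makes the intent more obvious to readers.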