How do I remove rows based on multiple conditions in Python / Pandas dataframe?

Time:12-15

I have a table which looks something like this:

Identified  Software          Version  Date
0           Microsoft Office  2        2022-05-25
0           Microsoft Office  1        2022-03-21
0           Adobe Photoshop   2        2022-04-20
1           Adobe Photoshop   1        2021-04-04

The 'Identified' column is a column I have created using this code:

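import pandas as pd

# Parse the Date column as datetimes so it can be compared with a Timestamp
# (without parse_dates the column holds plain strings).
df = pd.read_csv('version-data.csv', encoding='utf8', parse_dates=['Date'])

# Cutoff: exactly one year before today (at midnight).
olderdata = pd.Timestamp.today().normalize() - pd.DateOffset(years=1)

# 1 = at least one year old, 0 = more recent.
df['Identified'] = (df['Date'] <= olderdata).astype(int)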
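Because the cutoff is based on today's date, the flag values change depending on when the script runs. Below is a minimal, self-contained sketch that rebuilds the sample table in memory instead of reading the CSV and pins "today" to a hypothetical fixed date (REFERENCE_DATE = 2022-12-15, an assumption chosen so the result matches the table above), which makes the flagging step reproducible:

```python
import pandas as pd

# Sample data from the table above, built in memory instead of read from CSV.
df = pd.DataFrame({
    'Software': ['Microsoft Office', 'Microsoft Office',
                 'Adobe Photoshop', 'Adobe Photoshop'],
    'Version': [2, 1, 2, 1],
    'Date': pd.to_datetime(['2022-05-25', '2022-03-21',
                            '2022-04-20', '2021-04-04']),
})

# Hypothetical fixed "today" so the output does not depend on the run date.
REFERENCE_DATE = pd.Timestamp('2022-12-15')
cutoff = REFERENCE_DATE - pd.DateOffset(years=1)  # 2021-12-15

# 1 if the row is at least one year old, else 0.
df['Identified'] = (df['Date'] <= cutoff).astype(int)
print(df[['Identified', 'Software', 'Version', 'Date']])
```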

This marks every row that is at least one year old with 1 and every other row with 0. Now I want to create a new dataframe that keeps all rows of every software package that has at least one identified row, and drops packages whose rows are all 0. Here is the output I am looking for:

Identified  Software          Version  Date
0           Adobe Photoshop   2        2022-04-20
1           Adobe Photoshop   1        2021-04-04
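A plain row-level mask does not produce this output, because it keeps only the rows that are themselves flagged rather than every row of a flagged package. A minimal sketch illustrating the difference, rebuilding the table in memory with the Identified column already filled in:

```python
import pandas as pd

# The table above, with the Identified flag already computed.
df = pd.DataFrame({
    'Identified': [0, 0, 0, 1],
    'Software': ['Microsoft Office', 'Microsoft Office',
                 'Adobe Photoshop', 'Adobe Photoshop'],
    'Version': [2, 1, 2, 1],
    'Date': ['2022-05-25', '2022-03-21', '2022-04-20', '2021-04-04'],
})

# A row-level mask keeps only rows that are flagged themselves:
# here just the 2021-04-04 Photoshop row, not the 2022-04-20 one.
row_level = df[df['Identified'] == 1]
print(row_level)
```

The desired result needs a group-level condition ("does this package have any flagged row?") applied back to every row of the package.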

How do I achieve this?

Answer:

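You can use groupby.filter, which keeps all rows of every group (here, every software package) for which the function returns True: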

out = df.groupby('Software').filter(lambda x: (x['Identified'] == 1).any())

print(out)

   Identified          Software   Version        Date
2           0   Adobe Photoshop         2  2022-04-20
3           1   Adobe Photoshop         1  2021-04-04
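groupby.filter calls the lambda once per group in Python, which may be slower when there are many packages. Two vectorized alternatives that should give the same rows are sketched below, offered as options rather than a benchmarked recommendation: a membership test with isin, and a per-group mask built with groupby.transform. Both assume Identified holds only 0 and 1.

```python
import pandas as pd

# The sample table, with the Identified flag already computed.
df = pd.DataFrame({
    'Identified': [0, 0, 0, 1],
    'Software': ['Microsoft Office', 'Microsoft Office',
                 'Adobe Photoshop', 'Adobe Photoshop'],
    'Version': [2, 1, 2, 1],
    'Date': ['2022-05-25', '2022-03-21', '2022-04-20', '2021-04-04'],
})

# Option 1: collect the packages with at least one flagged row,
# then keep every row whose Software is in that set.
flagged = df.loc[df['Identified'] == 1, 'Software'].unique()
out_isin = df[df['Software'].isin(flagged)]

# Option 2: broadcast each package's maximum flag back to all of its rows;
# a max of 1 means the package has at least one flagged row.
mask = df.groupby('Software')['Identified'].transform('max') == 1
out_transform = df[mask]

print(out_isin)
print(out_transform)
```

To remove those packages instead of keeping them, negate the mask: df[~mask].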