How do I remove rows based on multiple conditions in a Python / Pandas dataframe?

I have a table which looks something like this:

Identified	Software	Version	Date
0	Microsoft Office	2	2022-05-25
0	Microsoft Office	1	2022-03-21
0	Adobe Photoshop	2	2022-04-20
1	Adobe Photoshop	1	2021-04-04

The 'Identified' column is a column I have created using this code:

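import pandas as pd

# parse_dates turns 'Date' into a datetime64 column so it can be compared with a Timestamp
df = pd.read_csv('version-data.csv', encoding='utf8', parse_dates=['Date'])

# Cutoff: midnight today minus one year
olderdata = pd.Timestamp.today().normalize() - pd.DateOffset(years=1)

# 1 if the row's date is on or before the cutoff (at least one year old), otherwise 0
df['Identified'] = (df['Date'] <= olderdata).astype(int)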

This marks every row whose date is at least one year old. What I'm trying to do now is create a new dataframe that contains all rows of every software package that has at least one identified row. Here is the output I am looking for:

Identified	Software	Version	Date
0	Adobe Photoshop	2	2022-04-20
1	Adobe Photoshop	1	2021-04-04

How do I achieve this?

CodePudding user response:

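You can use groupby.filter, which keeps every row of each group for which the function returns True. Here a 'Software' group is kept if any of its rows has Identified equal to 1: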

out = df.groupby('Software').filter(lambda x: (x.Identified==1).any())

print(out)

   Identified         Software  Version        Date
2           0  Adobe Photoshop        2  2022-04-20
3           1  Adobe Photoshop        1  2021-04-04
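
If you want to try this without the CSV file, here is a minimal, self-contained sketch that rebuilds the four sample rows from the question by hand (the Identified values are typed in rather than computed from today's date, and the names out_filter, out_mask, out_isin and flagged are only illustrative). It also shows two alternatives that should give the same rows on this data: a mask built with groupby.transform, and a lookup with isin.

```python
import pandas as pd

# Sample rows copied from the question; 'Identified' is entered by hand here.
df = pd.DataFrame({
    "Identified": [0, 0, 0, 1],
    "Software": ["Microsoft Office", "Microsoft Office",
                 "Adobe Photoshop", "Adobe Photoshop"],
    "Version": [2, 1, 2, 1],
    "Date": ["2022-05-25", "2022-03-21", "2022-04-20", "2021-04-04"],
})

# groupby.filter: keep all rows of each group that has any flagged row.
out_filter = df.groupby("Software").filter(lambda g: (g["Identified"] == 1).any())

# Alternative 1: broadcast the per-group maximum of the flag back onto
# every row and use it as a boolean mask.
mask = df.groupby("Software")["Identified"].transform("max").eq(1)
out_mask = df[mask]

# Alternative 2: collect the names of flagged packages, then select with isin.
flagged = df.loc[df["Identified"] == 1, "Software"].unique()
out_isin = df[df["Software"].isin(flagged)]

print(out_filter)
```

The transform and isin variants avoid calling a Python function once per group, which may matter on a dataframe with many distinct packages; for a small table like this one, any of the three is fine.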