I have a table which looks something like this:
Identified | Software | Version | Date |
---|---|---|---|
0 | Microsoft Office | 2 | 2022-05-25 |
0 | Microsoft Office | 1 | 2022-03-21 |
0 | Adobe Photoshop | 2 | 2022-04-20 |
1 | Adobe Photoshop | 1 | 2021-04-04 |
I created the 'Identified' column with this code:
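import pandas as pd
# Parse 'Date' as datetimes so it can be compared with a Timestamp
df = pd.read_csv('version-data.csv', encoding='utf8', parse_dates=['Date'])
# Cutoff: today at midnight minus one year
olderdata = pd.Timestamp.today().normalize() - pd.DateOffset(years=1)
# 1 if the row is at least one year old, otherwise 0
df['Identified'] = (df['Date'] <= olderdata).astype(int)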
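To check the flagging logic without the CSV file, here is a minimal sketch that builds the sample table inline. It assumes a fixed reference date of 2022-06-01 (a hypothetical stand-in for "today") so the result is reproducible; in real use the cutoff would be based on pd.Timestamp.today().normalize().

```python
import pandas as pd

# Sample data from the table above; Date is parsed to datetime64
df = pd.DataFrame({
    'Software': ['Microsoft Office', 'Microsoft Office',
                 'Adobe Photoshop', 'Adobe Photoshop'],
    'Version': [2, 1, 2, 1],
    'Date': pd.to_datetime(['2022-05-25', '2022-03-21',
                            '2022-04-20', '2021-04-04']),
})

# Hypothetical fixed "today" so the example gives the same result every run
reference = pd.Timestamp('2022-06-01')
cutoff = reference - pd.DateOffset(years=1)  # 2021-06-01

# 1 if the release date is on or before the cutoff, else 0
df['Identified'] = (df['Date'] <= cutoff).astype(int)
print(df[['Identified', 'Software', 'Version', 'Date']])
```

With this reference date, only the 2021-04-04 Photoshop release is flagged, which matches the Identified column in the table.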
This flags every row that is at least one year old. Now I want a new dataframe that shows every software package with at least one identified version, including all of that package's other versions. This is the output I am looking for:
Identified | Software | Version | Date |
---|---|---|---|
0 | Adobe Photoshop | 2 | 2022-04-20 |
1 | Adobe Photoshop | 1 | 2021-04-04 |
How do I achieve this?
Answer:
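You can use groupby.filter, which keeps every group (here, every software package) for which the function returns True, i.e. packages with at least one row where Identified == 1: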
out = df.groupby('Software').filter(lambda x: (x.Identified == 1).any())
print(out)
Identified Software Version Date
2 0 Adobe Photoshop 2 2022-04-20
3 1 Adobe Photoshop 1 2021-04-04
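An equivalent approach that avoids calling a Python lambda once per group is a boolean mask: collect the names of software with at least one identified row, then keep every row whose Software is in that set using Series.isin. This is a sketch on the same sample data, with the Identified values copied from the table rather than recomputed:

```python
import pandas as pd

df = pd.DataFrame({
    'Identified': [0, 0, 0, 1],
    'Software': ['Microsoft Office', 'Microsoft Office',
                 'Adobe Photoshop', 'Adobe Photoshop'],
    'Version': [2, 1, 2, 1],
    'Date': ['2022-05-25', '2022-03-21', '2022-04-20', '2021-04-04'],
})

# Names of software packages with at least one row where Identified == 1
flagged = df.loc[df['Identified'].eq(1), 'Software'].unique()

# Keep every row (all versions) of those packages, original index preserved
out = df[df['Software'].isin(flagged)]
print(out)
```

Both approaches keep the original row index (2 and 3 here). The mask version tends to scale better when there are many distinct software names, since it does not evaluate a function for each group.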