I have this data of 35k entries, I want to track only those Event_Action entries which have 0 in Event_Label and export them. Condition is : if an entry 100-genuine-data-entry-jobs-without-investment:us has 0 return all results and 1099-sales-jobs:us does't contain 0 , so we dont need this.
What have I tried so far is first
- filter all Event_Action entries which have 0 against them and create a new sheet/table
df1 = pd.read_excel(pwd "/zero_instance.xlsx")
df2 = pd.read_excel(pwd "/All Last 7 days data rcom.xlsx")
- then
df3 = df1.merge(df2, how='left', left_on='Event_Action', right_on='Event_Action')
But its returning me only first instances of each row. I need all instances when condition is met.
Here's the sample of the data I'm working on.
Event_Action | Event_Label |
---|---|
100-genuine-data-entry-jobs-without-investment:us | 0 |
100-genuine-data-entry-jobs-without-investment:us | 20000 |
100-genuine-data-entry-jobs-without-investment:us | 42323 |
100-genuine-data-entry-jobs-without-investment:us | 72639 |
100-genuine-data-entry-jobs-without-investment:us | 73898 |
100-genuine-online-data-entry-jobs:us | 122299 |
1099-bookkeeping-jobs:us | 0 |
1099-bookkeeping-jobs:us | 568 |
1099-bookkeeping-jobs:us | 2004 |
1099-bookkeeping-jobs:us | 3001 |
1099-sales-jobs:us | 1721341 |
1099-sales-jobs:us | 1779187 |
10th-pass:in | 3870 |
1120:in | 11 |
1120:in | 25 |
12-pass:in | 474 |
12th-pass:in | 7 |
12th-pass:in | 230 |
12th-pass:in | 4469 |
12th-pass-best:in | 143 |
12th-pass-fresher-jobs:us | 703 |
12th-pass-patanjali:in | 0 |
12th-pass-patanjali:in | 97 |
12th-pass-patanjali:in | 102 |
CodePudding user response:
Here is the solution:
import pandas as pd
import io
data_string = """ Event_Action Event_Label
100-genuine-data-entry-jobs-without-investment:us 0
100-genuine-data-entry-jobs-without-investment:us 20000
100-genuine-data-entry-jobs-without-investment:us 42323
100-genuine-data-entry-jobs-without-investment:us 72639
100-genuine-data-entry-jobs-without-investment:us 73898
100-genuine-online-data-entry-jobs:us 122299
1099-bookkeeping-jobs:us 0
1099-bookkeeping-jobs:us 568
1099-bookkeeping-jobs:us 2004
1099-bookkeeping-jobs:us 3001
1099-sales-jobs:us 1721341
1099-sales-jobs:us 1779187
10th-pass:in 3870
1120:in 11
1120:in 25
12-pass:in 474
12th-pass:in 7
12th-pass:in 230
12th-pass:in 4469
12th-pass-best:in 143
12th-pass-fresher-jobs:us 703
12th-pass-patanjali:in 0
12th-pass-patanjali:in 97
12th-pass-patanjali:in 102
"""
df = pd.read_csv(io.StringIO(data_string), sep='\s ')
# Solution
list_target = df[df['Event_Label'] == 0]['Event_Action'].to_list()
df_result = df[df['Event_Action'].isin(list_target)]
print(df_result)
Event_Action Event_Label
0 100-genuine-data-entry-jobs-without-investment:us 0
1 100-genuine-data-entry-jobs-without-investment:us 20000
2 100-genuine-data-entry-jobs-without-investment:us 42323
3 100-genuine-data-entry-jobs-without-investment:us 72639
4 100-genuine-data-entry-jobs-without-investment:us 73898
6 1099-bookkeeping-jobs:us 0
7 1099-bookkeeping-jobs:us 568
8 1099-bookkeeping-jobs:us 2004
9 1099-bookkeeping-jobs:us 3001
21 12th-pass-patanjali:in 0
22 12th-pass-patanjali:in 97
23 12th-pass-patanjali:in 102
CodePudding user response:
Using groupby and transform
# Choose rows whose min Event_Label for the group is zero.
df_result = df[df.groupby('Event_Action')['Event_Label'].transform('min')==0]
print(df_result)
Event_Action Event_Label
0 100-genuine-data-entry-jobs-without-investment:us 0
1 100-genuine-data-entry-jobs-without-investment:us 20000
2 100-genuine-data-entry-jobs-without-investment:us 42323
3 100-genuine-data-entry-jobs-without-investment:us 72639
4 100-genuine-data-entry-jobs-without-investment:us 73898
6 1099-bookkeeping-jobs:us 0
7 1099-bookkeeping-jobs:us 568
8 1099-bookkeeping-jobs:us 2004
9 1099-bookkeeping-jobs:us 3001
21 12th-pass-patanjali:in 0
22 12th-pass-patanjali:in 97
23 12th-pass-patanjali:in 102