My dataset is like this:
ExecutionTime | Code | Amount |
---|---|---|
09102021D081020 | HUUSNJUNJ | 500000 |
09102021D081020 | HUNSKMWKKS | 500000 |
09102021D093042 | NAUWJMKMS | 498333 |
09102021D093042 | SJIAJIJIJI | 498333 |
09102021D093042 | BHDYWHWHY | 498333 |
09102021D081020 | JHWJDNJWND | 454555 |
I want to detect that when 2 lines have the same ExecutionTime it is Type 1, and when 3 lines have the same ExecutionTime it is Type 2, and then get one such line per group into a new dataframe like this:
ExecutionTime | Code | Type of Operation |
---|---|---|
09102021D081020 | HUUSNJUNJ | Type 1 |
09102021D093042 | NAUWJMKMS | Type 2 |
Any ideas and suggestions are welcome! Thanks!
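For reproducibility, the data above can be built as a DataFrame like this (the answers below assume it is named `df`):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ExecutionTime": ["09102021D081020", "09102021D081020", "09102021D093042",
                      "09102021D093042", "09102021D093042", "09102021D081020"],
    "Code": ["HUUSNJUNJ", "HUNSKMWKKS", "NAUWJMKMS",
             "SJIAJIJIJI", "BHDYWHWHY", "JHWJDNJWND"],
    "Amount": [500000, 500000, 498333, 498333, 498333, 454555],
})
print(df)
```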
CodePudding user response:
IIUC, you can use groupby on "ExecutionTime" and "Amount" and transform the count; then map the counts through a helper dictionary, and finally aggregate with first in a groupby on ExecutionTime:
s = df.groupby(["ExecutionTime", "Amount"])['ExecutionTime'].transform('count')
d = {2: 'Type 1', 3: 'Type 2'}  # expand your dict as necessary
out = (df.assign(Type=s.map(d))
         .groupby('ExecutionTime', as_index=False)[['Code', 'Type']]
         .first())
print(out)
ExecutionTime Code Type
0 09102021D081020 HUUSNJUNJ Type 1
1 09102021D093042 NAUWJMKMS Type 2
CodePudding user response:
Assuming you want to group by ExecutionTime and Amount and only keep the groups with 2 or 3 rows: use groupby with a custom function and a defaultdict, then dropna:
from collections import defaultdict

maps = defaultdict(lambda: None, {2: 'Type 1', 3: 'Type 2'})

(df.groupby(['ExecutionTime', 'Amount'])['Code']
   .agg([('Code', 'first'), ('Type of operation', lambda s: maps[s.count()])])
   .dropna(subset=['Type of operation'])
   .droplevel(1)
)
Output:
Code Type of operation
ExecutionTime
09102021D081020 HUUSNJUNJ Type 1
09102021D093042 NAUWJMKMS Type 2
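Putting it together as a self-contained sketch (the sample data is rebuilt inline, and a trailing `reset_index()` turns ExecutionTime back into a column to match the layout the question asks for):

```python
from collections import defaultdict

import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ExecutionTime": ["09102021D081020", "09102021D081020", "09102021D093042",
                      "09102021D093042", "09102021D093042", "09102021D081020"],
    "Code": ["HUUSNJUNJ", "HUNSKMWKKS", "NAUWJMKMS",
             "SJIAJIJIJI", "BHDYWHWHY", "JHWJDNJWND"],
    "Amount": [500000, 500000, 498333, 498333, 498333, 454555],
})

# Any group size other than 2 or 3 maps to None and is removed by dropna
maps = defaultdict(lambda: None, {2: 'Type 1', 3: 'Type 2'})

out = (df.groupby(['ExecutionTime', 'Amount'])['Code']
         .agg([('Code', 'first'), ('Type of operation', lambda s: maps[s.count()])])
         .dropna(subset=['Type of operation'])
         .droplevel(1)       # drop the Amount index level
         .reset_index())     # ExecutionTime back as a column
print(out)
```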
CodePudding user response:
Try this:
new_df = (
    df.assign(
        **{'Type of Operation': (
            df.groupby(
                df.ExecutionTime.ne(df.ExecutionTime.shift(1)).cumsum()
            )['Code']
            .transform(lambda x: f'Type {x.shape[0] - 1}')
        )}
    )
    .drop_duplicates(['Type of Operation'])
    .reset_index(drop=True)
)
Output:
>>> new_df
ExecutionTime Code Amount Type of Operation
0 09102021D081020 HUUSNJUNJ 500000 Type 1
1 09102021D093042 NAUWJMKMS 498333 Type 2
2 09102021D081020 JHWJDNJWND 454555 Type 0
Compact version:
df.assign(**{'Type of Operation':df.groupby(df.ExecutionTime.ne(df.ExecutionTime.shift(1)).cumsum())['Code'].transform(lambda x: f'Type {x.shape[0] - 1}')}).drop_duplicates(['Type of Operation']).reset_index(drop=True)
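One caveat: the shift/cumsum trick only groups consecutive rows, which is why the last 09102021D081020 row (with a different Amount) shows up as its own `Type 0` run above. A sketch that instead counts by (ExecutionTime, Amount) pairs regardless of row order, assuming only group sizes 2 and 3 should be kept (data rebuilt inline):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ExecutionTime": ["09102021D081020", "09102021D081020", "09102021D093042",
                      "09102021D093042", "09102021D093042", "09102021D081020"],
    "Code": ["HUUSNJUNJ", "HUNSKMWKKS", "NAUWJMKMS",
             "SJIAJIJIJI", "BHDYWHWHY", "JHWJDNJWND"],
    "Amount": [500000, 500000, 498333, 498333, 498333, 454555],
})

# Per-row size of the (ExecutionTime, Amount) group the row belongs to
counts = df.groupby(["ExecutionTime", "Amount"])["Code"].transform("size")

out = (df[counts.isin([2, 3])]                    # keep only sizes 2 and 3
       .assign(**{"Type of Operation": counts.map({2: "Type 1", 3: "Type 2"})})
       .drop_duplicates("ExecutionTime")          # one line per group
       .loc[:, ["ExecutionTime", "Code", "Type of Operation"]]
       .reset_index(drop=True))
print(out)
```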