I have a dataset with three columns: Username, Computer, Success/Failure.
I want to count how many similar Username Computer pairs are, where the third column is Success. I want the result to be a dataset with 1 column, and if the third column from the original dataset is Failure, the same column in the result will contain 0. The result table must contain the same amount of rows as the original table.
For example:
The original dataset:
User | Computer | Success or Failure |
---|---|---|
admin | DC | Success |
admin | DC | Success |
admin | DC | Fail |
admin | Server | Success |
admin | Server | Fail |
User | Computer | Success |
User | Computer | Success |
User | Computer | Fail |
The result:
No | Count |
---|---|
1 | 2 |
2 | 2 |
3 | 0 |
4 | 1 |
5 | 0 |
6 | 2 |
7 | 2 |
8 | 0 |
All of the rows that is failed are not counted and they are filled with 0 in the result.
I am programming in python using pandas. I could do this using the tolist() method and then creating a new list, with a simple for loop and a condition, but I am looking for a way to do this with pandas builtin methods to save memory and time.
Thank you very much!
Here's the code with the tolist method:
result = [user_and_computer = [list(x) for x in list(zip(df["user"].tolist(),df["computer"].tolist(),df["is_success"].tolist()))]
for logon in user_and computer:
if is_success:
result.append(user_and_computer.count(logon))
else:
result.append(0)
CodePudding user response:
Try:
df["Count"] = df.groupby(["User", "Computer"])["Success or Failure"].transform(
lambda x: x.eq("Success").sum()
)
df.loc[df["Success or Failure"] != "Success", "Count"] = 0
print(df)
Prints:
User Computer Success or Failure Count
0 admin DC Success 2
1 admin DC Success 2
2 admin DC Fail 0
3 admin Server Success 1
4 admin Server Fail 0
5 User Computer Success 2
6 User Computer Success 2
7 User Computer Fail 0
CodePudding user response:
First create a boolean mask to find the rows where the value is Success
, then group this mask by itself along with User
and Computer
columns and transform with sum to get the result
m = df['Success or Failure'].eq('Success')
df['count'] = m.groupby([df['User'], df['Computer'], m]).transform('sum')
User Computer Success or Failure count
0 admin DC Success 2
1 admin DC Success 2
2 admin DC Fail 0
3 admin Server Success 1
4 admin Server Fail 0
5 User Computer Success 2
6 User Computer Success 2
7 User Computer Fail 0