Pandas counting pairs of values from 2 columns with condition on the third-CodePudding

I have a dataset with three columns: Username, Computer, Success/Failure.

I want to count how many similar Username Computer pairs are, where the third column is Success. I want the result to be a dataset with 1 column, and if the third column from the original dataset is Failure, the same column in the result will contain 0. The result table must contain the same amount of rows as the original table.

For example:

The original dataset:

User	Computer	Success or Failure
admin	DC	Success
admin	DC	Success
admin	DC	Fail
admin	Server	Success
admin	Server	Fail
User	Computer	Success
User	Computer	Success
User	Computer	Fail

The result:

No	Count
1	2
2	2
3	0
4	1
5	0
6	2
7	2
8	0

All of the rows that is failed are not counted and they are filled with 0 in the result.

I am programming in python using pandas. I could do this using the tolist() method and then creating a new list, with a simple for loop and a condition, but I am looking for a way to do this with pandas builtin methods to save memory and time.

Thank you very much!

Here's the code with the tolist method:

result = [user_and_computer = [list(x) for x in list(zip(df["user"].tolist(),df["computer"].tolist(),df["is_success"].tolist()))]
for logon in user_and computer:
    if is_success:
         result.append(user_and_computer.count(logon))
    else:
         result.append(0)

CodePudding user response：

Try:

df["Count"] = df.groupby(["User", "Computer"])["Success or Failure"].transform(
    lambda x: x.eq("Success").sum()
)
df.loc[df["Success or Failure"] != "Success", "Count"] = 0
print(df)

Prints:

    User  Computer Success or Failure  Count
0  admin        DC            Success      2
1  admin        DC            Success      2
2  admin        DC               Fail      0
3  admin    Server            Success      1
4  admin    Server               Fail      0
5   User  Computer            Success      2
6   User  Computer            Success      2
7   User  Computer               Fail      0

CodePudding user response：

First create a boolean mask to find the rows where the value is Success, then group this mask by itself along with User and Computer columns and transform with sum to get the result

m = df['Success or Failure'].eq('Success')
df['count'] = m.groupby([df['User'], df['Computer'], m]).transform('sum')

    User  Computer Success or Failure  count
0  admin        DC            Success      2
1  admin        DC            Success      2
2  admin        DC               Fail      0
3  admin    Server            Success      1
4  admin    Server               Fail      0
5   User  Computer            Success      2
6   User  Computer            Success      2
7   User  Computer               Fail      0