Home > database >  Pandas: Replace the first row of each group with second row if meet specific condition
Pandas: Replace the first row of each group with second row if meet specific condition

Time:08-03

I want to filter all qualified boss in the dataset. The rule is, each group has one boss, if the status of the boss is "Active" then its valid, if the statue of boss is "Inactive" then we need to replace it using the second row of that group.

For example, in row 1 group A, Allen is Boss but Inactive, then the second row of group A need to replace the first row. In group B, Kim is Boss and Active, so its valid.

Group Name Title Status
A Allen Boss Inactive
A Lisa n Active
A Louis n Active
B Kim Boss Active
B Derek n Active
B Andrew n Active
B Katie n Active
C Lux Boss Inactive

In the end, all inactive Boss row should be filtered out, end result should be like below.

Group Name Title Status
A Lisa n Active
B Kim Boss Active

Thanks in advance!

CodePudding user response:

How's something like this?

df[df.Status.eq("Active")].drop_duplicates(subset=["Group"], ignore_index=True)

Output:

    Group   Name    Title       Status
0   A       Lisa    n           Active
1   B       Kim     Boss        Active

Stepping through it:

  • df[df.Status.eq("Active")] grabs only the rows where "Status" is "Active"
  • drop_duplicates(subset=["Group"] drops all rows after the first occurrence of a new value in "Group" ... e.g returns the first row with group A, then the first row with Group B, etc
  • ignore_index=True) ignores the above rows' index and resets the index to start back at 0. Without this the index would be [1, 3]

CodePudding user response:

Drop the Inactive rows, and then take the first line from each group.

df[df.Status.ne('Inactive')].groupby('Group', as_index=False).first()

Output:

  Group  Name Title  Status
0     A  Lisa     n  Active
1     B   Kim  Boss  Active

CodePudding user response:

solustion

import pandas as pd
data=pd.read_csv("./level.csv")
# print(data)
group_A=data[data["Group"]=="A"]
group_B=data[data["Group"]=="B"]
group_C=data[data["Group"]=="C"]
for (key,value) in group_A.iterrows():
    if value["Status"]=="Active":
        value["Title"]="Boss"
        break
print(group_A)

output

   Group   Name Title    Status
0     A  Allen  Boss  Inactive
1     A   Lisa  Boss    Active
2     A  Louis     n    Active

CodePudding user response:

Extract rows with specific property

import pandas
with open("./group.csv") as group:
    data = pandas.read_csv(group)
# get a row data  having Boss title
    row_boss = data[data.Title == "Boss"]
    print(row_boss)
# get a row data  having Active status
    row_active = data[data["Status"] == "Active"]
    print(row_active)

no difference each of other ▶ data["Status"] = data.Status

  • Related