Pandas find and replace based on column items count

Time:05-24

I have a dataframe that looks like this:

import pandas as pd

all_data_set = [
        ('A','Area1','AA','A B D E','A B','D E'),
        ('B','Area1','AA','A B D E','A B','D E'),
        ('C','Area2','BB','C','C','C'),
        ('E','Area1','CC','A B D E','A B','D E'),
        ('F','Area3','BB','F G','G','F')
        ]

all_df = pd.DataFrame(data = all_data_set, columns = ['Name','Area','Type','Group','AA members','CC members'])

  Name   Area Type    Group AA members CC members
0    A  Area1   AA  A B D E        A B        D E
1    B  Area1   AA  A B D E        A B        D E
2    C  Area2   BB        C          C          C
3    E  Area1   CC  A B D E        A B        D E
4    F  Area3   BB      F G          G          F

The last row (row 4) is incorrect. Any row of Type BB should contain only its own Name (here F) in Group, AA members and CC members.

So it should look like this:

4    F  Area3   BB        F          F          F

To do this, I was trying to:

  1. Check where Type is BB and Group contains 2 items, like this:

    df = (all_df.loc[all_df['Type'] == 'BB', 'Group'].str.split().str.len() == 2)

  2. Iterate over every row to find those cases.

  3. Make a new DataFrame from the matching rows and set Group, AA members and CC members to Name.

  4. Drop those rows from all_df.

  5. Merge the result of step 3 back into all_df.

Is there a better pandas way to do this?

CodePudding user response:

Try boolean indexing with loc:

# identify rows where Type is BB
m = all_df['Type'] == 'BB'
# for Type BB rows, replace Group, AA members and CC members values by Name
all_df.loc[m, ['Group', 'AA members', 'CC members']] = all_df.loc[m, 'Name']
print(all_df)
  Name   Area Type    Group AA members CC members
0    A  Area1   AA  A B D E        A B        D E
1    B  Area1   AA  A B D E        A B        D E
2    C  Area2   BB        C          C          C
3    E  Area1   CC  A B D E        A B        D E
4    F  Area3   BB        F          F          F
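The mask above rewrites every Type BB row. If, as the question describes, only BB rows whose Group holds more than one item should be changed, the mask could be narrowed with the word count of Group. This is a minimal sketch, assuming the items in Group are separated by whitespace; the names is_bb, has_extra and cols are illustrative and do not come from the original post.

```python
import pandas as pd

all_df = pd.DataFrame(
    [
        ('A', 'Area1', 'AA', 'A B D E', 'A B', 'D E'),
        ('B', 'Area1', 'AA', 'A B D E', 'A B', 'D E'),
        ('C', 'Area2', 'BB', 'C', 'C', 'C'),
        ('E', 'Area1', 'CC', 'A B D E', 'A B', 'D E'),
        ('F', 'Area3', 'BB', 'F G', 'G', 'F'),
    ],
    columns=['Name', 'Area', 'Type', 'Group', 'AA members', 'CC members'],
)

# columns that should hold only the row's own Name (illustrative variable)
cols = ['Group', 'AA members', 'CC members']

# rows of Type BB
is_bb = all_df['Type'] == 'BB'
# rows whose Group lists more than one whitespace-separated item (assumption)
has_extra = all_df['Group'].str.split().str.len() > 1
m = is_bb & has_extra

# overwrite one column at a time; the right-hand Series aligns on the index
for col in cols:
    all_df.loc[m, col] = all_df.loc[m, 'Name']

print(all_df)
```

With the sample data only row 4 matches, so row 2 (already correct) is left untouched.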

CodePudding user response:

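You can also use iloc with a for loop.

# assumes the default integer index (0, 1, 2, ...), so labels equal positions
for row in all_df.index:
    # column position 2 is Type
    if all_df.iloc[row, 2] == "BB":
        # positions 3 onward are Group, AA members and CC members
        all_df.iloc[row, 3:] = all_df["Name"][row]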
        
all_df

  Name   Area Type    Group AA members CC members
0    A  Area1   AA  A B D E        A B        D E
1    B  Area1   AA  A B D E        A B        D E
2    C  Area2   BB        C          C          C
3    E  Area1   CC  A B D E        A B        D E
4    F  Area3   BB        F          F          F
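The loop above mixes index labels with integer positions, which works here because the DataFrame has the default 0..n-1 index. A label-based variant might look like the sketch below. It assumes unique index labels; the labels r1 to r5 and the name cols are hypothetical and only there to show a non-default index.

```python
import pandas as pd

# same rows as in the question, but with hypothetical non-default index labels
all_df = pd.DataFrame(
    [
        ('A', 'Area1', 'AA', 'A B D E', 'A B', 'D E'),
        ('B', 'Area1', 'AA', 'A B D E', 'A B', 'D E'),
        ('C', 'Area2', 'BB', 'C', 'C', 'C'),
        ('E', 'Area1', 'CC', 'A B D E', 'A B', 'D E'),
        ('F', 'Area3', 'BB', 'F G', 'G', 'F'),
    ],
    columns=['Name', 'Area', 'Type', 'Group', 'AA members', 'CC members'],
    index=['r1', 'r2', 'r3', 'r4', 'r5'],
)

cols = ['Group', 'AA members', 'CC members']

for label in all_df.index:
    # .at reads a single cell by row label and column name
    if all_df.at[label, 'Type'] == 'BB':
        # the scalar Name is broadcast across the three columns
        all_df.loc[label, cols] = all_df.at[label, 'Name']

print(all_df)
```

For larger frames the boolean-mask approach from the first answer is generally preferable, since it avoids a Python-level loop.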