Home > Blockchain >  Fill a value based on a condition in Pandas
Fill a value based on a condition in Pandas

Time:08-04

I have a data frame base with several columns and the "GroupName" column has some duplicate names (but slightly different with upper case and lower case).

The next column contains some values and those values are missing for some duplicate names. So I want to fill that column if duplicate names are similar.

Here is my dataframe:

GroupName  ticketNumber  Revenue  Expences  expect  Signed_Date  start_Date  end_Date
Arthur     AB25          6720     793       50      June 2021    2021-06-13  2021-08-13
FRANC,W                  56       18        100     May 2021     2021-06-13  2021-07-13
ARTHUR     512CD         78       38        100     May 2021     2021-06-14  2021-09-14
franc,W    AC23          34       23        NaN     June 2021    2021-06-13  2021-09-13
Francis    4             89       43        40      June 2021    2021-06-14  2021-09-14
Ross                     39       23        30      June 2021    2021-06-13  2021-09-13
RoSS                     34       72        20      June 2021    2021-06-13  2021-09-13
Greg                     89       48        40      June 2021    2021-06-16  2021-09-16
Ravi       DF4           73       20        60      April 2021   2021-06-15  2021-09-15
Will       6CS0          345      110       60      June 2021    2021-06-15  2021-09-15
WILL                     890      NaN       NaN     June 2021    2021-06-13   2021-09-13
Ben's      qw0           0        0         50      May 2021     2021-06-16  2021-09-16
BEN's                    45       16        60      June 2021    2021-06-13  2021-09-13

For example, FRANC,W, and franc,W are the same in the "GroupName" column, and franc,W has the "ticketNumber", AC23, and I also want to add that to the FRANC,W "ticketNumber".

So the data frame looks like below

GroupName  ticketNumber  Revenue  Expences  expect  Signed_Date  start_Date  end_Date
Arthur     AB25          6720     793       50      June 2021    2021-06-13  2021-08-13
FRANC,W    AC23          56       18        100     May 2021     2021-06-13  2021-07-13
ARTHUR     512CD         78       38        100     May 2021     2021-06-14  2021-09-14
franc,W    AC23          34       23        NaN     June 2021    2021-06-13  2021-09-13
Francis    4             89       43        40      June 2021    2021-06-14  2021-09-14
Ross                     39       23        30      June 2021    2021-06-13  2021-09-13
RoSS                     34       72        20      June 2021    2021-06-13  2021-09-13
Greg                     89       48        40      June 2021    2021-06-16  2021-09-16
Ravi       DF4           73       20        60      April 2021   2021-06-15  2021-09-15
Will       6CS0          345      110       60      June 2021    2021-06-15  2021-09-15
WILL       6CS0          890      NaN       NaN     June 2021    2021-06-13  2021-09-13
Ben's      qw0           0        0         50      May 2021     2021-06-16  2021-09-16
BEN's      qw0           45       16        60      June 2021    2021-06-13  2021-09-13

I tried to ffill and bfill, but it didn't work when there were several GroupName with missing "ticketNumber" values.

CodePudding user response:

Try this:

df['GroupName'] = df.GroupName.str.lower()
df = df.sort_values('GroupName')

df['ticketNumber1] = df['ticketNumber'].shift().where(((df['GroupName'].shift() == df['GroupName'])
                                                        & (df['ticketNumber']=="")), df['ticketNumber'])


df['ticketNumber'] = df['ticketNumber'].shift(-1).where(((df['GroupName'].shift(-1) == df['GroupName'])
                                                        & (df['ticketNumber']=="")), df['ticketNumber'])


print(df)

CodePudding user response:

df.assign(ticketNumber = df.groupby(df['GroupName'].str.lower())['ticketNumber'].transform('first'))
  • Related