I have a data frame base with several columns and the "GroupName" column has some duplicate names (but slightly different with upper case and lower case).
The next column contains some values and those values are missing for some duplicate names. So I want to fill that column if duplicate names are similar.
Here is my dataframe:
GroupName ticketNumber Revenue Expences expect Signed_Date start_Date end_Date
Arthur AB25 6720 793 50 June 2021 2021-06-13 2021-08-13
FRANC,W 56 18 100 May 2021 2021-06-13 2021-07-13
ARTHUR 512CD 78 38 100 May 2021 2021-06-14 2021-09-14
franc,W AC23 34 23 NaN June 2021 2021-06-13 2021-09-13
Francis 4 89 43 40 June 2021 2021-06-14 2021-09-14
Ross 39 23 30 June 2021 2021-06-13 2021-09-13
RoSS 34 72 20 June 2021 2021-06-13 2021-09-13
Greg 89 48 40 June 2021 2021-06-16 2021-09-16
Ravi DF4 73 20 60 April 2021 2021-06-15 2021-09-15
Will 6CS0 345 110 60 June 2021 2021-06-15 2021-09-15
WILL 890 NaN NaN June 2021 2021-06-13 2021-09-13
Ben's qw0 0 0 50 May 2021 2021-06-16 2021-09-16
BEN's 45 16 60 June 2021 2021-06-13 2021-09-13
For example, FRANC,W, and franc,W are the same in the "GroupName" column, and franc,W has the "ticketNumber", AC23, and I also want to add that to the FRANC,W "ticketNumber".
So the data frame looks like below
GroupName ticketNumber Revenue Expences expect Signed_Date start_Date end_Date
Arthur AB25 6720 793 50 June 2021 2021-06-13 2021-08-13
FRANC,W AC23 56 18 100 May 2021 2021-06-13 2021-07-13
ARTHUR 512CD 78 38 100 May 2021 2021-06-14 2021-09-14
franc,W AC23 34 23 NaN June 2021 2021-06-13 2021-09-13
Francis 4 89 43 40 June 2021 2021-06-14 2021-09-14
Ross 39 23 30 June 2021 2021-06-13 2021-09-13
RoSS 34 72 20 June 2021 2021-06-13 2021-09-13
Greg 89 48 40 June 2021 2021-06-16 2021-09-16
Ravi DF4 73 20 60 April 2021 2021-06-15 2021-09-15
Will 6CS0 345 110 60 June 2021 2021-06-15 2021-09-15
WILL 6CS0 890 NaN NaN June 2021 2021-06-13 2021-09-13
Ben's qw0 0 0 50 May 2021 2021-06-16 2021-09-16
BEN's qw0 45 16 60 June 2021 2021-06-13 2021-09-13
I tried to ffill and bfill, but it didn't work when there were several GroupName with missing "ticketNumber" values.
CodePudding user response:
Try this:
df['GroupName'] = df.GroupName.str.lower()
df = df.sort_values('GroupName')
df['ticketNumber1] = df['ticketNumber'].shift().where(((df['GroupName'].shift() == df['GroupName'])
& (df['ticketNumber']=="")), df['ticketNumber'])
df['ticketNumber'] = df['ticketNumber'].shift(-1).where(((df['GroupName'].shift(-1) == df['GroupName'])
& (df['ticketNumber']=="")), df['ticketNumber'])
print(df)
CodePudding user response:
df.assign(ticketNumber = df.groupby(df['GroupName'].str.lower())['ticketNumber'].transform('first'))