Home > Software design >  Conditional Fillna in Pandas with conditional increment from the previous value
Conditional Fillna in Pandas with conditional increment from the previous value

Time:12-03

I want to fillna values in the 'last unique id' column based on the increment values from the previous row

**input is** 
Channel last unique id
0   MYNTRA  MN000351370
1   NYKAA   NYK00038219
2   NYKAA   NaN
3   NYKAA   NaN
4   NYKAA   NaN
5   NYKAA   NaN
6   MYNTRA  NaN
7   MYNTRA  NaN
8   MYNTRA  NaN
9   MYNTRA  NaN
10  MYNTRA  NaN
11  MYNTRA  NaN

Expected output

        Channel last unique id
0   MYNTRA  MN000351370
1   NYKAA   NYK00038219
2   NYKAA   NYK00038220
3   NYKAA   NYK00038221
4   NYKAA   NYK00038222
5   NYKAA   NYK00038223
6   MYNTRA  MN000351371
7   MYNTRA  MN000351372
8   MYNTRA  MN000351373
9   MYNTRA  MN000351374
10  MYNTRA  MN000351375
11  MYNTRA  MN000351376


Hope you understood the problem

CodePudding user response:

You can use groupby.cumcount to increment the number, and add it the the number part:

g = df.groupby('Channel')

# ffill per group
# extract letter and number part
df2 = (g['last unique id'].ffill()
       .str.extract(r'(\D )(\d )')
       )

# convert number part to integer
# add cumcount, merge back as string
df['last unique id'] = (df2[0]
 .add(df2[1].astype(int)
            .add(g.cumcount())
            .astype(str)
      )
 )

print(df)

Output:

   Channel last unique id
0   MYNTRA       MN351370
1    NYKAA       NYK38219
2    NYKAA       NYK38220
3    NYKAA       NYK38221
4    NYKAA       NYK38222
5    NYKAA       NYK38223
6   MYNTRA       MN351371
7   MYNTRA       MN351372
8   MYNTRA       MN351373
9   MYNTRA       MN351374
10  MYNTRA       MN351375
11  MYNTRA       MN351376

CodePudding user response:

Example

data = {'col1': {0: 'A', 1: 'B', 2: 'A', 3: 'A', 4: 'B', 5: 'B'},
 'col2': {0: 'A001', 1: 'BC020', 2: None, 3: None, 4: 'BC021', 5: None}}
df = pd.DataFrame(data)

df

   col1 col2
0   A   A001
1   B   BC020
2   A   None
3   A   None
4   B   BC021
5   B   None

Code

df[['col3', 'col4']] = df.groupby('col1')['col2'].ffill().str.extract('(\D )(\d )')
df['col4'] = df['col4'].astype('int')   df.groupby(['col1', 'col4']).cumcount()
df['col2'] = df['col2'].fillna(df['col3']   df['col4'].astype('str').str.zfill(3))
df = df.drop(['col3', 'col4'], axis=1)

result(df):

   col1 col2
0   A   A001
1   B   BC020
2   A   A002
3   A   A003
4   B   BC021
5   B   BC022

CodePudding user response:

Here is how you get the desired output with padding zeros to have your id always at a fixed length of 11.

df["last unique id"] = df.groupby("Channel")["last unique id"].ffill()

tmp = df["last unique id"].str.extract(r"(?P<ident>\D )(?P<num>\d )", expand=True)
tmp["add"] = df.groupby("Channel")["last unique id"].apply(
    lambda x: x.eq(x.shift()).cumsum()
)

total_len_id = 11
df["last unique id"] = tmp.apply(
    lambda row: row["ident"]
      str(int(row["num"])   row["add"]).rjust(total_len_id - len(row["ident"]), "0"),
    axis=1,
)

print(df)
   Channel last unique id
0   MYNTRA    MN000351370
1    NYKAA    NYK00038219
2    NYKAA    NYK00038220
3    NYKAA    NYK00038221
4    NYKAA    NYK00038222
5    NYKAA    NYK00038223
6   MYNTRA    MN000351371
7   MYNTRA    MN000351372
8   MYNTRA    MN000351373
9   MYNTRA    MN000351374
10  MYNTRA    MN000351375
11  MYNTRA    MN000351376
  • Related