I want to fillna values in the 'last unique id' column based on the increment values from the previous row
**input is**
Channel last unique id
0 MYNTRA MN000351370
1 NYKAA NYK00038219
2 NYKAA NaN
3 NYKAA NaN
4 NYKAA NaN
5 NYKAA NaN
6 MYNTRA NaN
7 MYNTRA NaN
8 MYNTRA NaN
9 MYNTRA NaN
10 MYNTRA NaN
11 MYNTRA NaN
Expected output
Channel last unique id
0 MYNTRA MN000351370
1 NYKAA NYK00038219
2 NYKAA NYK00038220
3 NYKAA NYK00038221
4 NYKAA NYK00038222
5 NYKAA NYK00038223
6 MYNTRA MN000351371
7 MYNTRA MN000351372
8 MYNTRA MN000351373
9 MYNTRA MN000351374
10 MYNTRA MN000351375
11 MYNTRA MN000351376
Hope you understood the problem
CodePudding user response:
You can use groupby.cumcount
to increment the number, and add it the the number part:
g = df.groupby('Channel')
# ffill per group
# extract letter and number part
df2 = (g['last unique id'].ffill()
.str.extract(r'(\D )(\d )')
)
# convert number part to integer
# add cumcount, merge back as string
df['last unique id'] = (df2[0]
.add(df2[1].astype(int)
.add(g.cumcount())
.astype(str)
)
)
print(df)
Output:
Channel last unique id
0 MYNTRA MN351370
1 NYKAA NYK38219
2 NYKAA NYK38220
3 NYKAA NYK38221
4 NYKAA NYK38222
5 NYKAA NYK38223
6 MYNTRA MN351371
7 MYNTRA MN351372
8 MYNTRA MN351373
9 MYNTRA MN351374
10 MYNTRA MN351375
11 MYNTRA MN351376
CodePudding user response:
Example
data = {'col1': {0: 'A', 1: 'B', 2: 'A', 3: 'A', 4: 'B', 5: 'B'},
'col2': {0: 'A001', 1: 'BC020', 2: None, 3: None, 4: 'BC021', 5: None}}
df = pd.DataFrame(data)
df
col1 col2
0 A A001
1 B BC020
2 A None
3 A None
4 B BC021
5 B None
Code
df[['col3', 'col4']] = df.groupby('col1')['col2'].ffill().str.extract('(\D )(\d )')
df['col4'] = df['col4'].astype('int') df.groupby(['col1', 'col4']).cumcount()
df['col2'] = df['col2'].fillna(df['col3'] df['col4'].astype('str').str.zfill(3))
df = df.drop(['col3', 'col4'], axis=1)
result(df
):
col1 col2
0 A A001
1 B BC020
2 A A002
3 A A003
4 B BC021
5 B BC022
CodePudding user response:
Here is how you get the desired output with padding zeros to have your id always at a fixed length of 11.
df["last unique id"] = df.groupby("Channel")["last unique id"].ffill()
tmp = df["last unique id"].str.extract(r"(?P<ident>\D )(?P<num>\d )", expand=True)
tmp["add"] = df.groupby("Channel")["last unique id"].apply(
lambda x: x.eq(x.shift()).cumsum()
)
total_len_id = 11
df["last unique id"] = tmp.apply(
lambda row: row["ident"]
str(int(row["num"]) row["add"]).rjust(total_len_id - len(row["ident"]), "0"),
axis=1,
)
print(df)
Channel last unique id
0 MYNTRA MN000351370
1 NYKAA NYK00038219
2 NYKAA NYK00038220
3 NYKAA NYK00038221
4 NYKAA NYK00038222
5 NYKAA NYK00038223
6 MYNTRA MN000351371
7 MYNTRA MN000351372
8 MYNTRA MN000351373
9 MYNTRA MN000351374
10 MYNTRA MN000351375
11 MYNTRA MN000351376