I want to change value of my duplicate. Here is my dataframe:
sku
FAT-001
FAT-001
FAT-001
FAT-002
FAT-002
my expected data frame will be look like this
sku
FAT-001 #don't want to change first value of duplicate
FAT-001-01
FAT-001-02
FAT-002
FAT-002-01
CodePudding user response:
df["number"] = df.groupby("sku").cumcount()
df.apply(lambda x: x.sku ("" if x.number == 0 else "-" str(x.number).rjust(2,"0")), axis = 1)
or:
df["number"] = "-" df.groupby("sku").cumcount().astype(str).str.rjust(2, '0')
df.number[df["number"].eq("-" "0" * 2)] = ""
df.sku df.number
my output:
0 FAT-001
1 FAT-001-01
2 FAT-001-02
3 FAT-002
4 FAT-002-01
dtype: object
explanation:
what is groupby
? it is a sql-inspired
command that give you some element base on every unique element... for example: length, max, list or other...
df = pd.DataFrame([
[1,2],
[1,3],
[1,4],
[2,5],
[2,6],
], columns=["id","number"])
df.groupby("id").agg({"number": len})
give you:
number
id
1 3
2 2
the number of every unique elements, or
df.groupby("id").agg({"number": list})
give you
number
id
1 [2, 3, 4]
2 [5, 6]
and you can try max
or min
or first
...
using agg
, you can specify what you want for every column... if you have more than one (except what was groupby
), you can specify something for each column
...
except agg
, there are other methods for goupped
dataframes: like cumcount
, that set index every row
for each group
, I mean it reset index
for every group
:
df.groupby("sku").cumcount()
output:
0 0
1 1
2 2
3 0
4 1
your first FAT-001
get index
: 0, next: 1, ... and for FAT-002
, first get index 0 again...
so, we have two part that you want, now... therefore we must find a way to join them for every row
: axis:1
in apply
means for every row
so, you have one exception: you don't want index
for every row of each group... so change it to "", empty:
df.apply(lambda x: "" if x.number == 0 else str(x.number), axis = 1)
:
0
1 1
2 2
3
4 1
dtype: object
your 0
and 3
rows are for new-group...
next, your desired format is: 01,02,... a 0
for every index. pandas has a method that convert every string to string with desired length with an arbitrary char:
rjust(desird_length, arbitrary_char)
how works: if you call that as rjust(2,"0")
, it dos not change "22"
or other 2-char, 3-char, ... strings, but, if your string length be 1 like 1
will be converted to 01
and ... (notice that there are a method called ljust
too :))
df["number"] = df.groupby("sku").cumcount()
df.apply(lambda x: "" if x.number == 0 else str(x.number).rjust(2,"0"), axis = 1)
0
1 01
2 02
3
4 01
dtype: object
and if
statement can written as:
if x.number == 0:
return ""
else:
return "-" str(x.number).rjust(2,"0")
and some points:
- what is
astype(str)
: it convert every element to string, and works likestr(x)
, but for every element. why? to add "-" before it and usingrjust
. - what is
eq
? it isis-equal?
and returnTrue
for each row if be equal to value, andFalse
otherwise. - why
df.number[df["number"].eq("-" "0" * 2)] = ""
? because we convert all first elements of each group to""
- why
"-" "0" * 2
? because we add"-"
and ljust in previous line so we must use correct value:"-00"
. and why"0" * 2
? because you can use every number forljust length
like 10 and set it there, too
CodePudding user response:
Similar to @MoRe's answer, use groupby.cumcount
to create groups; then you could use str.zfill
to pad 0s and mask
the first elements of each group:
groups = df.groupby('sku').cumcount()
df['new'] = df['sku'] ('-' groups.astype('string').str.zfill(2)).mask(groups.eq(0), '')
Output:
sku new
0 FAT-001 FAT-001
1 FAT-001 FAT-001-01
2 FAT-001 FAT-001-02
3 FAT-002 FAT-002
4 FAT-002 FAT-002-01
CodePudding user response:
For an alternative one-liner:
df.sku = df.sku df.groupby('sku').cumcount().apply(lambda x: f"-{x:02d}" if x > 0 else '')
Output:
sku
0 FAT-001
1 FAT-001-01
2 FAT-001-02
3 FAT-002
4 FAT-002-01