I have the following dataframe
  type_x  Range  myValname
0     g1   0.48        600
1     g2   0.30        600
2     g3   0.62        890
3     g4   0.75        890
I would like to get the following dataframe
  type_x  Range  myValname newCol
0     g1   0.48        600     c1
1     g2   0.30        600     c1
2     g3   0.62        890     c2
3     g4   0.75        890     c2
The significance of c1 and c2 is that if myValname is the same for several type_x values, those rows can be treated as the same value. I want generalized code.
My thinking is to convert it into a dictionary and map some values, but I am unable to get the desired outcome. My attempt:
df3['newCol'] = df3.groupby('myValname').rank()
CodePudding user response:
df["newCol"] = df.groupby("myValname").ngroup().add(1).astype(str).radd("c")
- for each unique "myValname", take its group number (0, 1, ...)
- since it's 0-based, add(1) to get 1, 2, ... instead
- then convert it to a string and prepend "c" with radd, which computes "c" + value
to get
>>> df
  type_x  Range  myValname newCol
0     g1   0.48        600     c1
1     g2   0.30        600     c1
2     g3   0.62        890     c2
3     g4   0.75        890     c2
where the intermediate result after .ngroup() was:
>>> df.groupby("myValname").ngroup()
0 0
1 0
2 1
3 1
dtype: int64
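For anyone who wants to follow the chain one step at a time, here is a minimal sketch. The DataFrame is rebuilt by hand from the table in the question, and the intermediate variable names (group_no, one_based, as_text) are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "type_x": ["g1", "g2", "g3", "g4"],
    "Range": [0.48, 0.30, 0.62, 0.75],
    "myValname": [600, 600, 890, 890],
})

# Step 1: number each group of equal "myValname" values, starting at 0.
group_no = df.groupby("myValname").ngroup()

# Step 2: shift to 1-based numbering.
one_based = group_no.add(1)

# Step 3: convert the numbers to strings so text can be attached.
as_text = one_based.astype(str)

# Step 4: radd("c") computes "c" + value, i.e. it prepends the prefix.
df["newCol"] = as_text.radd("c")

print(df)
```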
alternative with pd.factorize:
df["newCol"] = pd.Series(pd.factorize(df["myValname"])[0] + 1, dtype="str").radd("c")
where now pd.factorize assigns 0, 1, ... to each unique value in "myValname", and after that the same modifications follow as before.
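One difference between the two approaches may matter when the values are not already sorted: groupby sorts the group keys by default, so ngroup numbers groups in sorted key order, while pd.factorize numbers values in order of first appearance. The sketch below uses made-up data (not from the question) to show this. It also passes index=df.index when wrapping the factorize result, which is an addition on my part so that the codes line up with the rows if the frame does not have the default 0..n-1 index.

```python
import pandas as pd

# Hypothetical data where the larger value appears first.
df = pd.DataFrame({"myValname": [890, 600, 890, 600]}, index=[10, 11, 12, 13])

# groupby sorts keys by default: 600 -> 0, 890 -> 1.
by_sorted_key = df.groupby("myValname").ngroup()

# sort=False numbers groups in order of first appearance: 890 -> 0, 600 -> 1.
by_first_seen = df.groupby("myValname", sort=False).ngroup()

# factorize also numbers by first appearance; it returns a plain array,
# so the original index is attached explicitly here.
codes = pd.Series(pd.factorize(df["myValname"])[0], index=df.index)

print(by_sorted_key.tolist())
print(by_first_seen.tolist())
print(codes.tolist())
```

If the labels should follow the order in which values show up in the data, factorize or sort=False is probably the better fit; if they should follow the sorted values, the default groupby behaviour does that.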
CodePudding user response:
You can add a new column to a DataFrame based on the values of another column using df.assign(), df.apply(), or np.where(). Note that df.assign() returns a new DataFrame with the column added, while the results of df.apply() and np.where() have to be assigned to a column yourself.
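As one possible illustration of the assign() route for this particular question, the sketch below builds a dictionary from each distinct myValname to a label and maps it onto the column, which is close to the dictionary idea mentioned in the question. The names labels and result are only illustrative, and the sample data is rebuilt from the question's table.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "type_x": ["g1", "g2", "g3", "g4"],
    "Range": [0.48, 0.30, 0.62, 0.75],
    "myValname": [600, 600, 890, 890],
})

# Map each distinct value to "c1", "c2", ... in order of first appearance.
labels = {
    value: f"c{i}"
    for i, value in enumerate(df["myValname"].unique(), start=1)
}

# assign() leaves df untouched and returns a new DataFrame with newCol added.
result = df.assign(newCol=df["myValname"].map(labels))

print(result)
```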