How to create a column based on the value of the other columns

I have the following dataframe

    type_x  Range  myValname
0     g1   0.48        600
1     g2   0.30        600
2     g3   0.62        890
3     g4   0.75        890

I would like to get the following dataframe

    type_x  Range  myValname newCol
0     g1   0.48        600    c1
1     g2   0.30        600    c1
2     g3   0.62        890    c2
3     g4   0.75        890    c2

The meaning of c1 and c2 is that rows with the same myValname should be treated as belonging to the same group, whatever their type_x value is. I want code that generalizes to any number of distinct values.

My idea was to convert the column into a dictionary and map the values, but I was unable to get the desired outcome.

 df3['newCol'] = df3.groupby('myValname').rank()

CodePudding user response：

 df["newCol"] = df.groupby("myValname").ngroup().add(1).astype(str).radd("c")

For each unique "myValname" value, take its group number (0, 1, ...).
Since the numbering is 0-based, add(1) shifts it to 1, 2, ... instead.
Then convert the numbers to strings and prepend "c" with radd("c")

to get

>>> df
  type_x  Range  myValname newCol
0     g1   0.48        600     c1
1     g2   0.30        600     c1
2     g3   0.62        890     c2
3     g4   0.75        890     c2

where the intermediate result after .ngroup() was:

>>> df.groupby("myValname").ngroup()

0    0
1    0
2    1
3    1
dtype: int64
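
One detail worth knowing about the numbering: by default groupby sorts the group keys, so ngroup() numbers the groups in sorted key order, not in order of appearance. The sketch below uses a reordered copy of the data (an assumption made only for illustration) to show the difference. The sort=False variant is not part of the answer above; it is one way to number groups in the order they first appear, if that is what you want.

```python
import pandas as pd

# Same columns as in the question, but with the myValname values
# reordered so that sorted order and order of appearance differ.
df = pd.DataFrame({
    "type_x": ["g1", "g2", "g3", "g4"],
    "Range": [0.48, 0.30, 0.62, 0.75],
    "myValname": [890, 600, 890, 600],
})

# Default: groups are numbered by sorted key (600 -> 0, 890 -> 1).
default_ids = df.groupby("myValname").ngroup()
print(default_ids.tolist())      # [1, 0, 1, 0]

# sort=False: groups are numbered by first appearance (890 -> 0, 600 -> 1).
appearance_ids = df.groupby("myValname", sort=False).ngroup()
print(appearance_ids.tolist())   # [0, 1, 0, 1]

# Same label-building chain as in the answer: 1-based, stringified, "c" prepended.
df["newCol"] = appearance_ids.add(1).astype(str).radd("c")
print(df["newCol"].tolist())     # ['c1', 'c2', 'c1', 'c2']
```

For the data in the question both variants give the same labels, because 600 appears before 890 and is also the smaller key.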

alternative with pd.factorize:

 df["newCol"] = pd.Series(pd.factorize(df["myValname"])[0] + 1, dtype="str").radd("c")

where now pd.factorize assigns 0, 1, ... to each unique value in "myValname" (in order of first appearance), and after that the same modifications follow as before.
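
A runnable sketch of the factorize variant, under two assumptions that are not in the answer above: the DataFrame here has a non-default index (10, 11, ...), and the new Series is built with index=df.index. Column assignment aligns on index labels, so a Series with a fresh 0..n-1 index would not line up with such a frame. With the default index from the question, the one-liner above works as is.

```python
import pandas as pd

# Illustrative data with a non-default index (an assumption for this sketch).
df = pd.DataFrame(
    {
        "type_x": ["g1", "g2", "g3", "g4"],
        "myValname": [890, 600, 890, 600],
    },
    index=[10, 11, 12, 13],
)

# factorize returns (codes, uniques); codes follow order of first appearance.
codes, uniques = pd.factorize(df["myValname"])
print(codes.tolist())     # [0, 1, 0, 1]
print(list(uniques))      # [890, 600]

# Reuse df.index so the labels align with the rows they belong to.
labels = pd.Series(codes + 1, index=df.index).astype(str).radd("c")
df["newCol"] = labels
print(df["newCol"].tolist())   # ['c1', 'c2', 'c1', 'c2']
```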

CodePudding user response：

You can also add a new column to a DataFrame based on the values of another column using df.assign(), df.apply(), or np.where(). Note that df.assign() returns a new DataFrame with the column added instead of modifying the original.
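
A minimal sketch of those three functions on the data from the question. The hard-coded value 600 and the mapping dictionary are assumptions made for illustration, which also means this is less general than the ngroup and factorize approaches above: every distinct value has to be listed by hand.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "type_x": ["g1", "g2", "g3", "g4"],
    "Range": [0.48, 0.30, 0.62, 0.75],
    "myValname": [600, 600, 890, 890],
})

# np.where: a two-way choice on a condition (the value 600 is hard-coded).
labels = np.where(df["myValname"] == 600, "c1", "c2")

# df.assign returns a new DataFrame; df itself is left unchanged.
with_where = df.assign(newCol=labels)
print(with_where["newCol"].tolist())   # ['c1', 'c1', 'c2', 'c2']

# apply on the column with a hypothetical, hand-written mapping.
mapping = {600: "c1", 890: "c2"}
with_apply = df.assign(newCol=df["myValname"].apply(lambda v: mapping[v]))
print(with_apply["newCol"].tolist())   # ['c1', 'c1', 'c2', 'c2']
```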