Modify column in data frame

Time:02-21

My dataset contains categorical values stored as numbers. Below you can see what my data looks like.

import pandas as pd

data = {'id':['1','2','3','4','5'],
       'nc':['01', '1.0', '2.0', '13.0',"B"]}
df = pd.DataFrame(data, columns = ['id','nc'])


Now I want to join this table with another table that holds the categorical codes (e.g. 01, 02, 03, etc.), but I can't because the values do not match.

Can anybody help me modify this column? In other words, I want to keep only the part before the "." and add a leading zero to numbers below 10, so that 1.0 becomes 01 and 13.0 becomes 13.

I tried this line of code, but it is not working:

df["nc1"]= df['nc'].str.split(".", expand = True) 

CodePudding user response:

We can try with Series.str.len and Series.str.rstrip:

df['nc'] = df['nc'].str.rstrip('.0')
df.loc[df['nc'].str.len().le(1) & df['nc'].str.isnumeric(), 'nc'] = '0' + df['nc']
print(df)

  id  nc
0  1  01
1  2  01
2  3  02
3  4  13
4  5   B
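
One caveat with the approach above: str.rstrip('.0') strips a set of characters rather than the literal suffix ".0", so a value such as '10.0' would be reduced to '1'. The sketch below is one possible way around that, assuming the column only ever holds strings. The extra row with '10.0' and the column name nc_fixed are made up for illustration and are not part of the original data.

```python
import pandas as pd

# Original sample data plus one hypothetical row ('10.0') to show the edge case
df = pd.DataFrame({'id': ['1', '2', '3', '4', '5', '6'],
                   'nc': ['01', '1.0', '2.0', '13.0', 'B', '10.0']})

# rstrip('.0') removes any trailing '.' or '0' characters, so '10.0' ends up as '1'
stripped = df['nc'].str.rstrip('.0')

# Removing only a trailing ".0" (or ".00", ...) keeps '10' intact
trimmed = df['nc'].str.replace(r'\.0+$', '', regex=True)

# Zero-pad purely numeric values to two characters; leave values like 'B' untouched
is_num = trimmed.str.isdigit()
df['nc_fixed'] = trimmed.where(~is_num, trimmed.str.zfill(2))

print(df)
```

If the data never contains values that end in 0 before the decimal point, the shorter rstrip version behaves the same.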

CodePudding user response:

The logic is not fully clear, but changing numbers like 2.0 into 02 is quite easy with a regex:

df['nc'] = df['nc'].str.replace(r'^(\d+)(\.0+)$', lambda m: m.group(1).zfill(2), regex=True)

output (here as a new column nc2 for clarity):

  id    nc nc2
0  1    01  01
1  2   1.0  01
2  3   2.0  02
3  4  13.0  13
4  5     B   B
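
To tie this back to the join mentioned in the question, here is a minimal sketch of merging on the normalized column. The lookup table categories and its label values are hypothetical, since the question does not show the second table. Passing regex=True explicitly is needed on recent pandas versions when the replacement is a callable.

```python
import pandas as pd

df = pd.DataFrame({'id': ['1', '2', '3', '4', '5'],
                   'nc': ['01', '1.0', '2.0', '13.0', 'B']})

# Hypothetical lookup table; the labels are invented for illustration only
categories = pd.DataFrame({'nc': ['01', '02', '13', 'B'],
                           'label': ['cat_01', 'cat_02', 'cat_13', 'cat_B']})

# Normalize "1.0" -> "01", "13.0" -> "13"; non-matching values stay unchanged
df['nc'] = df['nc'].str.replace(r'^(\d+)(\.0+)$',
                                lambda m: m.group(1).zfill(2),
                                regex=True)

# A left join keeps every row of df and attaches the matching label
merged = df.merge(categories, on='nc', how='left')
print(merged)
```

Both key columns must be strings for the merge to match, which is the case here because the codes are stored as text on both sides.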