I have the below pandas dataframe.
d = {'col1': [1, 2,3,4,5,60,0,0,6,3,2,4],'col3': [1, 22,33,44,55,60,1,5,6,3,2,4],'Name': ['22a| df a1asd_V1', 'xcd a2a_sd_V3','23vg aa_bsd_V1','22a| df a1asd_V1|5mo','a3as d_V1','aa b_12mo','aasd_V4','aa_6mo_bsd','aa_adn sd_V15',np.nan,'aasd_V12','aasd120Abs'],'Date': ['2021-06-13', '2021-06-13','2021-06-13','2021-06-14','2021-06-15','2021-06-15','2021-06-13','2021-06-16','2021-06-13','2021-06-13','2021-06-13','2021-06-16']}
dff = pd.DataFrame(data=d)
dff
col1 col3 Name Date
0 1 1 22a| df a1asd_V1 2021-06-13
1 2 22 xcd a2a_sd_V3 2021-06-13
2 3 33 23vg aa_bsd_V1 2021-06-13
3 4 44 22a| df a1asd_V1|5mo 2021-06-14
4 5 55 a3as d_V1 2021-06-15
5 60 60 aa b_12mo 2021-06-15
6 0 1 aasd_V4 2021-06-13
7 0 5 aa_6mo_bsd 2021-06-16
8 6 6 aa_adn sd_V15 2021-06-13
9 3 3 NaN 2021-06-13
10 2 2 aasd_V12 2021-06-13
11 4 4 aasd120Abs 2021-06-16
I want to replace _, | into space and if there is like 5mo, 6mo, 12mo.. into 5 months, 6 months, 12, months like that for Name column and make a new column called New Name. Like below data frame.
col1 col3 Name Date NewName
0 1 1 22a| df a1asd_V1 2021-06-13 22a df a1asd V1
1 2 22 xcd a2a_sd_V3 2021-06-13 xcd a2a sd V3
2 3 33 23vg aa_bsd_V1 2021-06-13 23vg aa bsd V1
3 4 44 22a| df a1asd_V1|5mo 2021-06-14 22a df a1asd V1 5 months
4 5 55 a3as d_V1 2021-06-15 a3as d V1
5 60 60 aa b_12mo 2021-06-15 aa b 12 months
6 0 1 aasd_V4 2021-06-13 aasd V4
7 0 5 aa_6mo_bsd 2021-06-16 aa 6 months bsd
8 6 6 aa_adn sd_V15 2021-06-13 aa adn sd V15
9 3 3 NaN 2021-06-13 NaN
10 2 2 aasd_V12 2021-06-13 aasd V12
11 4 4 aasd120Abs 2021-06-16 aasd120Abs
Is it possible to do it in the lambda function? Since my actual data frame has more than 1million records I need something much efficient to work.
Thanks in advance! Any idea would be appriciate.
CodePudding user response:
This should work:
dff["NewName"] = dff["Name"].apply(lambda x: x.replace("|"," ").replace("_"," "))
CodePudding user response:
You can use pd.Series.replace
:
print (df["Name"].replace({"[|_]":" ", "(\d )mo":"\\1 months"}, regex=True))
0 22a df a1asd V1
1 xcd a2a sd V3
2 23vg aa bsd V1
3 22a df a1asd V1 5 months
4 a3as d V1
5 aa b 12 months
6 aasd V4
7 aa 6 months bsd
8 aa adn sd V15
9 NaN
10 aasd V12
11 aasd120Abs
Name: Name, dtype: object
CodePudding user response:
df['NewName'] = [x.replace('|', '').replace('_', '') for x in df['Name']]