I have the below pandas dataframe.
d = {'col1': [1, 2,3,4,5,60,0,0,6,3,2,4],'col3': [1, 22,33,44,55,60,1,5,6,3,2,4],'Name': ['2a df a1asd_V1', 'xcd a2asd_V3','23vg aabsd_V1','dfgdf_aabsd_V0','a3as d_V1','aa bsd_V3','aasd_V4','aabsd_V4','aa_adn sd_V15',np.nan,'aasd_V12','aasd120Abs'],'Date': ['2021-06-13', '2021-06-13','2021-06-13','2021-06-14','2021-06-15','2021-06-15','2021-06-13','2021-06-16','2021-06-13','2021-06-13','2021-06-13','2021-06-16']}
dff = pd.DataFrame(data=d)
dff
col1 col3 Name Date
0 1 1 2a df a1asd_V1 2021-06-13
1 2 22 xcd a2asd_V3 2021-06-13
2 3 33 23vg aabsd_V1 2021-06-13
3 4 44 dfgdf_aabsd_V0 2021-06-14
4 5 55 a3as d_V1 2021-06-15
5 60 60 aa bsd_V3 2021-06-15
6 0 1 aasd_V4 2021-06-13
7 0 5 aabsd_V4 2021-06-16
8 6 6 aa_adn sd_V10 2021-06-13
9 3 3 NaN 2021-06-13
10 2 2 aasd_V12 2021-06-13
11 4 4 aasd120Abs 2021-06-16
I want to make two new columns based on the Name column. I want to extract the part of the string in the Name column like V1, V2, V3, V4...V20 like that. Also if there isn't anything like that at end of the Name string or if the Name row is empty, just want to make an empty cell. So I want something like below pandas dataframe.
col1 col3 Name Date Version Version
0 1 1 2a df a1asd_V1 2021-06-13 V1 Version 1
1 2 22 xcd a2asd_V3 2021-06-13 V3 Version 3
2 3 33 23vg aabsd_V1 2021-06-13 V1 Version 1
3 4 44 dfgdf_aabsd_V0 2021-06-14 V0 Version 0
4 5 55 a3as d_V1 2021-06-15 V1 Version 1
5 60 60 aa bsd_V3 2021-06-15 V3 Version 3
6 0 1 aasd_V4 2021-06-13 V4 Version 4
7 0 5 aabsd_V4 2021-06-16 V4 Version 4
8 6 6 aa_adn sd_V10 2021-06-13 V10 Version 10
9 3 3 NaN 2021-06-13
10 2 2 aasd_V12 2021-06-13 V12 Version 12
11 4 4 aasd120Abs 2021-06-16
Is it possible to do that? I know in SQL we can do that using "LIKE"
WHEN `Name` LIKE '%V10%' THEN 'Verison 10'.
Is there a similar command or any other way to do that in python?
Thanks in advance! Any help is appreciated!
CodePudding user response:
Use str.extract
with a regex and str.replace
to rename values:
dff['Version_short'] = dff['Name'].str.extract('_(V\d )$').fillna('')
dff['Version_long'] = dff['Version_short'].str.replace('V', 'Version ')
Output:
>>> dff
col1 col3 Name Date Version_short Version_long
0 1 1 2a df a1asd_V1 2021-06-13 V1 Version 1
1 2 22 xcd a2asd_V3 2021-06-13 V3 Version 3
2 3 33 23vg aabsd_V1 2021-06-13 V1 Version 1
3 4 44 dfgdf_aabsd_V0 2021-06-14 V0 Version 0
4 5 55 a3as d_V1 2021-06-15 V1 Version 1
5 60 60 aa bsd_V3 2021-06-15 V3 Version 3
6 0 1 aasd_V4 2021-06-13 V4 Version 4
7 0 5 aabsd_V4 2021-06-16 V4 Version 4
8 6 6 aa_adn sd_V15 2021-06-13 V15 Version 15
9 3 3 NaN 2021-06-13
10 2 2 aasd_V12 2021-06-13 V12 Version 12
11 4 4 aasd120Abs 2021-06-16
CodePudding user response:
You can use str.extract
and a short regex (_(V\d )$
):
dff['Version'] = dff['Name'].str.extract('_(V\d )$')
dff['Version_long'] = 'Version ' dff['Version'].str[1:]
NB. I named the second column differently as it is problematic (although not fully impossible) to have two columns with the same name
output:
col1 col3 Name Date Version Version_long
0 1 1 2a df a1asd_V1 2021-06-13 V1 Version 1
1 2 22 xcd a2asd_V3 2021-06-13 V3 Version 3
2 3 33 23vg aabsd_V1 2021-06-13 V1 Version 1
3 4 44 dfgdf_aabsd_V0 2021-06-14 V0 Version 0
4 5 55 a3as d_V1 2021-06-15 V1 Version 1
5 60 60 aa bsd_V3 2021-06-15 V3 Version 3
6 0 1 aasd_V4 2021-06-13 V4 Version 4
7 0 5 aabsd_V4 2021-06-16 V4 Version 4
8 6 6 aa_adn sd_V15 2021-06-13 V15 Version 15
9 3 3 NaN 2021-06-13 NaN NaN
10 2 2 aasd_V12 2021-06-13 V12 Version 12
11 4 4 aasd120Abs 2021-06-16 NaN NaN
CodePudding user response:
Use str.split
and str.replace
in df.assign
in a one liner
dff = dff.assign(Version_short=dff['Name'].str.split('([A-Z0-9] $)').str[-2].fillna(''),Version_lon=dff['Name'].str.split('([A-Z0-9] $)').str[-2].replace('^\D', 'Version ', regex=True).fillna(''))
col1 col3 Name Date Version_short Version_lon
0 1 1 2a df a1asd_V1 2021-06-13 V1 Version 1
1 2 22 xcd a2asd_V3 2021-06-13 V3 Version 3
2 3 33 23vg aabsd_V1 2021-06-13 V1 Version 1
3 4 44 dfgdf_aabsd_V0 2021-06-14 V0 Version 0
4 5 55 a3as d_V1 2021-06-15 V1 Version 1
5 60 60 aa bsd_V3 2021-06-15 V3 Version 3
6 0 1 aasd_V4 2021-06-13 V4 Version 4
7 0 5 aabsd_V4 2021-06-16 V4 Version 4
8 6 6 aa_adn sd_V15 2021-06-13 V15 Version 15
9 3 3 NaN 2021-06-13
10 2 2 aasd_V12 2021-06-13 V12 Version 12
11 4 4 aasd120Abs 2021-06-16