In Pandas get a part of String-CodePudding

I have the below pandas dataframe.

d = {'col1': [1, 2,3,4,5,60,0,0,6,3,2,4],'col3': [1, 22,33,44,55,60,1,5,6,3,2,4],'Name': ['22a| df a1asd_V1|5P|hey-910929|abc-939090', 'xcd a2asd_V3','23vg aabsd_V1','22a| df a1asd_V1|5P|hey-910929|abc-939090','a3as  d_V1','aa bsd_V3','aasd_V4','aabsd_V4','aa_adn sd_V15',np.nan,'aasd_V12','aasd120Abs'],'Date': ['2021-06-13', '2021-06-13','2021-06-13','2021-06-14','2021-06-15','2021-06-15','2021-06-13','2021-06-16','2021-06-13','2021-06-13','2021-06-13','2021-06-16']}
dff = pd.DataFrame(data=d)
dff
    col1    col3    Name                                        Date       
0   1       1       22a| df a1asd_V1|5P|hey-910929|abc-939090   2021-06-13 
1   2       22      xcd a2asd_V3                                2021-06-13 
2   3       33      23vg aabsd_V1                               2021-06-13 
3   4       44      dfgdf|_aabsd_V0|2P|hyy-510929|nfc-239090    2021-06-14 
4   5       55      a3as d_V1                                   2021-06-15 
5   60      60      aa bsd_V3                                   2021-06-15 
6   0       1       aasd_V4                                     2021-06-13 
7   0       5       aabsd_V4                                    2021-06-16 
8   6       6       aa_adn sd_V10                               2021-06-13 
9   3       3       NaN                                         2021-06-13 
10  2       2       aasd_V12                                    2021-06-13 
11  4       4       aasd120Abs                                  2021-06-16

I want to make two new columns based on the Name column. I want to extract the part of the string in the Name column like V1, V2, V3, V4...V20 like that. Also if there isn't anything like that at end of the Name string or if the Name row is empty, just want to make an empty cell. So I want something like the below pandas dataframe.

    col1 col3       Name                                        Date  Version  Version
0   1     1     2a| df a1asd_V1|5P|hey-910929|abc-939090    2021-06-13   V1    Version 1
1   2     22    xcd a2asd_V3                                2021-06-13   V3    Version 3
2   3     33    23vg aabsd_V1                               2021-06-13   V1    Version 1
3   4     44    dfgdf|_aabsd_V0|2P|hyy-510929|nfc-239090    2021-06-14   V0    Version 0
4   5     55    a3as d_V1                                   2021-06-15   V1    Version 1
5   60    60    aa bsd_V3                                   2021-06-15   V3    Version 3
6   0     1     aasd_V4                                     2021-06-13   V4    Version 4
7   0     5     aabsd_V4                                    2021-06-16   V4    Version 4
8   6     6     aa_adn sd_V10                               2021-06-13   V10   Version 10
9   3     3     NaN                                         2021-06-13         
10  2     2     aasd_V12                                    2021-06-13   V12   Version 12
11  4     4     aasd120Abs                                  2021-06-16

Is it possible to do that? I know in SQL we can do that using "LIKE"

WHEN `Name` LIKE '%V10%' THEN 'Verison 10'.

Is there a similar command or any other way to do that in python? I try to use "dff['Name'].str.extract" but didn't get quite right Thanks in advance! Any help is appreciated!

CodePudding user response：

You can use regex to extract the V with any following numbers:

import pandas as pd

d = {'col1': [1, 2,3,4,5,60,0,0,6,3,2,4],'col3': [1, 22,33,44,55,60,1,5,6,3,2,4],'Name': ['2a df a1asd_V1', 'xcd a2asd_V3','23vg aabsd_V1','dfgdf_aabsd_V0','a3as  d_V1','aa bsd_V3','aasd_V4','aabsd_V4','aa_adn sd_V15',np.nan,'aasd_V12','aasd120Abs'],'Date': ['2021-06-13', '2021-06-13','2021-06-13','2021-06-14','2021-06-15','2021-06-15','2021-06-13','2021-06-16','2021-06-13','2021-06-13','2021-06-13','2021-06-16']}
dff = pd.DataFrame(data=d)
dff['Version'] = dff['Name'].str.extract(r'(V\d )')
dff['Version2'] = 'Version '   dff['Version'].str[1:]

Output:

	col1	col3	Name	Date	Version	Version2
0	1	1	2a df a1asd_V1	2021-06-13	V1	Version 1
1	2	22	xcd a2asd_V3	2021-06-13	V3	Version 3
2	3	33	23vg aabsd_V1	2021-06-13	V1	Version 1
3	4	44	dfgdf_aabsd_V0	2021-06-14	V0	Version 0
4	5	55	a3as d_V1	2021-06-15	V1	Version 1
5	60	60	aa bsd_V3	2021-06-15	V3	Version 3
6	0	1	aasd_V4	2021-06-13	V4	Version 4
7	0	5	aabsd_V4	2021-06-16	V4	Version 4
8	6	6	aa_adn sd_V15	2021-06-13	V15	Version 15
9	3	3	nan	2021-06-13	nan	nan
10	2	2	aasd_V12	2021-06-13	V12	Version 12
11	4	4	aasd120Abs	2021-06-16	nan	nan