Regex that captures and filters the "steps" strings that have only one sole number at the-CodePudding

I have a pandas.Series like this:

s = pd.Series(['1-Onboarding   Retorno', '1.1-Onboarding escolha de bot',
                  '2-Seleciona produto', '3-Informa localizacao e cpf',
                  '3.1-CPF valido (V.2.0)', '3.2-Obtencao de CEP'],name = 'Steps')

0           1-Onboarding   Retorno
1    1.1-Onboarding escolha de bot
2              2-Seleciona produto
3      3-Informa localizacao e cpf
4           3.1-CPF valido (V.2.0)
5              3.2-Obtencao de CEP

The idea is to filter the Series so that I keep only the strings that start with a single whole number (no sub-step such as 1.1), which should give:

s = pd.Series(['1-Onboarding   Retorno',
                  '2-Seleciona produto', '3-Informa localizacao e cpf'],name = 'Steps')

0         1-Onboarding   Retorno
1            2-Seleciona produto
2    3-Informa localizacao e cpf
Name: Steps, dtype: object

Any ideas on how I could do that? I am having difficulty formulating the regex. I know I should use str.contains to build such a filter in pandas:

s.str.contains('',regex = True)

CodePudding user response：

We can use str.contains here:

df_out = s[s.str.contains(r'^\d+-', regex=True)]

The result df_out is a Series (s is a Series, not a data frame) that contains only the step values beginning with a whole (integer) step number followed directly by a hyphen, so sub-steps such as 3.1 are dropped.
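To make the pattern concrete, here is a minimal runnable sketch on the sample data from the question. The names mask and filtered are only illustrative, and the reset_index call is an assumption about wanting the rows renumbered from 0 as in the desired output.

```python
import pandas as pd

s = pd.Series(['1-Onboarding   Retorno', '1.1-Onboarding escolha de bot',
               '2-Seleciona produto', '3-Informa localizacao e cpf',
               '3.1-CPF valido (V.2.0)', '3.2-Obtencao de CEP'], name='Steps')

# ^\d+- means: one or more digits at the start of the string, immediately
# followed by a hyphen. '1.1-...' fails because '.' follows the first digit.
mask = s.str.contains(r'^\d+-', regex=True)

# Boolean indexing keeps the matching rows; reset_index(drop=True)
# renumbers them 0..n-1 (illustrative choice, not required).
filtered = s[mask].reset_index(drop=True)
print(filtered)
```

Because the pattern is anchored with ^, s.str.match(r'\d+-') should behave the same way here, since str.match only tests from the start of each string.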

CodePudding user response：

You can also do this without a regex, by keeping only the strings that contain no dot:

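l = []
for step in s:
    # a dot marks a sub-step number such as 1.1, so skip those strings
    if '.' not in step:
        l.append(step)
# rebuild a Series (with a fresh 0..n-1 index) from the kept strings
new_s = pd.Series(l, name='Steps')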

Output:

0         1-Onboarding   Retorno
1            2-Seleciona produto
2    3-Informa localizacao e cpf
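
One caveat with the dot check is that it looks at the whole string, not just the step number. The sketch below is a vectorized variant of the same idea, with one hypothetical extra step ('4-Resumo (V.2.0)', not in the original data) added to show where the two checks could differ. The names no_dot, prefix and by_prefix are illustrative.

```python
import pandas as pd

# The last entry is a hypothetical step, added only to show the difference.
s = pd.Series(['1-Onboarding   Retorno', '1.1-Onboarding escolha de bot',
               '2-Seleciona produto', '4-Resumo (V.2.0)'], name='Steps')

# Vectorized form of the loop: drop every string containing a literal dot.
# regex=False makes '.' a plain character instead of "any character".
no_dot = s[~s.str.contains('.', regex=False)].reset_index(drop=True)

# Alternative: test only the part before the first hyphen (the step number).
prefix = s.str.split('-', n=1).str[0]
by_prefix = s[~prefix.str.contains('.', regex=False)].reset_index(drop=True)

print(no_dot)
print(by_prefix)
```

With this data the whole-string check also drops the hypothetical top-level step because of the dots in its description, while the prefix check keeps it. On the original six strings both variants give the same three rows.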