I have a pandas.Series like this:
s = pd.Series(['1-Onboarding Retorno', '1.1-Onboarding escolha de bot',
'2-Seleciona produto', '3-Informa localizacao e cpf',
'3.1-CPF valido (V.2.0)', '3.2-Obtencao de CEP'],name = 'Steps')
0 1-Onboarding Retorno
1 1.1-Onboarding escolha de bot
2 2-Seleciona produto
3 3-Informa localizacao e cpf
4 3.1-CPF valido (V.2.0)
5 3.2-Obtencao de CEP
The idea is to filter the Series so that I keep only the strings that start with a whole step number (1, 2, 3) and drop the sub-steps (1.1, 3.1, 3.2). The expected result is:
s = pd.Series(['1-Onboarding Retorno',
'2-Seleciona produto', '3-Informa localizacao e cpf'],name = 'Steps')
0 1-Onboarding Retorno
1 2-Seleciona produto
2 3-Informa localizacao e cpf
Name: Steps, dtype: object
Any ideas on how I could do that? I am having difficulty formulating the regex. I know I should use str.contains to build such a filter in pandas:
s.str.contains('',regex = True)
CodePudding user response:
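We can use str.contains here. Because s is already a Series, call .str on it directly instead of indexing it with "Steps":
df_out = s[s.str.contains(r'^\d+-', regex=True)]
The pattern ^\d+- matches one or more digits at the start of the string, immediately followed by a hyphen. The resulting Series df_out will contain only the step values that begin with a major version (integer) number, because in a sub-step such as 3.1 the leading digit is followed by a dot rather than a hyphen.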
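For reference, here is a minimal self-contained sketch of that filter, using the Series from the question. The names mask and top_level are only illustrative, and the reset_index call is an assumption made so that the index runs 0, 1, 2 as in the expected output.

```python
import pandas as pd

s = pd.Series(
    ['1-Onboarding Retorno', '1.1-Onboarding escolha de bot',
     '2-Seleciona produto', '3-Informa localizacao e cpf',
     '3.1-CPF valido (V.2.0)', '3.2-Obtencao de CEP'],
    name='Steps',
)

# ^\d+- : one or more digits at the start, immediately followed by a hyphen.
# Sub-steps such as "1.1-" fail because a dot follows the first digit.
mask = s.str.contains(r'^\d+-', regex=True)

# Keep the matching rows and renumber the index from 0 (illustrative choice).
top_level = s[mask].reset_index(drop=True)
print(top_level)
```

Dropping reset_index(drop=True) keeps the original labels 0, 2, 3 instead.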
CodePudding user response:
You can use a plain loop:
l = []
for i in range(len(s)):
    # keep only the steps whose text contains no '.'
    if '.' not in s[i]:
        l.append(s[i])
new_s = pd.Series(l, name='Steps')
out:
0 1-Onboarding Retorno
1 2-Seleciona produto
2 3-Informa localizacao e cpf
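One caveat with checking for a dot anywhere in the string: a top-level step whose description itself contains a dot would also be dropped. A possible variation, sketched below, looks only at the part before the first hyphen. The row '4-Resumo (V.2.0)' is a made-up example added purely to show the difference, and the names prefix and mask are illustrative.

```python
import pandas as pd

s = pd.Series(
    ['1-Onboarding Retorno', '1.1-Onboarding escolha de bot',
     '2-Seleciona produto', '3-Informa localizacao e cpf',
     '3.1-CPF valido (V.2.0)', '3.2-Obtencao de CEP',
     '4-Resumo (V.2.0)'],  # hypothetical row, not in the original data
    name='Steps',
)

# Take the text before the first hyphen, e.g. "3.1" or "4".
prefix = s.str.split('-', n=1).str[0]

# Keep rows whose prefix has no literal dot (regex=False treats '.' literally).
mask = ~prefix.str.contains('.', regex=False)
new_s = s[mask].reset_index(drop=True)
print(new_s)
```

This also avoids the explicit Python loop, since the string operations are applied to the whole Series at once.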