I want to replace the value in a column depending on whether it starts with a number or not. The loop below leaves NaN in every row instead.
My desired output is:
novalue (First 6 digits)
novalue
one (Extract what is in parentheses)
two
import re
import pandas as pd

df = pd.DataFrame({'VALUE': ['novalue1', 'novalue2', '22n(one)', '22n(two)',
                             'completed', 'none']})

for row in df.iterrows():
    try:
        value = row[1].VALUE
        if re.search("^[a-zA-Z]", value) is not None:
            df['VALUE'] = df['VALUE'].str[:6]
        else:
            df['VALUE'] = df['VALUE'].str.extract(r'\((.*)\)', expand=False)
    except:
        print(value, ' : Unsuccessful')
df
CodePudding user response:
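The problem is that df['VALUE'] inside the loop refers to the whole column, so every iteration overwrites all rows, while you only want to change the current row. You can use np.where instead, which picks one of two results per row. Note that str[:7] is used below because 'novalue' is seven characters long.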
import numpy as np

df['out'] = np.where(df['VALUE'].str[0].str.isalpha(),
                     df['VALUE'].str[:7],
                     df['VALUE'].str.extract(r'\((.*)\)')[0])
print(df)
       VALUE      out
0   novalue1  novalue
1   novalue2  novalue
2   22n(one)      one
3   22n(two)      two
4  completed  complet
5       none     none
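
To see why the loop in the question ends up with NaN everywhere, it may help to replay two of its whole-column assignments by hand. This is only a minimal sketch using the sample data from the question; the names truncated and extracted are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'VALUE': ['novalue1', 'novalue2', '22n(one)', '22n(two)',
                             'completed', 'none']})

# What an iteration on a letter-first row does: it truncates EVERY row,
# not just the current one. '22n(one)' becomes '22n(on' and loses its ')'.
truncated = df['VALUE'].str[:6]

# What an iteration on a digit-first row does next: it extracts from the
# already-truncated column. No row contains a complete "(...)" pair any
# more, so the pattern matches nowhere and every row becomes NaN.
extracted = truncated.str.extract(r'\((.*)\)', expand=False)

print(truncated.tolist())
print(extracted.isna().all())
```

Once the column is all NaN, later iterations have nothing left to slice or extract, which is why the final frame contains only NaN. The np.where version avoids this because both branches are computed once from the untouched column.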