I want to split a column. If it has a letter (any letter) at the end, this will be the value for the second column. Otherwise, the second column should be null
import pandas as pd
data = pd.DataFrame({"data": ["0.00I", "0.01E", "99.99", "0.14F"]})
desired result:
a b
0 0.00 I
1 0.01 E
2 99.99 None
3 0.14 F
CodePudding user response:
You can use str.extract
with the (\d (?:\.\d )?)(\D)?
regex:
out = data['data'].str.extract(r'(\d (?:\.\d )?)(\D)?').set_axis(['a', 'b'], axis=1)
Or, if you want to remove the original 'data' column while adding new columns in place:
data[['a', 'b']] = data.pop('data').str.extract('(\d (?:\.\d )?)(\D)?')
output:
a b
0 0.00 I
1 0.01 E
2 99.99 NaN
3 0.14 F
(\d (?:\.\d )?) # capture a number (with optional decimal)
(\D)? # optionally capture a non-digit
CodePudding user response:
try using a regular expression and apply
def find_numeric(x):
if re.search("[[0-9] \.[0-9] ", x):
return re.search("[0-9] \.[0-9] ", x).group()
else:
return ""
def find_alpha(x):
if re.search("[a-zA-Z] ", x):
return re.search("[a-zA-Z] ", x).group()
else:
return "None"
data = pd.DataFrame({"data": ["0.00I", "0.01E", "99.99", "0.14F"]})
data["numeric"] = data["data"].apply(find_numeric)
data["alpha"]= data["data"].apply(find_alpha)
print(data)
output:
data numeric alpha
0 0.00I 0.00 I
1 0.01E 0.01 E
2 99.99 99.99 None
3 0.14F 0.14 F