Home > other >  Pandas split column if condition, else null
Pandas split column if condition, else null

Time:10-27

I want to split a column. If it has a letter (any letter) at the end, this will be the value for the second column. Otherwise, the second column should be null

import pandas as pd
data = pd.DataFrame({"data": ["0.00I", "0.01E", "99.99", "0.14F"]})

desired result:

    a    b
0 0.00   I
1 0.01   E
2 99.99  None
3 0.14   F

CodePudding user response:

You can use str.extract with the (\d (?:\.\d )?)(\D)? regex:

out = data['data'].str.extract(r'(\d (?:\.\d )?)(\D)?').set_axis(['a', 'b'], axis=1)

Or, if you want to remove the original 'data' column while adding new columns in place:

data[['a', 'b']] = data.pop('data').str.extract('(\d (?:\.\d )?)(\D)?')

output:

       a    b
0   0.00    I
1   0.01    E
2  99.99  NaN
3   0.14    F

regex demo

(\d (?:\.\d )?)  # capture a number (with optional decimal)
(\D)?            # optionally capture a non-digit

CodePudding user response:

try using a regular expression and apply

def find_numeric(x):
    if re.search("[[0-9] \.[0-9] ", x):
        return re.search("[0-9] \.[0-9] ", x).group()
    else:
        return ""

def find_alpha(x):
    if re.search("[a-zA-Z] ", x):
        return re.search("[a-zA-Z] ", x).group()
    else:
        return "None"
    
data = pd.DataFrame({"data": ["0.00I", "0.01E", "99.99", "0.14F"]})    

data["numeric"] = data["data"].apply(find_numeric)

data["alpha"]= data["data"].apply(find_alpha)

print(data)

output:

data numeric alpha
0  0.00I    0.00     I
1  0.01E    0.01     E
2  99.99   99.99  None
3  0.14F    0.14     F

  • Related