pandas split string with $ special text style-CodePudding

I have an excel, the data has two $ , when I read it using pandas, it will convert them to a very special text style.

import pandas as pd
df = pd.DataFrame({ 'Bid-Ask':['$185.25 - $186.10','$10.85 - $11.10','$14.70 - $15.10']})

after pd.read_excel

df['Bid'] = df['Bid-Ask'].str.split('−').str[0]

above code doesn't work due to $ make my string into a special style text.the Split function doesn't work.

my expected result

CodePudding user response：

Do not split. Using str.extract is likely the most robust:

df[['Bid', 'Ask']] = df['Bid-Ask'].str.extract(r'(\d (?:\.\d )?)\D*(\d (?:\.\d )?)')

Output:

             Bid-Ask     Bid     Ask
0  $185.25 - $186.10  185.25  186.10
1    $10.85 - $11.10   10.85   11.10
2    $14.70 - $15.10   14.70   15.10

CodePudding user response：

There is a non-breaking space (\xa0) in your string. That's why the split doesn't work.

I copied the strings (of your df) one by one into an Excel file and then imported it with pd.read_excel. The column looks like this then:

repr(df['Bid-Ask'])
'0    $185.25\xa0- $186.10\n1      $10.85\xa0- $11.10\n2      $14.70\xa0- $15.10\nName: Bid-Ask, dtype: object'

Before splitting you can replace that and it'll work.

df['Bid-Ask'] = df['Bid-Ask'].astype('str').str.replace('\xa0',' ', regex=False)
df[['Bid', 'Ask']] = df['Bid-Ask'].str.replace('$','', regex=False).str.split('-',expand = True)
print(df)

             Bid-Ask      Bid      Ask
0  $185.25 - $186.10  185.25    186.10
1    $10.85 - $11.10   10.85     11.10
2    $14.70 - $15.10   14.70     15.10

CodePudding user response：

You have to use the lambda function and apply the method together to split the column values into two and slice the value

df['Bid'] = df['Bid-Ask'].apply(lambda x: x.split('-')[0].strip()[1:])
df['Ask'] = df['Bid-Ask'].apply(lambda x: x.split('-')[1].strip()[1:])

output:

      Bid-Ask       Bid      Ask
0    185.25− 186.10 185.25  186.1
1    10.85− 11.10   10.85   11.1
2    14.70− 15.10   14.70   15.1