I have the following dataframe:
contract
0 Future(conId=482048803, symbol='ESTX50', lastT...
1 Future(conId=497000453, symbol='XT', lastTrade...
2 Stock(conId=321100413, symbol='SXRS', exchange...
3 Stock(conId=473087271, symbol='ETHEEUR', excha...
4 Stock(conId=80268543, symbol='IJPA', exchange=...
5 Stock(conId=153454120, symbol='EMIM', exchange...
6 Stock(conId=75776072, symbol='SXR8', exchange=...
7 Stock(conId=257200855, symbol='EGLN', exchange...
8 Stock(conId=464974581, symbol='VBTC', exchange...
9 Future(conId=478135706, symbol='ZN', lastTrade...
I want to create a new "symbol" column with all symbols (ESTX50, XT, SXRS...).
In order to extract the substring between "symbol='" and the following single quote, I tried the following:
df['symbol'] = df.contract.str.extract(r"symbol='(.*?)'")
but I get a column of NaN. What am I doing wrong? Thanks
CodePudding user response:
It looks like that is a column of objects, not strings:
import pandas as pd
class Future:
def __init__(self, symbol):
self.symbol = symbol
def __repr__(self):
return f'Future(symbol=\'{self.symbol}\')'
df = pd.DataFrame({'contract': [Future(symbol='ESTX50'), Future(symbol='XT')]})
df['symbol'] = df.contract.str.extract(r"symbol='(.*?)'")
print(df)
df
:
contract symbol
0 Future(symbol='ESTX50') NaN
1 Future(symbol='XT') NaN
Notice pandas considers strings to be object
type so the string accessor is still allowed to attempt to perform operations. However, it cannot extract because these are not strings.
We can either convert to string first with astype
:
df['symbol'] = df.contract.astype(str).str.extract(r"symbol='(.*?)'")
df
:
contract symbol
0 Future(symbol='ESTX50') ESTX50
1 Future(symbol='XT') XT
However, the faster approach is to try to extract the object property:
df['symbol'] = [getattr(x, 'symbol', None) for x in df.contract]
Or with apply
(which can be slower than the comprehension)
df['symbol'] = df.contract.apply(lambda x: getattr(x, 'symbol', None))
Both produce:
contract symbol
0 Future(symbol='ESTX50') ESTX50
1 Future(symbol='XT') XT