I'm using Python for some projects, but I still don't understand how a library like pandas can seemingly redefine the core syntax of the Python language.
The following pandas code runs without error, yet nothing similar that I try to implement myself (outside of pandas) is accepted by the interpreter:
```python
df[['col_x', 'col_y']]
# or
df.loc['col_x']
```
I can get something similar, for example by using a function to pass or return a data structure, like this:
```python
df([['col_x', 'col_y']])
# or
df = [['col_x', 'col_y']]
# or
df.loc()['col_x']
# or
df.loc = ['col_x']
```
But none of this is close to the pandas syntax, and if I leave out the parentheses `()`, I get an error.
- How does the pandas library implement this custom syntax?
- Could I define custom syntax such as `df[[ ]]` for a new library?
- Can someone explain this with simple Python concepts?
Answer:
The following example class implements `__getitem__` and a `loc` property:
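```python
class Test:
    def __getitem__(self, index):
        # test[...] calls this method; index is whatever was inside the brackets
        if isinstance(index, list):
            return 'DataFrame'
        else:
            return 'Series'

    @property
    def loc(self):
        # A property lets test.loc be used without parentheses
        return self

test = Test()
print(test['foo'])      # Series
print(test[['foo']])    # DataFrame
print(test.loc['foo'])  # Series
```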
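A quick way to confirm that `df[['col_x', 'col_y']]` involves no new syntax is to return the key unchanged. `KeyRecorder` below is a hypothetical helper written only for this illustration: it shows that the outer brackets are ordinary subscription (a call to `__getitem__`) and the inner brackets are just a list literal passed as the key.

```python
class KeyRecorder:
    """Hypothetical helper: __getitem__ hands back whatever key it receives."""

    def __getitem__(self, key):
        return key


r = KeyRecorder()

# Outer [] is subscription; inner [...] is an ordinary list literal used as the key
print(r[['col_x', 'col_y']])        # ['col_x', 'col_y']
print(type(r[['col_x', 'col_y']]))  # <class 'list'>

# The subscription is sugar for calling __getitem__ on the object's class
print(type(r).__getitem__(r, ['col_x', 'col_y']))  # ['col_x', 'col_y']

# A comma inside [] makes a tuple key; a colon makes a slice object
print(r['a', 'b'])  # ('a', 'b')
print(r[1:3])       # slice(1, 3, None)
```

So `__getitem__` simply receives a list, a tuple, a slice, or a single label, and can branch on its type, exactly as `Test` does above.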
Answer:
The `[]` syntax calls the object's `__getitem__` method, so you could reproduce this behaviour like this:
```python
class FakeLocObject:
    def __init__(self, caller_id):
        self.id = caller_id

    def __getitem__(self, key):
        # Called for fake_loc_object[key]
        return f"hello {key} of {self.id}"

class FakeDFObject:
    def __init__(self, my_id):
        self.id = my_id

    @property
    def loc(self):
        # Accessing .loc (no parentheses) returns a separate indexer object
        return FakeLocObject(self.id)
```
Then

```python
foo = FakeDFObject(42)
foo.loc["world"]
```

returns `'hello world of 42'`.
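The same pattern extends to assignment and to two-part keys such as `df.loc[row, col]`. The sketch below is a hypothetical toy (`MiniFrame` and `LocIndexer` are invented names, not pandas classes) that assumes columns are stored as plain lists and rows are addressed by label. It only shows how `__getitem__`, `__setitem__`, and a property combine; it is not how pandas is actually implemented.

```python
class LocIndexer:
    """Hypothetical label-based indexer returned by MiniFrame.loc."""

    def __init__(self, frame):
        self._frame = frame

    def _position(self, row):
        # Translate a row label into a list position
        return self._frame.index.index(row)

    def __getitem__(self, key):
        if isinstance(key, tuple):
            # frame.loc[row, col] arrives here as the tuple (row, col)
            row, col = key
            return self._frame.data[col][self._position(row)]
        # frame.loc[row] returns that row as a {column: value} dict
        pos = self._position(key)
        return {col: values[pos] for col, values in self._frame.data.items()}

    def __setitem__(self, key, value):
        # frame.loc[row, col] = value calls __setitem__((row, col), value)
        row, col = key
        self._frame.data[col][self._position(row)] = value


class MiniFrame:
    """Hypothetical toy table: each column is a list, rows are addressed by label."""

    def __init__(self, data, index):
        self.data = {col: list(values) for col, values in data.items()}
        self.index = list(index)

    @property
    def columns(self):
        return list(self.data)

    def __getitem__(self, key):
        if isinstance(key, list):
            # frame[['a', 'b']] receives a list: return a new frame with those columns
            return MiniFrame({col: self.data[col] for col in key}, self.index)
        # frame['a'] receives a single label: return a copy of that column
        return list(self.data[key])

    @property
    def loc(self):
        # A property runs on attribute access, so no () is needed at the call site
        return LocIndexer(self)


df = MiniFrame({'col_x': [1, 2], 'col_y': [3, 4], 'col_z': [5, 6]}, index=['r1', 'r2'])
print(df['col_x'])                     # [1, 2]
print(df[['col_x', 'col_y']].columns)  # ['col_x', 'col_y']
print(df.loc['r2'])                    # {'col_x': 2, 'col_y': 4, 'col_z': 6}
df.loc['r1', 'col_y'] = 30
print(df.loc['r1', 'col_y'])           # 30
```

Because `loc` is a property, `df.loc` evaluates to a `LocIndexer` object, and the `[...]` that follows is an ordinary subscription (or subscripted assignment) on that object.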
You can find more details in the Python data model documentation, but the general idea is that a lot of Python syntax (`.`, `()`, `[]`, `==`, ...) is actually syntactic sugar that calls special methods whose names start and end with `__` (such as `__getattr__`, `__call__`, `__getitem__`, `__eq__`). This lets you build custom objects whose "logical" usage looks simple while hiding a non-trivial implementation.
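As a hedged illustration of that last point, the hypothetical class `Sugar` below implements one special method per piece of syntax listed above and returns a string describing what was called. Note that ordinary attribute access goes through `__getattribute__`; `__getattr__`, used here, is only consulted when normal lookup fails. Returning a non-`bool` from `__eq__` is allowed, which is how, for example, comparing a pandas Series with `==` can produce a Series of booleans.

```python
class Sugar:
    """Hypothetical class showing which special method each piece of syntax calls."""

    def __init__(self, name):
        self.name = name

    def __getattr__(self, attr):
        # obj.attr falls back to __getattr__ only when normal lookup fails
        return f"{self.name}.{attr}"

    def __call__(self, *args):
        # obj(...) calls __call__
        return f"{self.name}{args}"

    def __getitem__(self, key):
        # obj[...] calls __getitem__
        return f"{self.name}[{key!r}]"

    def __eq__(self, other):
        # obj == other calls __eq__; it may return any object, not just a bool
        return f"{self.name} == {other!r}"


s = Sugar('s')
print(s.anything)   # s.anything
print(s(1, 2))      # s(1, 2)
print(s['a', 'b'])  # s[('a', 'b')]
print(s == 3)       # s == 3
print(s.name)       # s  (found by normal lookup, so __getattr__ is not called)
```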