How does the pandas library implement its strange Python syntax?

Time:12-01

I'm using Python for some projects, but I still have no idea how a library like pandas can redefine the core syntax of the Python language.

The pandas code below runs without error, but nothing similar that I try to implement myself (outside of the pandas library) is accepted and executed by the runtime:

df[['col_x', 'col_y']]
# or
df.loc['col_x']

I could try to implement similar syntax myself, e.g. by using a function call or an assignment to pass or return a data structure, like this:

df([['col_x', 'col_y']])
# or
df = [['col_x', 'col_y']]
# or
df.loc()['col_x']
# or
df.loc = ['col_x']

But none of this is close to the syntax of the pandas library, and if I leave out the parentheses () it generates a syntax error.

  • How does the pandas library implement this custom syntax?
  • Could I define any custom syntax for a new library, e.g. df[[ ]]?
  • Can someone explain this with simple Python concepts?

CodePudding user response:

The following is an example class which implements __getitem__ and a loc property:

class Test:
    def __getitem__(self, index):
        if isinstance(index, list):
            return 'DataFrame'
        else:
            return 'Series'

    @property
    def loc(self):
        return self
    

test = Test()
print(test['foo'])  # Series
print(test[['foo']])  # DataFrame
print(test.loc['foo'])  # Series
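To see why test[['foo']] needs no special syntax, it may help to print exactly what Python hands to __getitem__. The class below is a hypothetical helper (KeyEcho is not part of pandas) that simply returns its key. The inner brackets turn out to be an ordinary list literal, and the outer brackets are the normal subscription operator:

```python
class KeyEcho:
    """Hypothetical helper: returns whatever Python passes to __getitem__."""

    def __getitem__(self, key):
        return key


echo = KeyEcho()

print(echo['col_x'])             # col_x
print(echo[['col_x', 'col_y']])  # ['col_x', 'col_y']
print(echo['row', 'col'])        # ('row', 'col')
print(echo[1:3])                 # slice(1, 3, None)

# The bracket form and the explicit method call are equivalent.
assert echo[['col_x']] == echo.__getitem__(['col_x'])
```

So a comma inside the brackets produces a tuple and a colon produces a slice object, which is how a single __getitem__ can support several indexing styles.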

CodePudding user response:

The [] syntax calls the object's __getitem__ method, so you could reproduce this behaviour like this:

class FakeLocObject:
    
    def __init__(self, caller_id):
        self.id = caller_id
    
    def __getitem__(self, key):
        return f"hello {key} of {self.id}"

    
class FakeDFObject:
    
    def __init__(self, my_id):
        self.id = my_id
        
    @property
    def loc(self):
        return FakeLocObject(self.id)

then

foo = FakeDFObject(42)
foo.loc["world"]

gives 'hello world of 42'
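Going one step further, here is a minimal sketch in which the object returned by loc keeps a reference to its parent and looks up real data. TinyFrame and RowIndexer are made-up names for illustration, and this is not how pandas is implemented internally; it only uses the same mechanism:

```python
class RowIndexer:
    """Hypothetical stand-in for the object returned by .loc."""

    def __init__(self, frame):
        self._frame = frame

    def __getitem__(self, label):
        # Find the row position for this label, then pick that value from each column.
        position = self._frame.index.index(label)
        return {name: values[position] for name, values in self._frame.columns.items()}


class TinyFrame:
    """Hypothetical miniature table: a dict of columns plus a list of row labels."""

    def __init__(self, columns, index):
        self.columns = columns  # dict: column name -> list of values
        self.index = index      # list of row labels

    def __getitem__(self, key):
        if isinstance(key, list):
            # A list of names selects several columns and returns a new TinyFrame.
            return TinyFrame({name: self.columns[name] for name in key}, self.index)
        # A single name returns that column's values.
        return self.columns[key]

    @property
    def loc(self):
        return RowIndexer(self)


frame = TinyFrame({'col_x': [1, 2], 'col_y': [3, 4]}, index=['a', 'b'])
print(frame['col_x'])            # [1, 2]
print(frame[['col_y']].columns)  # {'col_y': [3, 4]}
print(frame.loc['b'])            # {'col_x': 2, 'col_y': 4}
```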

You can find more details in the Python data model documentation, but the general idea is that a lot of Python syntax (., (), [], == ...) is actually syntactic sugar for calls to methods whose names start with __. This gives you the ability to build custom objects whose "logical" usage hides a non-trivial implementation.
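As a rough illustration of that mapping, the sketch below defines one special method per piece of syntax. Demo is a hypothetical class written only for this example:

```python
class Demo:
    """Hypothetical class showing which method each piece of syntax calls."""

    def __init__(self):
        self.items = {}

    def __call__(self, value):          # obj(value)
        return f"called with {value}"

    def __getitem__(self, key):         # obj[key]
        return self.items[key]

    def __setitem__(self, key, value):  # obj[key] = value
        self.items[key] = value

    def __eq__(self, other):            # obj == other
        return isinstance(other, Demo) and self.items == other.items

    def __getattr__(self, name):        # obj.name, only when normal lookup fails
        return f"no attribute {name!r}, made up on the fly"


d = Demo()
d['x'] = 1
print(d['x'])       # 1
print(d(5))         # called with 5
print(d == Demo())  # False
print(d.anything)   # no attribute 'anything', made up on the fly
```

Note that this only hooks into syntax Python already has. A library cannot add new grammar, so df[[...]] works because it is plain subscription with a list inside, not because pandas changed the language.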
