Override dict square [] operator to perform equality operations

Given the data below and my own DataFrame class, which takes the dict as a parameter like this:

frame = {
    "a": ["X4E", "T3B", "F8D", "C7X"],
    "b": [7.0, 3.5, 8.0, 6.0],
    "c": [5, 3, 1, 10],
    "d": [False, False, True, False]
}

df = DataFrame(frame)

How would one override the __getitem__ method for dicts to allow actions such as

res = df[(df["b"] + 5.0 > 10.0)]["a"]

which would return all the cases where b + 5.0 is greater than 10.0, like a list/series of booleans. This will eventually extend to something like this:

res = df[(df["b"] + 5.0 > 10.0) & (df["c"] > 3) & ~df["d"]]["a"]

I am not sure how to start with this. I learnt about __getitem__ but have no idea how to use it to add a value to the values in a dict and perform element-wise math ops. This is similar to pandas DataFrames, but I am not sure how to implement it myself.

CodePudding user response：

Not exactly sure if your question is about dictionaries or dataframes... But for dicts I would go about it like this:

Get the value of "b", which is a list in your example, iterate through the list, and if the value + 5 is greater than 10 return True, else return False.

This code should work:

# Walk the "b" list and report whether each value + 5 exceeds 10.
for value in frame["b"]:
    if value + 5 > 10:
        print("True")
    else:
        print("False")
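
If you want the result as a list of booleans instead of printed strings, the same loop can probably be written as a list comprehension and then used to filter another column. This is only a sketch on the plain dict; the names mask and selected are made up for illustration and are not part of the question.

```python
frame = {
    "a": ["X4E", "T3B", "F8D", "C7X"],
    "b": [7.0, 3.5, 8.0, 6.0],
}

# One boolean per row: is b + 5.0 greater than 10.0?
mask = [value + 5.0 > 10.0 for value in frame["b"]]

# Keep the entries of "a" whose matching mask entry is True.
selected = [name for name, keep in zip(frame["a"], mask) if keep]

print(mask)      # [True, False, True, True]
print(selected)  # ['X4E', 'F8D', 'C7X']
```

This keeps everything in plain lists, so it does not give you the df[...] syntax from the question, but it shows the two steps (build a mask, then filter by it) that a custom class would have to wrap.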

Apologies if I have misunderstood what you were asking for.

CodePudding user response：

This is super simplified, but you'll need to define __getitem__ on your DataFrame class and also have some type of Column (similar to pd.Series in pandas).

Right now this only implements __add__; to get where you want to go, you'll need to implement the comparison operators (and check the types of your Columns) as well as some indexing mechanism. Hope this helps a bit!

class Column:
    def __init__(self, l):
        self.l = l

    def __str__(self):
        return f"<Column {self.l}>"

    def __add__(self, o):
        # Add the scalar o to every element and wrap the result in a new Column.
        return Column([n + o for n in self.l])


class DataFrame:
    def __init__(self, data):
        self.data = {}

        for key, val in data.items():
            if isinstance(val, list):
                self.data[key] = Column(val)

    def __str__(self):
        s = "<DataFrame\n"

        for key, val in self.data.items():
            s += f"\t{key}: {val}\n"

        s += ">"

        return s

    def __getitem__(self, key):
        return self.data[key]


frame = {
    "a": ["X4E", "T3B", "F8D", "C7X"],
    "b": [7.0, 3.5, 8.0, 6.0],
    "c": [5, 3, 1, 10],
    "d": [False, False, True, False],
}


df = DataFrame(frame)

print(df)
print(df["b"])
print(df["b"] + 5.0)

Output

<DataFrame
    a: <Column ['X4E', 'T3B', 'F8D', 'C7X']>
    b: <Column [7.0, 3.5, 8.0, 6.0]>
    c: <Column [5, 3, 1, 10]>
    d: <Column [False, False, True, False]>
>
<Column [7.0, 3.5, 8.0, 6.0]>
<Column [12.0, 8.5, 13.0, 11.0]>
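
To get from here to the expression in the question, one possible next step is to add comparison and boolean operators to Column and let DataFrame.__getitem__ accept a Column of booleans as a row mask. The following is a minimal sketch under some assumptions: the attribute name values, the helper _apply, and the mask handling are all hypothetical choices for illustration, and no type checking is done beyond a length check.

```python
import operator


class Column:
    def __init__(self, values):
        self.values = list(values)

    def __repr__(self):
        return f"<Column {self.values}>"

    def _apply(self, op, other):
        # Apply op element-wise against another Column or a scalar.
        if isinstance(other, Column):
            if len(other.values) != len(self.values):
                raise ValueError("columns must have the same length")
            return Column([op(a, b) for a, b in zip(self.values, other.values)])
        return Column([op(a, other) for a in self.values])

    def __add__(self, other):
        return self._apply(operator.add, other)

    def __gt__(self, other):
        return self._apply(operator.gt, other)

    def __lt__(self, other):
        return self._apply(operator.lt, other)

    def __and__(self, other):
        # Logical "and" per element; assumes both sides hold booleans.
        return self._apply(lambda a, b: bool(a) and bool(b), other)

    def __invert__(self):
        # Logical "not" per element (plain ~True is -2, so use "not").
        return Column([not a for a in self.values])


class DataFrame:
    def __init__(self, data):
        self.data = {key: Column(val) for key, val in data.items()}

    def __getitem__(self, key):
        if isinstance(key, Column):
            # Boolean mask: keep only the rows where the mask is truthy.
            return DataFrame({
                name: [v for v, keep in zip(col.values, key.values) if keep]
                for name, col in self.data.items()
            })
        # Otherwise treat the key as a column name.
        return self.data[key]


frame = {
    "a": ["X4E", "T3B", "F8D", "C7X"],
    "b": [7.0, 3.5, 8.0, 6.0],
    "c": [5, 3, 1, 10],
    "d": [False, False, True, False],
}

df = DataFrame(frame)
res = df[(df["b"] + 5.0 > 10.0) & (df["c"] > 3) & ~df["d"]]["a"]
print(res)  # <Column ['X4E', 'C7X']>
```

The parentheses around each comparison matter, because & binds more tightly than > in Python. The remaining operators (__sub__, __ge__, __or__ and so on) could follow the same _apply pattern.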