Given the data below and my own DataFrame class, which takes the dict as a parameter like this:
frame = {
    "a": ["X4E", "T3B", "F8D", "C7X"],
    "b": [7.0, 3.5, 8.0, 6.0],
    "c": [5, 3, 1, 10],
    "d": [False, False, True, False]
}
df = DataFrame(frame)
How would one override the __getitem__ method on this dict-backed class to allow expressions such as
    res = df[(df["b"] + 5.0 > 10.0)]["a"]
which would return all the cases where b + 5.0 is greater than 10.0, with the inner comparison producing something like a list/series of booleans? This will eventually extend to something like this:
    res = df[(df["b"] + 5.0 > 10.0) & (df["c"] > 3) & ~df["d"]]["a"]
I am not sure how to start with this. I learnt about __getitem__ but have no idea how to use it to add a value to the values in a dict and perform element-wise math ops. This is similar to pandas data frames, but I am not sure how to implement it myself.
CodePudding user response:
I'm not exactly sure whether your question is about dictionaries or dataframes, but for dicts I would go about it like this:
Get the value of "b", which is a list in your example, iterate through the list, and for each value report true if value + 5 is greater than 10 and false otherwise.
This code should work:
# Walk the "b" list and report, per element, whether value + 5 exceeds 10
for value in frame["b"]:
    if value + 5 > 10:
        print("True")
    else:
        print("False")
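If the goal is a list of booleans to filter with, rather than printed text, the same loop can be sketched as a list comprehension. This is only an illustration on the plain dict; the names mask and res are made up for the example.

```python
frame = {
    "a": ["X4E", "T3B", "F8D", "C7X"],
    "b": [7.0, 3.5, 8.0, 6.0],
}

# One boolean per row: is b + 5 greater than 10?
mask = [value + 5 > 10 for value in frame["b"]]

# Keep only the entries of "a" whose mask entry is True.
res = [a for a, keep in zip(frame["a"], mask) if keep]

print(mask)  # [True, False, True, True]
print(res)   # ['X4E', 'F8D', 'C7X']
```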
Apologies if I have misunderstood what you were asking for.
CodePudding user response:
This is super simplified, but you'll need to define __getitem__ on your DataFrame class and also have some type of Column (similar to pd.Series in pandas).
Right now this only implements __add__, but to get where you want to go, you'll need to implement the comparison operators (and check the types of your Columns) as well as some indexing mechanism. Hope this helps a bit!
class Column:
    def __init__(self, l):
        self.l = l

    def __str__(self):
        return f"<Column {self.l}>"

    def __add__(self, o):
        # Add the scalar o to every element and wrap the result in a new Column
        return Column([n + o for n in self.l])


class DataFrame:
    def __init__(self, data):
        self.data = {}
        for key, val in data.items():
            # Only list values are stored; anything else is skipped
            if isinstance(val, list):
                self.data[key] = Column(val)

    def __str__(self):
        s = "<DataFrame\n"
        for key, val in self.data.items():
            s += f"\t{key}: {val}\n"
        s += ">"
        return s

    def __getitem__(self, key):
        # df["b"] returns the Column stored under "b"
        return self.data[key]


frame = {
    "a": ["X4E", "T3B", "F8D", "C7X"],
    "b": [7.0, 3.5, 8.0, 6.0],
    "c": [5, 3, 1, 10],
    "d": [False, False, True, False],
}
df = DataFrame(frame)
print(df)
print(df["b"])
print(df["b"] + 5.0)
Output:
<DataFrame
    a: <Column ['X4E', 'T3B', 'F8D', 'C7X']>
    b: <Column [7.0, 3.5, 8.0, 6.0]>
    c: <Column [5, 3, 1, 10]>
    d: <Column [False, False, True, False]>
>
<Column [7.0, 3.5, 8.0, 6.0]>
<Column [12.0, 8.5, 13.0, 11.0]>
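For the remaining pieces mentioned above (comparison operators and an indexing mechanism), one possible sketch follows. It is not the only way to do this and is not how pandas is implemented; the attribute name values, the helper _apply, and the choice to treat any Column passed to df[...] as a boolean mask are all assumptions made for illustration.

```python
import operator


class Column:
    def __init__(self, values):
        self.values = list(values)

    def __repr__(self):
        return f"<Column {self.values}>"

    def __len__(self):
        return len(self.values)

    def __iter__(self):
        return iter(self.values)

    def _apply(self, other, op):
        # Hypothetical helper: element-wise against another Column,
        # or broadcast a scalar across every element.
        if isinstance(other, Column):
            if len(other) != len(self):
                raise ValueError("Columns must have the same length")
            return Column([op(a, b) for a, b in zip(self.values, other.values)])
        return Column([op(a, other) for a in self.values])

    def __add__(self, other):
        return self._apply(other, operator.add)

    def __gt__(self, other):
        return self._apply(other, operator.gt)

    def __and__(self, other):
        return self._apply(other, lambda a, b: bool(a) and bool(b))

    def __invert__(self):
        # Logical NOT per element (what ~mask is expected to mean here).
        return Column([not v for v in self.values])


class DataFrame:
    def __init__(self, data):
        self.data = {
            key: val if isinstance(val, Column) else Column(val)
            for key, val in data.items()
        }

    def __getitem__(self, key):
        if isinstance(key, Column):
            # Assumption: a Column used as a key is a boolean mask,
            # so keep only the rows where the mask is truthy.
            if any(len(col) != len(key) for col in self.data.values()):
                raise ValueError("Mask length must match the number of rows")
            return DataFrame({
                name: Column([v for v, keep in zip(col, key) if keep])
                for name, col in self.data.items()
            })
        return self.data[key]


frame = {
    "a": ["X4E", "T3B", "F8D", "C7X"],
    "b": [7.0, 3.5, 8.0, 6.0],
    "c": [5, 3, 1, 10],
    "d": [False, False, True, False],
}
df = DataFrame(frame)
res = df[(df["b"] + 5.0 > 10.0) & (df["c"] > 3) & ~df["d"]]["a"]
print(res)  # <Column ['X4E', 'C7X']>
```

The parentheses around each comparison matter: & binds more tightly than > in Python, so without them the expression would be grouped differently. The other operators (__lt__, __eq__, __or__, __mul__, and so on) can be added the same way through _apply.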