Let's have such a structure:
from datetime import date, timedelta
from pandas import DataFrame as df
import numpy as np
idx1 = []
idx2 = []
idx3 = []
for i in range(3):
idx1.append(date.today() - timedelta(days=0))
idx2.append(date.today() - timedelta(days=1))
idx3.append(date.today() - timedelta(days=2))
data1 = {"Sasquach": np.random.uniform(1, 10, 2), "Furby":np.random.uniform(1, 10, 2), "Ant":np.random.uniform(1, 10, 2)}
data2 = {"Sasquach": np.random.uniform(1, 10, 2), "Furby":np.random.uniform(1, 10, 2), "Ant":np.random.uniform(1, 10, 2)}
data3 = {"Sasquach": np.random.uniform(1, 10, 2), "Furby":np.random.uniform(1, 10, 2), "Ant":np.random.uniform(1, 10, 2)}
my_dataframe_1 = df.from_dict(data1, orient="index", columns=["pretty", "brave"])
my_dataframe_2 = df.from_dict(data2, orient="index", columns=["pretty", "brave"])
my_dataframe_3 = df.from_dict(data3, orient="index", columns=["pretty", "brave"])
my_dataframe_1["timestamp"] = idx1
my_dataframe_2["timestamp"] = idx2
my_dataframe_3["timestamp"] = idx3
ultimate_df = my_dataframe_1.append(my_dataframe_2.append(my_dataframe_3))
ultimate_df.sort_values(["timestamp", "pretty"], ascending=[True, False], inplace=True)
ultimate_df.reset_index(inplace=True)
ultimate_df.set_index(["timestamp", "timestamp"], inplace=True)
print(ultimate_df)
This gives us:
index pretty brave
timestamp timestamp
2022-02-28 2022-02-28 Furby 6.083493 8.383633
2022-02-28 Sasquach 3.454873 6.426673
2022-02-28 Ant 1.279582 9.647796
2022-03-01 2022-03-01 Furby 9.667125 3.462951
2022-03-01 Ant 3.443364 5.457242
2022-03-01 Sasquach 3.364245 5.190403
2022-03-02 2022-03-02 Ant 2.773309 4.708483
2022-03-02 Furby 2.765552 2.065672
2022-03-02 Sasquach 2.347767 7.956183
My question is, is there any easy way for manipulating such structured data, where the index repeats for every item in "index" column? My goal is to easily select item from "index" column with the highest pretty value for the most recent date (I was thinking it would be possible with ultimate_df.iloc[-1].iloc[0] but it's not the case. The multi-index is not necessary here, it's the approach I tried using, but when I slice it like this:
print(ultimate_df.iloc[-1])
The result is as below, so it's the row of "inner" index.
index Sasquach
pretty 2.347767
brave 7.956183
Name: (2022-03-02, 2022-03-02), dtype: object
Do you have any advise how to approach it?. Maybe it's possible to have only that outer-index (that kinda groups, doesn't repeat for all the items) that when doing .iloc[-1] would print the most recent timestamp with those 3 items? I really dislike df.groupby() and I don't want to create separate dataframes for every item in "index" column. My only idea is using such multi-index and slicing like that:
print(ultimate_df.loc[ultimate_df.iloc[-1].name]) (this print just for your convinience)
print(ultimate_df.loc[ultimate_df.iloc[-1].name].iloc[0])
To receive the item with highest "pretty" value of the most recent timestamp/date but it looks very complicated.
Result:
index pretty brave
timestamp timestamp
2022-03-02 2022-03-02 Ant 2.773309 4.708483
2022-03-02 Furby 2.765552 2.065672
2022-03-02 Sasquach 2.347767 7.956183
index Ant
pretty 2.773309
brave 4.708483
Name: (2022-03-02, 2022-03-02), dtype: object
EDIT: For future generations, some alternative is creating a dict of dataframes to help manipulate such data with repeating items:
data = pd.DataFrame({'Names': ['Joe', 'John', 'Jasper', 'Jez'] *4, 'Ob1' : np.random.rand(16), 'Ob2' : np.random.rand(16)})
#create unique list of names
UniqueNames = data.Names.unique()
#create a data frame dictionary to store your data frames
DataFrameDict = {elem : pd.DataFrame for elem in UniqueNames}
for key in DataFrameDict.keys():
DataFrameDict[key] = data[:][data.Names == key]
CodePudding user response:
Trying resetting the MultiIndex and filter to get the required row:
df = ultimate_df.droplevel(0).reset_index()
>>> df.loc[df[df["timestamp"].eq(df["timestamp"].max())]["pretty"].idxmax()]
timestamp 2022-03-02
index Furby
pretty 8.162101
brave 1.038208
Name: 6, dtype: object