emphasized textI have created a number of dataframes, all with varied overlapping datetime indexes. I then append them to each other, and append a second index, the "name" column.
i try to get a slice of the dataframe by the entry in name, but it says it is not lexsorted. When i try to lexsort, it throws an error on the string in the name column, saying it is not a date...
df0 = pd.read_csv("file.csv", index_col='date', parse_dates=True)
df1 = pd.read_csv("file2.csv", index_col='date', parse_dates=True)
stack = pd.concat([df0, df1] ,axis=0)
stack.set_index('name', append=True,inplace=True)
print(stack.loc[datetime(2015,5,1)])
print(stack.loc['AAA'])
The .loc on date works and returns all names and values for the given date, but the .loc on the string, in the second index returns
UnsortedIndexError: 'MultiIndex slicing requires the index to be lexsorted: slicing on levels [0], lexsort depth 0'
When I try to sort, based on what ive read here, using stack.sort_index(inplace=True)
I get the error ValueError: Unknown datetime string format, unable to parse: AAA
the dataframe stack looks like this:
date name value
1/1/15 AAA 1
... ... ...
12/1/15 AAA 9
1/1/15 BBB 6
... ... ...
12/1/15 BBB 3
and i would like to be able to return all dates and values for AAA. I understand i can just keep it as a single index, and do groupby or filter, but I thought this was the purpose of the multiindex?
Edited again to clarify - I would like to know how to lexsort the second index. I am not sure why pandas is expecting it to also be a datetimeindex, which I think is what it is implying.
edited to add stack.head(), but its as i represented above.
value
date name
2015-01-02 AAA 164.71
2015-01-05 AAA 163.00
2015-01-06 AAA 160.82
2015-01-07 AAA 159.90
2015-01-08 AAA 160.65
Edit3: When i sort the level name using stack.sort_index(level = [1], inplace=True,axis=1)
I get:
pandas.errors.UnsortedIndexError: 'MultiIndex slicing requires the index to be lexsorted: slicing on levels [0], lexsort depth 0
CodePudding user response:
I dont like this solution but after lots of tinkering I arrived at :
stack.loc[(slice(None),'AAA'),:]
If there is another way than this or mozway's solution please reply.
CodePudding user response:
It looks like you're looking for a cross-section (xs
):
df.xs('AAA', level='name', drop_level=False)
example:
df = (pd.DataFrame({'value': {(Timestamp('2015-01-02 00:00:00'), 'AAA'): 164.71,
(Timestamp('2015-01-05 00:00:00'), 'AAA'): 163.0,
(Timestamp('2015-01-06 00:00:00'), 'AAA'): 160.82,
(Timestamp('2015-01-02 00:00:00'), 'BBB'): 159.9,
(Timestamp('2015-01-05 00:00:00'), 'BBB'): 160.65}})
.rename_axis(['date', 'name'])
.sort_index(level=1)
)
df.xs('AAA', level='name', drop_level=False)
value
date name
2015-01-02 AAA 164.71
2015-01-05 AAA 163.00
2015-01-06 AAA 160.82
Alternatively, using IndexSlice
:
idx = pd.IndexSlice
df.loc[idx[:, 'AAA'], :]