I have a Pandas dataframe that looks something like this:
a b c ... x y z
date ...
2043-10-01 10230.413086 846.184082 0.267180 ... 2771.997314 20.699804 4000.0
2043-11-01 10229.154297 841.288513 0.267003 ... 2770.365723 20.749172 4000.0
2043-12-01 10231.440430 836.821472 0.266981 ... 2769.230469 20.797396 4000.0
2044-01-01 10237.501953 832.406677 0.267381 ... 2768.310547 20.849573 4000.0
2044-02-01 10233.545898 827.571655 0.266966 ... 2766.528564 20.897126 4000.0
2044-03-01 10235.044922 823.357910 0.266938 ... 2765.628906 20.942534 4000.0
2044-04-01 10243.462891 819.170654 0.267569 ... 2765.451172 20.993223 4000.0
2044-05-01 10236.799805 814.516602 0.266984 ... 2763.450684 21.038358 4000.0
2044-06-01 10240.304688 810.241150 0.266869 ... 2762.673828 21.087164 4000.0
2044-07-01 10259.951172 806.501587 0.267803 ... 2764.588135 21.142576 4000.0
I want to extract the values at dates defined using a Pandas date_range, e.g.:
import pandas as pd
for xdat in pd.date_range(start="2040/01/01", end="2044/07/01", freq="MS"):
    x = df[xdat]['x']
However, I get this error: KeyError: Timestamp('2040-01-01 00:00:00'). I have tried converting the Timestamp variable xdat using pd.to_datetime (and variations of this), but so far without success. I'm sure the answer is trivial, but I can't see it, so I would appreciate any suggestions. Thanks in advance!
Answer:
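df[xdat] looks up a column label, not a row, which is why a Timestamp key raises KeyError. Convert the "date" column with pd.to_datetime, set it as the index, and select rows by label using loc: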
import pandas as pd
df = pd.DataFrame(data=[["2043-10-01",10230.413086,846.184082,0.267180],["2043-11-01",10229.154297,841.288513,0.267003],["2043-12-01",10231.440430,836.821472,0.266981],["2044-01-01",10237.501953,832.406677,0.267381],["2044-02-01",10233.545898,827.571655,0.266966],["2044-03-01",10235.044922,823.357910,0.266938],["2044-04-01",10243.462891,819.170654,0.267569],["2044-05-01",10236.799805,814.516602,0.266984],["2044-06-01",10240.304688,810.241150,0.266869],["2044-07-01",10259.951172,806.501587,0.267803]], columns=["date","a","b","c"])
# Parse the date strings into Timestamps and use them as the row index
df["date"] = pd.to_datetime(df["date"])
df = df.set_index("date")
# Look up one value per date with a (row label, column label) pair
for xdat in pd.date_range(start="2044/01/01", end="2044/07/01", freq="MS"):
    x = df.loc[xdat, "a"]
# Or select all the dates at once, without overwriting df
result = df.loc[pd.date_range(start="2044/01/01", end="2044/07/01", freq="MS"), "a"]
a
date
2044-01-01 10237.501953
2044-02-01 10233.545898
2044-03-01 10235.044922
2044-04-01 10243.462891
2044-05-01 10236.799805
2044-06-01 10240.304688
2044-07-01 10259.951172
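Note that the range in the question starts at 2040-01-01, while the rows shown begin at 2043-10-01. If the data really does start later than the requested range, loc will still raise KeyError for the missing dates. A minimal sketch of two ways around that, assuming a small stand-in frame built from a few of the "x" values in the question (the names wanted, present, x_present and x_all are only illustrative):

```python
import pandas as pd

# Stand-in frame with a DatetimeIndex; values copied from the question's "x" column.
df = pd.DataFrame(
    {"x": [2765.451172, 2763.450684, 2762.673828, 2764.588135]},
    index=pd.to_datetime(["2044-04-01", "2044-05-01", "2044-06-01", "2044-07-01"]),
)
df.index.name = "date"

# The requested range deliberately starts two months before the data does.
wanted = pd.date_range(start="2044-02-01", end="2044-07-01", freq="MS")

# Option 1: keep only the requested dates that actually exist in the index.
present = df.index.intersection(wanted)
x_present = df.loc[present, "x"]

# Option 2: keep every requested date; dates missing from df become NaN.
x_all = df["x"].reindex(wanted)
```

Option 1 silently drops the missing dates, while option 2 keeps them as NaN so the gaps stay visible; which one fits depends on what you do with the values afterwards.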