I have the following data frame object
              total
scanned_date
2021-11-01        0
2021-11-02        0
2021-11-03        0
2021-11-04        0
2021-11-05        0
where scanned_date is a Timestamp object.
I want to convert the data to a list of tuples like
[
(2021-11-01, 0),
(2021-11-02, 0),
(2021-11-03, 0),
...
]
But when using
list(df.to_records())
it includes the time component, while I only want the date string:
[('2021-11-01T00:00:00.000000000', 0), ('2021-11-02T00:00:00.000000000', 0), ('2021-11-03T00:00:00.000000000', 0)]
How can I remove the time string T00:00:00.000000000 from the to_records() output?
CodePudding user response:
Try converting the index to strings with strftime:
df.index = df.index.strftime('%Y-%m-%d')
list(df.to_records())
Out[212]:
[('2021-11-01', 0),
('2021-11-02', 0),
('2021-11-03', 0),
('2021-11-04', 0),
('2021-11-05', 0)]
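If you would rather not overwrite the index, a variant of the same idea is to format the dates on the fly and zip them with the column. This is only a sketch: the frame below is rebuilt by hand to look like the one in the question, so the construction of df is an assumption.

```python
import pandas as pd

# Assumed stand-in for the frame in the question: a DatetimeIndex named
# scanned_date and a single integer column named total.
df = pd.DataFrame(
    {"total": [0, 0, 0, 0, 0]},
    index=pd.date_range("2021-11-01", periods=5, name="scanned_date"),
)

# Format each index entry as YYYY-MM-DD and pair it with its total.
# df itself is not modified, so the index stays a DatetimeIndex.
pairs = list(zip(df.index.strftime("%Y-%m-%d"), df["total"].tolist()))
print(pairs)
```

This gives plain Python tuples of (str, int) rather than numpy records, which may be easier to pass on to other code.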
CodePudding user response:
I tried to do the date conversion in numpy but switched to pandas, because in numpy you're working with a 64-bit integer. I used map with a lambda to convert each DataFrame record into a (date, value) tuple:
import io
import pandas as pd

txt = """scanned_date,total
2021-11-01,0
2021-11-02,0
2021-11-03,0
2021-11-04,0
2021-11-05,0
"""
# https://www.py4u.net/discuss/17020
df = pd.read_csv(io.StringIO(txt), sep=',', parse_dates=['scanned_date'])
# Each record is (row index, scanned_date, total); keep the date and the value.
print(list(map(lambda tuple_obj:
    (
        pd.to_datetime(tuple_obj[1])
        # Abandoned numpy attempt:
        # str(tuple_obj[1].astype("datetime64[M]").astype(int) % 12 + 1)
        # + "-" + str(tuple_obj[1].astype(object).day)
        # + "-" + str(tuple_obj[1].astype("datetime64[Y]"))
        ,
        tuple_obj[2]),
    df.to_records())))
output:
[(Timestamp('2021-11-01 00:00:00'), 0), (Timestamp('2021-11-02 00:00:00'), 0), (Timestamp('2021-11-03 00:00:00'), 0), (Timestamp('2021-11-04 00:00:00'), 0), (Timestamp('2021-11-05 00:00:00'), 0)]
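Note that this still returns Timestamp objects rather than the date strings the question asks for. One possible way to finish the job, assuming the same CSV layout as above (shortened to two rows here), is to call strftime on each Timestamp:

```python
import io
import pandas as pd

# Shortened copy of the sample data above (assumption: same column layout).
txt = """scanned_date,total
2021-11-01,0
2021-11-02,0
"""
df = pd.read_csv(io.StringIO(txt), parse_dates=["scanned_date"])

# Iterating over a datetime column yields Timestamp objects, so strftime
# can be called on each one; int() turns the numpy integer into a plain int.
result = [
    (ts.strftime("%Y-%m-%d"), int(total))
    for ts, total in zip(df["scanned_date"], df["total"])
]
print(result)
```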