I want to generate a list of tuples from my DataFrame. Here is the data (data.csv):
,Date,Open,High,Low,Close,min,max
2022-10-03 12:00:00+01:00,19268.458333333332,141.95199584960938,141.97999572753906,141.30999755859375,141.42999267578125,141.42999267578125,
2022-10-04 16:00:00+01:00,19269.625,143.83799743652344,144.07699584960938,143.72999572753906,143.99000549316406,,143.99000549316406
2022-10-05 15:00:00+01:00,19270.583333333332,142.83299255371094,142.87100219726562,142.4199981689453,142.66000366210938,142.66000366210938,
2022-10-06 06:00:00+01:00,19271.208333333332,143.36000061035156,143.43600463867188,143.24000549316406,143.4010009765625,,143.4010009765625
2022-10-07 13:00:00+01:00,19272.5,141.85899353027344,142.1219940185547,141.17999267578125,141.45599365234375,141.45599365234375,
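I want to extract the date and the Close value of each row as a tuple, like this: ('2022-10-03', 141.42999267578125), where the date is the day part of the timestamp in the first (unnamed) column. From those tuples I want to build a list in which each pair of consecutive rows contributes its two tuples, so every row except the first and last appears twice.
I created the list manually to show exactly what I am looking for: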
tuples_list = [
('2022-10-03', 141.42999267578125), ('2022-10-04', 143.99000549316406), # row[0-1]
('2022-10-04', 143.99000549316406), ('2022-10-05', 142.66000366210938), # row[1-2]
('2022-10-05', 142.66000366210938), ('2022-10-06', 143.4010009765625), # row[2-3]
('2022-10-06', 143.4010009765625), ('2022-10-07', 141.45599365234375), # row[3-4]
]
Answer:
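One approach could be as follows (assuming data.csv is read with its first column as the index):
import pandas as pd
df = pd.read_csv('data.csv', index_col=0)
df.index = pd.to_datetime(df.index).date.astype(str)  # keep only 'YYYY-MM-DD'
s = pd.concat([df.Close]*2).sort_index()  # every Close value twice, copies adjacent
tuples_list = list(zip(s.index, s))[1:-1]  # drop the first and last copy
print(tuples_list)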
[('2022-10-03', 141.42999267578125),('2022-10-04', 143.99000549316406),
('2022-10-04', 143.99000549316406),('2022-10-05', 142.66000366210938),
('2022-10-05', 142.66000366210938),('2022-10-06', 143.4010009765625),
('2022-10-06', 143.4010009765625),('2022-10-07', 141.45599365234375)]
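The approach above repeats every (date, Close) pair twice, sorts so the copies sit next to each other, and then drops the first and last element. A minimal plain-Python sketch of that duplicate-and-trim step, assuming the per-row tuples are already in a list called rows (a hypothetical name; the values are copied from the question):

```python
# Per-row (date, close) tuples, copied from the question's data.
rows = [
    ('2022-10-03', 141.42999267578125),
    ('2022-10-04', 143.99000549316406),
    ('2022-10-05', 142.66000366210938),
    ('2022-10-06', 143.4010009765625),
    ('2022-10-07', 141.45599365234375),
]

# Emit every tuple twice, then drop the first and last copies:
# a, a, b, b, c, c -> a, b, b, c
doubled = [t for t in rows for _ in range(2)]
tuples_list = doubled[1:-1]
print(tuples_list)
```

With fewer than two rows, the slice leaves an empty list, which matches "no consecutive pairs".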
Answer:
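Assuming df is your pandas DataFrame, the line below turns the Date and Close columns into a list of records:
list_tuples = list(df[['Date', 'Close']].to_records(index=True))
Note that with index=True each record also includes the index (the timestamp), and the Date column in this file holds numeric values, so the records still need reshaping to match the ('YYYY-MM-DD', Close) pairs shown in the question.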
Answer:
With such simple data, and a non-pandas desired output, using pandas may be overkill.
import csv
with open('data.csv', newline='') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    # column 0 holds the timestamp, column 5 holds Close
    tuples_list = [(x[0][:10], float(x[5])) for x in reader]
print(tuples_list)
Output:
[('2022-10-03', 141.42999267578125),
('2022-10-04', 143.99000549316406),
('2022-10-05', 142.66000366210938),
('2022-10-06', 143.4010009765625),
('2022-10-07', 141.45599365234375)]
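Indexing by position (x[5]) silently picks the wrong column if the file's column order ever changes. A minimal sketch that looks up Close by its header name with csv.DictReader instead; the inline SAMPLE string is a hypothetical stand-in for data.csv (first two rows only, with the UTC offset assumed to be written as +01:00):

```python
import csv
import io

# Hypothetical inline copy of the first rows of data.csv so the sketch runs
# on its own; in practice use open('data.csv', newline='') instead.
SAMPLE = """\
,Date,Open,High,Low,Close,min,max
2022-10-03 12:00:00+01:00,19268.458333333332,141.95199584960938,141.97999572753906,141.30999755859375,141.42999267578125,141.42999267578125,
2022-10-04 16:00:00+01:00,19269.625,143.83799743652344,144.07699584960938,143.72999572753906,143.99000549316406,,143.99000549316406
"""

reader = csv.DictReader(io.StringIO(SAMPLE))
# The timestamp column has an empty header, so its key is ''.
tuples_list = [(row[''][:10], float(row['Close'])) for row in reader]
print(tuples_list)
```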
Then pair up consecutive rows and flatten the pairs into one list:
from itertools import pairwise, chain
tuples_list = list(chain.from_iterable(pairwise(tuples_list)))
print(tuples_list)
Output:
[('2022-10-03', 141.42999267578125), ('2022-10-04', 143.99000549316406),
('2022-10-04', 143.99000549316406), ('2022-10-05', 142.66000366210938),
('2022-10-05', 142.66000366210938), ('2022-10-06', 143.4010009765625),
('2022-10-06', 143.4010009765625), ('2022-10-07', 141.45599365234375)]
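On Python versions older than 3.10, which lack itertools.pairwise, a minimal equivalent pairs each tuple with its successor using zip. The name rows is hypothetical; the values are the per-row output from the first step:

```python
from itertools import chain

# Per-row (date, close) tuples, copied from the first step's output.
rows = [
    ('2022-10-03', 141.42999267578125),
    ('2022-10-04', 143.99000549316406),
    ('2022-10-05', 142.66000366210938),
    ('2022-10-06', 143.4010009765625),
    ('2022-10-07', 141.45599365234375),
]

# zip(rows, rows[1:]) yields (rows[0], rows[1]), (rows[1], rows[2]), ...,
# the same pairs itertools.pairwise(rows) produces on Python 3.10+.
tuples_list = list(chain.from_iterable(zip(rows, rows[1:])))
print(tuples_list)
```

This relies on slicing, so it needs a list or other sequence; a one-shot iterator such as a csv reader has to be materialized into a list first.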