Home > Net >  How to read the below data to a dataframe in python
How to read the below data to a dataframe in python

Time:10-10

I have the data in the below variable ohlc = 1664788799|38444.9|38569.2|38327.85|38412.35|0|0|,1664789099|38408.35|38587.3|38394.05|38586.15|0|0|,1664789399|38589.55|38641.6|38420.05|38422.35|0|0|

The coulums are timestamp, open, high, low, close, volume,OI

How do we convert this pythion variable to dataframe. Any simple way to to do this?

CodePudding user response:

Split the string into lists using the two separators and then pass the list of lists to the dataframe constructor:

pd.DataFrame(
    [row.split('|')[:-1] for row in ohlc.split(',')],
    columns=['timestamp', 'open', 'high', 'low', 'close', 'volume', 'OI']
)

Output:

     timestamp      open       high      low       close    volume  OI
0   1664788799   38444.9    38569.2 38327.85    38412.35         0   0
1   1664789099  38408.35    38587.3 38394.05    38586.15         0   0
2   1664789399  38589.55    38641.6 38420.05    38422.35         0   0

P.S. You can add df = df.apply(pd.to_numeric, errors='coerce') to convert all columns to numeric values (integers/floats).

CodePudding user response:

could you please give a demo of output, in which way you want to (like table)

CodePudding user response:

Since your data is stored in a single string with "|" and "," used as separators, you can use Python's str.split method to get a list of values. In the data there's also an addiitonal sep character at the end which needs to be removed. To do this we can use str.replace using tab characters and str.strip to remove the trailing separator.

For converting the string into a CSV like object Pandas can read, we can use StringIO class from the io builtin module.

import io

ohlc = "1664788799|38444.9|38569.2|38327.85|38412.35|0|0|,1664789099|38408.35|38587.3|38394.05|38586.15|0|0|,1664789399|38589.55|38641.6|38420.05|38422.35|0|0|"

rows = [row.replace('|', '\t').strip() for row in ohlc.split(',')]
rows_tsv = '\n'.join(rows)

header = ['timestamp', 'open', 'high', 'low', 'close', 'volume', 'OI']

df = pd.read_csv(io.StringIO(rows_tsv), sep='\t', header=None))
df.columns = header  # to use the header names as required
  • Related