Converting a formatted string into a pandas dataframe?

I have a long string in an unusual but consistent format that I want to convert to a pandas data frame. Below is an example of the format that repeats:

' {"Col A":"Val","Col B":10,"Col C":1},{"Col A":"Val","Col B":4,"Col C":0} '

The desired data frame would be this:

Col A    Col B    Col C
"Val"    10       1
"Val"    4        0

I have tried splitting the string using the curly braces as delimiters, but I could not turn each piece into a row because the values have different data types (strings and integers).

Is there an easier way of doing this?

Answer:

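The easiest way I can think of is to wrap the string in square brackets, so that it becomes a JSON array of objects, and parse it with json.loads: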

import json
import pandas as pd

txt = ' {"Col A":"Val","Col B":10,"Col C":1},{"Col A":"Val","Col B":4,"Col C":0} '

# wrap the comma-separated objects in [...] so the text is a JSON array, then parse it into a list of dicts
data = json.loads(f'[{txt}]')

df = pd.DataFrame(data)
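Because json.loads keeps each value's type (JSON strings become str, JSON numbers become int), the columns get sensible dtypes without any manual conversion, which is exactly what went wrong with the splitting approach. A quick check, reusing the sample string from the question:

```python
import json

import pandas as pd

txt = ' {"Col A":"Val","Col B":10,"Col C":1},{"Col A":"Val","Col B":4,"Col C":0} '

# Wrap the comma-separated objects in [...] so the text is a valid JSON array
data = json.loads(f'[{txt}]')
df = pd.DataFrame(data)

# String values end up in an object column, integer values in integer columns
print(df.dtypes)
```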

Answer:

eval() also works for this case, because the string happens to be valid Python syntax as well (a tuple of two dict literals):

pd.DataFrame(eval(txt))

  Col A  Col B  Col C
0   Val     10      1
1   Val      4      0

Answer:

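You can either evaluate the string as Python literals or parse it as JSON.

To evaluate it, don't use eval, which executes arbitrary code and is dangerous on untrusted input; use ast.literal_eval instead, which only accepts literals: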

from ast import literal_eval
import pandas as pd

s = ' {"Col A":"Val","Col B":10,"Col C":1},{"Col A":"Val","Col B":4,"Col C":0} '

# strip() removes the surrounding spaces; the result is a tuple of dicts
df = pd.DataFrame(literal_eval(s.strip()))
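ast.literal_eval only accepts Python literals (strings, numbers, tuples, lists, dicts, booleans, None and a few others), which is what makes it safe, but it also means JSON-only keywords such as true, false and null are rejected. A small check of both behaviours; try_literal_eval is a hypothetical helper written only for this illustration:

```python
from ast import literal_eval

s = ' {"Col A":"Val","Col B":10,"Col C":1},{"Col A":"Val","Col B":4,"Col C":0} '

# The sample is a tuple of dict literals, so literal_eval accepts it
records = literal_eval(s.strip())
print(type(records).__name__, len(records))


def try_literal_eval(text):
    """Hypothetical helper: return the parsed literal, or the exception name if rejected."""
    try:
        return literal_eval(text)
    except ValueError as exc:
        return type(exc).__name__


# A function call is not a literal, so it is refused instead of executed
print(try_literal_eval("__import__('os').getcwd()"))
# JSON keywords such as true/null are not Python literals, so they are refused too
print(try_literal_eval('{"flag": true}'))
```

If the real data may contain true, false or null, the JSON-based approaches are the better fit.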

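For JSON, use pandas.read_json directly. Since pandas 2.1, passing a literal JSON string to read_json is deprecated, so wrap it in io.StringIO:

from io import StringIO
import pandas as pd

s = ' {"Col A":"Val","Col B":10,"Col C":1},{"Col A":"Val","Col B":4,"Col C":0} '

# wrap the objects in [...] to form a JSON array; StringIO makes it file-like for read_json
df = pd.read_json(StringIO(f'[{s}]'))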

Output:

  Col A  Col B  Col C
0   Val     10      1
1   Val      4      0
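If you would rather decode the objects one at a time, closer to the splitting idea in the question, json.JSONDecoder.raw_decode can walk through the string and decode each object in turn. Unlike splitting on braces, this does not break when a string value itself contains "{" or "}". A sketch, assuming the objects are separated only by commas and whitespace; iter_json_objects is a hypothetical helper name:

```python
import json

import pandas as pd


def iter_json_objects(text):
    """Hypothetical helper: yield each JSON value from a comma-separated sequence."""
    decoder = json.JSONDecoder()
    idx = 0
    end = len(text)
    while idx < end:
        # Skip separators (whitespace and commas) between objects;
        # raw_decode itself does not skip leading whitespace
        while idx < end and text[idx] in " \t\r\n,":
            idx += 1
        if idx >= end:
            break
        # raw_decode returns the decoded value and the index where parsing stopped
        obj, idx = decoder.raw_decode(text, idx)
        yield obj


s = ' {"Col A":"Val","Col B":10,"Col C":1},{"Col A":"Val","Col B":4,"Col C":0} '
df = pd.DataFrame(list(iter_json_objects(s)))
print(df)
```

For a string that already fits in memory, wrapping it in brackets is simpler; this version mainly helps when you want to inspect or filter each object before building the DataFrame.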