I have a long string in an unusual but consistent format that I want to convert to a pandas data frame. Below is an example of the format that repeats:
' {"Col A":"Val","Col B":10,"Col C":1},{"Col A":"Val","Col B":4,"Col C":0} '
The desired data frame would be this:
Col A  Col B  Col C
"Val"     10      1
"Val"      4      0
I have tried splitting the string by brackets as a delimiter, but I have not been able to convert each split into a row due to varying datatypes.
Is there an easier way of doing this?
CodePudding user response:
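The easiest way I can think of is to wrap the string in square brackets so that it becomes a valid JSON array, then parse it with json.loads: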
import json
import pandas as pd
txt = ' {"Col A":"Val","Col B":10,"Col C":1},{"Col A":"Val","Col B":4,"Col C":0} '
# wrap the string in [] so it forms a JSON array of objects, then parse it into a list of dicts
data = json.loads(f'[{txt}]')
df = pd.DataFrame(data)
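As a rough sketch of why this copes with the mixed datatypes mentioned in the question: json.loads keeps the quoted values as strings and the bare numbers as integers, so each column ends up with a sensible dtype without any manual splitting. The helper name records_to_frame is hypothetical, not something from pandas.

```python
import json

import pandas as pd


def records_to_frame(txt: str) -> pd.DataFrame:
    """Hypothetical helper: parse comma-separated JSON objects into a DataFrame."""
    # Wrapping the stripped text in [] turns it into one JSON array of objects.
    records = json.loads(f"[{txt.strip()}]")
    # Each dict becomes a row; the dict keys become the column names.
    return pd.DataFrame(records)


txt = ' {"Col A":"Val","Col B":10,"Col C":1},{"Col A":"Val","Col B":4,"Col C":0} '
df = records_to_frame(txt)
print(df)
print(df.dtypes)  # "Col B" and "Col C" should come out as integer columns
```

If the real string is not valid JSON (single quotes, trailing commas), json.loads will raise json.JSONDecodeError, which is at least a clear signal of where the format deviates.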
CodePudding user response:
eval() also works for this case (with txt defined as in the question):
pd.DataFrame(eval(txt))
  Col A  Col B  Col C
0   Val     10      1
1   Val      4      0
CodePudding user response:
You can either evaluate the string as a Python literal or parse it as JSON.
To evaluate it, don't use eval (dangerous); use ast.literal_eval instead:
s = ' {"Col A":"Val","Col B":10,"Col C":1},{"Col A":"Val","Col B":4,"Col C":0} '
from ast import literal_eval
import pandas as pd
df = pd.DataFrame(literal_eval(s.strip()))
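To illustrate the difference, here is a minimal sketch using only the standard library. The string in malicious is a made-up example of untrusted input; literal_eval accepts only literals (strings, numbers, tuples, lists, dicts and so on) and refuses anything that would call a function, whereas eval would run it.

```python
from ast import literal_eval

s = ' {"Col A":"Val","Col B":10,"Col C":1},{"Col A":"Val","Col B":4,"Col C":0} '

# Comma-separated dict literals evaluate to a tuple of dicts.
rows = literal_eval(s.strip())
print(type(rows).__name__, len(rows))

# Made-up example of input that eval() would happily execute.
malicious = '__import__("os").getcwd()'
try:
    literal_eval(malicious)
    rejected = False
except ValueError:
    # literal_eval refuses function calls and other non-literal nodes.
    rejected = True
print(rejected)
```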
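For JSON, use pandas.read_json directly:
s = ' {"Col A":"Val","Col B":10,"Col C":1},{"Col A":"Val","Col B":4,"Col C":0} '
from io import StringIO
import pandas as pd
# wrap the string in [] to form a JSON array; recent pandas versions deprecate
# passing a literal JSON string, so hand read_json a file-like object instead
df = pd.read_json(StringIO(f'[{s}]'))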
Output:
  Col A  Col B  Col C
0   Val     10      1
1   Val      4      0