Home > OS >  Pandas data frame to specific json format
Pandas data frame to specific json format

Time:10-15

Example of data frame I have is as follows:

Period | ticker | rank | compositesore | C20 | C11 | C25 |
----------------------------------------------------------
2010-1 | AAPL   | 2    |  100          | 121 | 44  |56   |
-----------------------------------------------------------

The json output I want is as follows:

    [
     {
        "period": "2010-1",
        "ticker": "AAPL",
        "rank": 2,
        "compositescore": 100,
        "factorValues": {
          "C20": 121,
          "C11": 44,
          "C25": 56
          
        }
     ]

I want is collect all C20, C11, and C25 columns under the "factorValues".

CodePudding user response:

Not terribly pretty, but it works. For the example, you'll need to import json and io, as well as pandas, of course.

data = '''Period|ticker|rank|compositesore|C20|C11|C25
2010-1|AAPL|2|100|121|6|156
2010-3|NFLX|3|100|4121|464|596
2010-4|QQQQ|4|100|1281|441|856
'''
df = pd.read_csv(io.StringIO(data), sep='|', engine='python')

cola = ['Period', 'ticker', 'rank', 'compositesore']
colb = ['C20', 'C11',  'C25']

def create_json(x):
    # print(x[cola].to_json())
    dd = json.loads(x[cola].to_json())
    ee = json.loads(x[colb].to_json())
    dd['factorValues'] = ee
    # print(dd)
    return dd

df['jsons'] = df.apply(lambda x: create_json(x), axis=1)
df.jsons[0]

Output

{'Period': '2010-1',
 'ticker': 'AAPL',
 'rank': 2,
 'compositesore': 100,
 'factorValues': {'C20': 121, 'C11': 6, 'C25': 156}}

Finally

df.jsons.tolist()
[{'Period': '2010-1',
  'ticker': 'AAPL',
  'rank': 2,
  'compositesore': 100,
  'factorValues': {'C20': 121, 'C11': 6, 'C25': 156}},
 {'Period': '2010-3',
  'ticker': 'NFLX',
  'rank': 3,
  'compositesore': 100,
  'factorValues': {'C20': 4121, 'C11': 464, 'C25': 596}},
 {'Period': '2010-4',
  'ticker': 'QQQQ',
  'rank': 4,
  'compositesore': 100,
  'factorValues': {'C20': 1281, 'C11': 441, 'C25': 856}}]

Side note: If you do

return json.dumps(dd)

you'll get double quotes instead of single quotes

  • Related