Convert timeseries pandas dataframe to nested JSON-CodePudding

I have the following data frame"

Code to create the dataframe:

df = pd.DataFrame( {'month_i': {0: '2022-01-31', 1: '2022-02-28', 2: '2022-03-31', 3: '2022-04-30', 4: '2022-01-31', 5: '2022-02-28', 6: '2022-03-31', 7: '2022-04-30'}, 'id': {0: 'ACT', 1: 'ACT', 2: 'ACT', 3: 'ACT', 4: 'ACT', 5: 'ACT', 6: 'ACT', 7: 'ACT'}, 'city_name': {0: 'New York', 1: 'New York', 2: 'New York', 3: 'New York', 4: 'New York', 5: 'New York', 6: 'New York', 7: 'New York'}, 'lineColor': {0: '#4F63E7', 1: '#4F63E7', 2: '#4F63E7', 3: '#4F63E7', 4: '#4F63E7', 5: '#4F63E7', 6: '#4F63E7', 7: '#4F63E7'}, 'ptype': {0: 'house', 1: 'house', 2: 'house', 3: 'house', 4: 'unit', 5: 'unit', 6: 'unit', 7: 'unit'}, 'som': {0: 171.0, 1: 483.0, 2: 478.0, 3: 465.0, 4: 107.0, 5: 250.0, 6: 268.0, 7: 248.0}, 'dom': {0: 25, 1: 30, 2: 24, 3: 24, 4: 53, 5: 51, 6: 48, 7: 37}} )
df['month_i'] = pd.to_datetime(df['month_i'])

which needs to be presented as a JSON in this format

{
  "house": {
    "som": [
      171,
      483,
      478,
      465
    ],
    "dom": [
      25,
      30,
      24,
      24
    ]
  },
  "unit": {
    "som": [
      107,
      250,
      268,
      248
    ],
    "dom": [
      53,
      51,
      48,
      37
    ]
  },
  "ref": {
    "months": [
      "2022-01-31",
      "2022-02-28",
      "2022-03-31",
      "2022-04-30"
    ],
    "lineColor": "#4F63E7",
    "city_name": "New York"
  }
}

I was hoping to get to a solution where the json can be produced in pandas with dict or a nested loop. But can only get as far as grouping by ptype. Any help would be greatly appreciated!

P.S> lineColor can be hardcoded to "#4F63E7" and does not need to be extracted from the dataframe

CodePudding user response：

You can try creating two different dictionaries: one with the main data and the other with the reference data and then merging them:

import json 

#pivot the dataframe with the main data
pivoted = df.pivot("month_i","ptype",["som","dom"]).T

#groupby and create the first dictionary in the necessary format
data = pivoted.groupby(level=1).apply(lambda x: x.droplevel(1).apply(list,axis=1).to_dict()).to_dict()

#create the second dictionary with the reference data
other = {"ref": {"months": list(pivoted.columns), 
                 "lineColor": df["lineColor"].iat[0], 
                 "city_name":df["city_name"].iat[0]}}

#merge the dictionaries
merged = data|other

#convert to json
output = json.dumps(merged)

>>> output
{
  "house": {
    "som": [
      171,
      483,
      478,
      465
    ],
    "dom": [
      25,
      30,
      24,
      24
    ]
  },
  "unit": {
    "som": [
      107,
      250,
      268,
      248
    ],
    "dom": [
      53,
      51,
      48,
      37
    ]
  },
  "ref": {
    "months": [
      "2022-01-31",
      "2022-02-28",
      "2022-03-31",
      "2022-04-30"
    ],
    "lineColor": "#4F63E7",
    "city_name": "New York"
  }
}

CodePudding user response：

Another approch:

_dict = {}
m_key = []
s_key = ['som', 'dom']

for value in df['ptype'].unique():
    frame = df[df['ptype'] == value]
    m_key.append(value)

for m in m_key:
    _dict[m] = {}
    for s in s_key:
        _dict[m][s] = []
        for ptype in df['ptype'].unique():
            frame = df[df['ptype'] == ptype]
            _dict[m][s].append(frame['som'].to_list())
            _dict[m][s].append(frame['dom'].to_list())
            break

_dict['ref'] = {}
_dict['ref']['months'] = df['month_i'].unique().tolist()
_dict['ref']['lineColor'] = '#4F63E7'

print(json.dumps(_dict))