I have the data in CSV format, for example:
- First row is column number, let's ignore that.
- Second row, starting at Col_4, are number of days
- Third row on: Col_1 and Col_2 are coordinates (lon, lat), Col_3 is a statistical value, Col_4 onwards are measurements.
As you can see, this format is a confusing mess. I would like to convert this to JSON the following way, for example:
{"points":{
"dates": ["20190103", "20190205"],
"0":{
"lon": "-8.072557",
"lat": "41.13702",
"measurements": ["-0.191792","-10.543130"],
"val": "-1"
},
"1":{
"lon": "-8.075557",
"lat": "41.15702",
"measurements": ["-1.191792","-2.543130"],
"val": "-9"
}
}
}
To summarise what I've done till now, I read the CSV to a Pandas DataFrame:
df = pandas.read_csv("sample.csv")
I can extract the dates into a Numpy Array with:
dates = df.iloc[0][3:].to_numpy()
I can extract measurements for all points with:
measurements_all = df.iloc[1:,3:].to_numpy()
And the lon and lat and val, respectively, with:
lon_all = df.iloc[1:,0:1].to_numpy()
lat_all = df.iloc[1:,1:2].to_numpy()
val_all = df.iloc[1:,2:3].to_numpy()
Can anyone explain how I can format this info a structure identical to the .json example?
CodePudding user response:
With this dataframe
df = pd.DataFrame([{"col1": 1, "col2": 2, "col3": 3, "col4":3, "col5":5},
{"col1": None, "col2": None, "col3": None, "col4":20190103, "col5":20190205},
{"col1": -8.072557, "col2": 41.13702, "col3": -1, "col4":-0.191792, "col5":-10.543130},
{"col1": -8.072557, "col2": 41.15702, "col3": -9, "col4":-0.191792, "col5":-2.543130}])
This code does what you want, although it is not converting anything to strings as in you example. But you should be able to easily do it, if necessary.
# generate dict with dates
final_dict = {"dates": [list(df["col4"])[1],list(df["col5"])[1]]}
# iterate over relevant rows and generate dicts
for i in range(2,len(df)):
final_dict[i-2] = {"lon": df["col1"][i],
"lat": df["col2"][i],
"measurements": [df[cname][i] for cname in ["col4", "col5"]],
"val": df["col3"][i]
}
this leads to this output:
{0: {'lat': 41.13702,
'lon': -8.072557,
'measurements': [-0.191792, -10.54313],
'val': -1.0},
1: {'lat': 41.15702,
'lon': -8.072557,
'measurements': [-0.191792, -2.54313],
'val': -9.0},
'dates': [20190103.0, 20190205.0]}
CodePudding user response:
Extracting dates from data and then eliminating first row from data frame:
dates =list(data.iloc[0][3:])
data=data.iloc[1:]
Inserting dates into dict:
points={"dates":dates}
Iterating through data frame and adding elements to the dictionary:
i=0
for index, row in data.iterrows():
element= {"lon":row["Col_1"],
"lat":row["Col_2"],
"measurements": [row["Col_3"], row["Col_4"]]}
points[str(i)]=element
i =1
You can convert dict to string object using json.dumps():
points_json = json.dumps(points)
It will be string object, not json(dict) object. More about that in this post Converting dictionary to JSON