I have the JSON string below loaded into a DataFrame. Now I want to filter the records based on ossId.
The condition I have gives an error message. What is the correct way to filter by ossId?
import pandas as pd
data = """
{
"components": [
{
"ossId": 3946,
"project": "OALX",
"licenses": [
{
"name": "BSD 3",
"status": "APPROVED"
}
]
},
{
"ossId": 3946,
"project": "OALX",
"version": "OALX.client.ALL",
"licenses": [
{
"name": "GNU Lesser General Public License v2.1 or later",
"status": "APPROVED"
}
]
},
{
"ossId": 2550,
"project": "OALX",
"version": "OALX.webservice.ALL" ,
"licenses": [
{
"name": "MIT License",
"status": "APPROVED"
}
]
}
]
}
"""
df = pd.read_json(data)
print(df)
df1 = df[df["components"]["ossId"] == 2550]
CodePudding user response:
I think your issue is due to the JSON structure. You are actually loading into df a single components column in which each cell holds one whole component record as a dict, so there is no ossId column to compare against.
You should instead pass the list of records to the DataFrame. Something like:
import json
# Parse the string first, then build the frame from the list of component records
json_data = json.loads(data)
df = pd.DataFrame(json_data["components"])
filtered_data = df[df["ossId"] == 2550]
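For reference, a self-contained version of this might look like the sketch below. It assumes a trimmed copy of the sample data (the licenses field is left out for brevity), so treat it as an illustration rather than a drop-in replacement:

```python
import json

import pandas as pd

# Trimmed copy of the question's JSON (licenses omitted for brevity).
data = """
{
  "components": [
    {"ossId": 3946, "project": "OALX"},
    {"ossId": 3946, "project": "OALX", "version": "OALX.client.ALL"},
    {"ossId": 2550, "project": "OALX", "version": "OALX.webservice.ALL"}
  ]
}
"""

json_data = json.loads(data)

# One row per component and one column per key; keys missing from a
# record (here "version" in the first one) become NaN.
df = pd.DataFrame(json_data["components"])

# ossId is now an ordinary column, so a boolean mask works.
filtered_data = df[df["ossId"] == 2550]
print(filtered_data)
```

Because each key becomes its own column, the same pattern works for filtering on project or version as well.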
CodePudding user response:
You need to go into the cell's data and get the correct key:
df[df['components'].apply(lambda x: x.get('ossId')==2550)]
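To see why the original expression fails and this one works, here is a small sketch. It builds the frame directly from a trimmed list of dicts (an assumption made to keep the example short) instead of calling read_json, but the resulting shape is the same: one components column holding a dict per row.

```python
import pandas as pd

# Same shape as the question's df: a single "components" column of dicts.
# The records are a trimmed version of the sample data.
df = pd.DataFrame({"components": [
    {"ossId": 3946, "project": "OALX"},
    {"ossId": 3946, "project": "OALX", "version": "OALX.client.ALL"},
    {"ossId": 2550, "project": "OALX", "version": "OALX.webservice.ALL"},
]})

# df["components"] is a Series indexed 0..2, so ["ossId"] is treated as a
# row label lookup and raises KeyError.
try:
    df["components"]["ossId"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# apply() runs the lambda on each dict and returns a boolean Series,
# which can then be used as a row mask.
mask = df["components"].apply(lambda x: x.get("ossId") == 2550)
df1 = df[mask]
print(df1)
```

Note that the result still has the dicts packed into one column; it only selects the matching rows.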
CodePudding user response:
Use the .str accessor, which can also index into the dict stored in each cell:
df[df.components.str['ossId']==2550]
Out[89]:
components
2 {'ossId': 2550, 'project': 'OALX', 'version': ...