Python: Extracting Datasets in Dataset

Time:03-08

I have an odd-looking dataset in which every row holds another dataset. Here, "data" is a list that I converted to a DataFrame:

result_df = pd.DataFrame(data)


Looking at the first entry of the DataFrame above, I see a nested DataFrame with 5 rows. The same is true for every other row. Here is the nested DataFrame in the first row (row zero):

result_df[0][0]
    _embedded.results|className _embedded.results|classId   _embedded.results|uri   _embedded.results|searchHit _embedded.results|title _embedded.results|preferredLabel    _embedded.results|isTopConceptInScheme  _embedded.results|isInScheme    _embedded.results|hasSkillType  _embedded.results|hasReuseLevel _embedded.results|broaderHierarchyConcept   _embedded.results|_links    _embedded.results|broaderSkill  BC_name
   0    Skill   http://data.europa.eu/esco/model#Skill  http://data.europa.eu/esco/skill/237db40b-4600...   range of project control principles project management principles   {'de': 'Prinzipien des Projektmanagements', 'n...   [http://data.europa.eu/esco/concept-scheme/mem...   [http://data.europa.eu/esco/concept-scheme/ski...   [http://data.europa.eu/esco/skill-type/knowledge]   [http://data.europa.eu/esco/skill-reuse-level/...   [http://data.europa.eu/esco/isced-f/0413]   {'self': {'href': 'https://ec.europa.eu/esco/a...   NaN Project Financials Control
   1    Skill   http://data.europa.eu/esco/model#Skill  http://data.europa.eu/esco/skill/abb9c7f1-6d69...   Operate projection equipment manually or with ...   operate projector   {'de': 'Projektoren bedienen', 'no': 'betjene ...   [http://data.europa.eu/esco/concept-scheme/mem...   [http://data.europa.eu/esco/concept-scheme/ski...   [http://data.europa.eu/esco/skill-type/skill]   [http://data.europa.eu/esco/skill-reuse-level/...   [http://data.europa.eu/esco/skill/S8.6.2]   {'self': {'href': 'https://ec.europa.eu/esco/a...   NaN Project Financials Control
   2    Skill   http://data.europa.eu/esco/model#Skill  http://data.europa.eu/esco/skill/25a713ba-cbc0...   Manage the overall planning, coordination, and...   manage railway construction projects    {'de': 'Bahnbauprojekte leiten', 'no': 'admini...   NaN [http://data.europa.eu/esco/concept-scheme/ski...   [http://data.europa.eu/esco/skill-type/skill]   [http://data.europa.eu/esco/skill-reuse-level/...   [http://data.europa.eu/esco/skill/S4.2.1]   {'self': {'href': 'https://ec.europa.eu/esco/a...   [http://data.europa.eu/esco/skill/fff5bc45-b50...   Project Financials Control
   3    Skill   http://data.europa.eu/esco/model#Skill  http://data.europa.eu/esco/skill/d37bc902-f640...   prepare financial projections   prepare financial projections   {'de': 'Finanzprognosen erstellen', 'no': 'for...   [http://data.europa.eu/esco/concept-scheme/mem...   [http://data.europa.eu/esco/concept-scheme/ski...   [http://data.europa.eu/esco/skill-type/skill]   [http://data.europa.eu/esco/skill-reuse-level/...   [http://data.europa.eu/esco/skill/S2.7.3]   {'self': {'href': 'https://ec.europa.eu/esco/a...   NaN Project Financials Control
   4    Skill   http://data.europa.eu/esco/model#Skill  http://data.europa.eu/esco/skill/7106b5df-e017...   PRojects IN Controlled Environments, version 2  Prince2 project management  {'de': 'Prince2-Projektmanagement', 'no': 'Pri...   NaN [http://data.europa.eu/esco/concept-scheme/ski...   [http://data.europa.eu/esco/skill-type/knowledge]   [http://data.europa.eu/esco/skill-reuse-level/...   [http://data.europa.eu/esco/isced-f/0413]   {'self': {'href': 'https://ec.europa.eu/esco/a...   [http://data.europa.eu/esco/skill/bec4359e-cb9...   Project Financials Control


Is it possible to extract the dataset from every row and append them all to one big DataFrame? The resulting DataFrame should have 1716 x 5 = 8580 rows.

I tried something like this without success:

column_names = ["_embedded.results|className", "_embedded.results|classId", "_embedded.results|uri","_embedded.results|searchHit", "_embedded.results|title ", "_embedded.results|preferredLabel", "_embedded.results|isTopConceptInScheme", "embedded.results|isInScheme","_embedded.results|hasSkillType","_embedded.results|hasReuseLevel","_embedded.results|broaderHierarchyConcept","_embedded.results|_links","_embedded.results|broaderSkill","BC_name"]
my_df = pd.DataFrame(columns = column_names)

for index, i in result_df.iterrows():
  for j in i:
    my_df.append(j)

CodePudding user response:

If I understand correctly, use this if each value still needs to be converted to a DataFrame:

result_df = pd.concat([pd.DataFrame(x) for x in data], ignore_index=True)
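To see what that one-liner does, here is a minimal sketch on made-up data. The records in `data` and the name `combined` are only assumptions for illustration; the real `data` from the question has 1716 elements and many more columns.

```python
import pandas as pd

# Hypothetical stand-in for `data`: each element is a list of records
# (dicts) that pd.DataFrame can turn into a small table.
data = [
    [{"title": "a", "BC_name": "X"}, {"title": "b", "BC_name": "X"}],
    [{"title": "c", "BC_name": "Y"}, {"title": "d", "BC_name": "Y"}],
    [{"title": "e", "BC_name": "Z"}, {"title": "f", "BC_name": "Z"}],
]

# Build one small DataFrame per element, then stack them row-wise.
# ignore_index=True replaces the repeated 0..n-1 labels of the pieces
# with one continuous index for the combined result.
combined = pd.concat([pd.DataFrame(x) for x in data], ignore_index=True)

print(combined.shape)
```

With 3 elements of 2 records each, the result should have 3 x 2 = 6 rows, which mirrors the 1716 x 5 = 8580 rows expected in the question.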

Or, if data is already a list of DataFrames:

result_df = pd.concat(data, ignore_index=True)
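A small sketch of the list-of-DataFrames case, assuming every element already is a 5-row DataFrame like the one shown in the question. The helper `make_block` and the names `frames` and `big_df` are hypothetical and only used to build toy input.

```python
import pandas as pd


def make_block(name, n_rows=5):
    """Build a toy 5-row DataFrame standing in for one nested result set."""
    return pd.DataFrame(
        {
            "title": [f"{name}-{i}" for i in range(n_rows)],
            "BC_name": name,  # scalar is broadcast to every row
        }
    )


# Hypothetical list of already-built DataFrames (3 blocks instead of 1716).
frames = [make_block(name) for name in ["A", "B", "C"]]

# A single concat call stacks all blocks; no loop with append is needed.
big_df = pd.concat(frames, ignore_index=True)

# Row count is (number of blocks) x (rows per block).
print(len(big_df), len(frames) * 5)
```

As for the loop in the question: DataFrame.append never modified my_df in place (it returned a new DataFrame that the loop discarded), and the method was removed in pandas 2.0. Collecting the pieces in a list and calling pd.concat once is the usual replacement. Note that pd.concat aligns columns by name, so a block that lacks a column gets NaN in that column.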