Loop Through Two Dataframe Columns to Create Key-Value Pairs for Value Dictionaries Nested Inside a Dictionary


I want to create a metadata variable based on my input dataset's columns using dictionaries. Specifically, I want the metadata variable to have exactly this structure and values:

metadata = {
    'fields': {
        'id': {'type': 'categorical'},
        'name': {'type': 'categorical'},
        'category_id': {'type': 'categorical'},
        'category_label': {'type': 'categorical'},
        'Compatible operating systems.2507': {'type': 'categorical'},
        'Compatible operating systems.2507.unit': {'type': 'categorical'},
        'Dimensions (WxDxH).49': {'type': 'categorical'},
        'Dimensions (WxDxH).49.unit': {'type': 'categorical'},
        'Internal mass storage.588': {'type': 'categorical'},
        'Internal mass storage.588.unit': {'type': 'categorical'},
        'Processor cores.6089': {'type': 'categorical'},
        'Processor cores.6089.unit': {'type': 'categorical'},
        'Built-in processor.7787': {'type': 'boolean'},
        'Built-in processor.7787.unit': {'type': 'categorical'}
    },
    'constraints': [],
    'primary_key': 'id'
}

The input dataframe I am using to build the above metadata structure looks like this:

[Screenshot of the input dataframe: an 'index' column holding the field names and a '0' column holding the dtype strings (e.g. object, boolean).]

So far I have attempted the following code to do this:

# table_dtypes has an 'index' column holding the field names
metadata = {}
metadata['fields'] = {}

for col_name in table_dtypes['index']:
    metadata['fields'][col_name] = {}

The code above outputs this:

[Screenshot of the resulting metadata dictionary: every field maps to an empty dict {}.]

As you can see, the values are all {}. I want to fill each one in with 'type' as the key and the corresponding value from column "0" of the dataframe in my screenshot (e.g. object, boolean). I want this to be an automated loop, because in the future the data will also contain 'float', 'integer' and other types in addition to 'categorical'.

How do I go about doing this?

CodePudding user response:

You can convert your 0 column into a Series of dicts: use replace to map 'object' to 'categorical' (and any other substitutions you need), then apply a function that wraps each value in a dict with 'type' as the key. You can then zip that against the index column and convert the result into a dict of fields. Finally, add that to an output dict alongside the constraints and primary_key entries.

# Map raw dtype strings to metadata type labels, then wrap each one in {'type': ...}
types = df['0'].replace({ 'object' : 'categorical' }).apply(lambda x: { 'type' : x })

# Pair each field name (from the 'index' column) with its type dict
fields = dict(zip(df['index'], types))

metadata = { 'fields' : fields,
             'constraints': [],
             'primary_key': df.loc[0, 'index']
           }

Note that I've assumed the primary key is stored in the first row of index; if that is not the case, you can adjust as required.
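For example, a minimal alternative (assuming the primary key field is always named 'id', as in your desired output) is to set it explicitly rather than reading it from the dataframe:

# Assumes the primary key field name is known up front (here 'id')
metadata = {
    'fields': fields,
    'constraints': [],
    'primary_key': 'id'
}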

For this sample input:

                     index        0
0                       id   object
1                     name   object
2              category_id   object
3           category_label   object
4  Built-in processor.7787  boolean
5                  A float    float
6               An integer      int

You will get this output (dumped as JSON for readability):

{
    "fields": {
        "id": {
            "type": "categorical"
        },
        "name": {
            "type": "categorical"
        },
        "category_id": {
            "type": "categorical"
        },
        "category_label": {
            "type": "categorical"
        },
        "Built-in processor.7787": {
            "type": "boolean"
        },
        "A float": {
            "type": "float"
        },
        "An integer": {
            "type": "int"
        }
    },
    "constraints": [],
    "primary_key": "id"
}
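If you want to try this end to end, here is a self-contained sketch that builds the sample dataframe above and reproduces that output. The column names 'index' and '0' are taken from the question's screenshot; adjust them if yours differ.

import json
import pandas as pd

# Sample dataframe matching the input shown above:
# 'index' holds the field names, '0' holds the dtype strings.
df = pd.DataFrame({
    'index': ['id', 'name', 'category_id', 'category_label',
              'Built-in processor.7787', 'A float', 'An integer'],
    '0': ['object', 'object', 'object', 'object', 'boolean', 'float', 'int'],
})

# Map dtype strings to type labels, then wrap each in a {'type': ...} dict
types = df['0'].replace({'object': 'categorical'}).apply(lambda x: {'type': x})

# Pair each field name with its type dict
fields = dict(zip(df['index'], types))

metadata = {
    'fields': fields,
    'constraints': [],
    'primary_key': df.loc[0, 'index'],  # assumes the primary key is in the first row
}

print(json.dumps(metadata, indent=4))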