Home > database >  Split column based on delimiter and considering the content
Split column based on delimiter and considering the content

Time:01-04

I need to split the content of a pandas column that contain some 'structured' data in many other columns based on content.

The strucure is "column_name1"/"value1"/"column_name2"/"value2"/...

For instance, the word "subscriptions" will become the name of column, and the "sub-id", "sub-id2" will be the values.

Transform this:

ResourceID
/subscriptions/sub-id/resourceGroups/rg-name/providers/Microsoft.MachineLearningServices/workspaces/work-ml/providers/Microsoft.EventGrid/extensionTopics/default
/subscriptions/sub-id2/resourceGroups/rg-name2/providers/Microsoft.Sql/servers/name-sqlserver/databases/name-BD

Into this:

subscriptions resourceGroups providers workspaces servers providers extensionTopics databases
sub-id rg-name Microsoft.MachineLearningServices work-ml NaN Microsoft.EventGrid default NaN
sub-id2 rg-name2 Microsoft.Sql NaN name-sqlserver NaN NaN name-BD

Any help would be very apreciate.

CodePudding user response:

Each row looks like: /key1/val1/key2/val2/... so split each part, zip keys and values then create a dict. Finally, use pd.DataFrame.from_records to create your expected dataframe:

data = df['ResourceID'].str.strip('/').str.split('/') \
                       .apply(lambda x: dict(zip(x[::2], x[1::2])))
out = pd.DataFrame.from_records(data)
print(out)

# Output
  subscriptions resourceGroups            providers workspaces extensionTopics         servers databases
0        sub-id        rg-name  Microsoft.EventGrid    work-ml         default             NaN       NaN
1       sub-id2       rg-name2        Microsoft.Sql        NaN             NaN  name-sqlserver   name-BD

CodePudding user response:

As an alternative to the other answer by @Corralien, you could do this using split's expand parameter:

df = df["ResourceID"].str.split("/", expand=True)
out = df[df.columns[2::2]]
out.columns = df.loc[0, df.columns[1::2]]
  •  Tags:  
  • Related