I need to split the content of a pandas column that contain some 'structured' data in many other columns based on content.
The strucure is "column_name1"/"value1"/"column_name2"/"value2"/...
For instance, the word "subscriptions" will become the name of column, and the "sub-id", "sub-id2" will be the values.
Transform this:
ResourceID |
---|
/subscriptions/sub-id/resourceGroups/rg-name/providers/Microsoft.MachineLearningServices/workspaces/work-ml/providers/Microsoft.EventGrid/extensionTopics/default |
/subscriptions/sub-id2/resourceGroups/rg-name2/providers/Microsoft.Sql/servers/name-sqlserver/databases/name-BD |
Into this:
subscriptions | resourceGroups | providers | workspaces | servers | providers | extensionTopics | databases |
---|---|---|---|---|---|---|---|
sub-id | rg-name | Microsoft.MachineLearningServices | work-ml | NaN | Microsoft.EventGrid | default | NaN |
sub-id2 | rg-name2 | Microsoft.Sql | NaN | name-sqlserver | NaN | NaN | name-BD |
Any help would be very apreciate.
CodePudding user response:
Each row looks like: /key1/val1/key2/val2/...
so split each part, zip keys and values then create a dict. Finally, use pd.DataFrame.from_records
to create your expected dataframe:
data = df['ResourceID'].str.strip('/').str.split('/') \
.apply(lambda x: dict(zip(x[::2], x[1::2])))
out = pd.DataFrame.from_records(data)
print(out)
# Output
subscriptions resourceGroups providers workspaces extensionTopics servers databases
0 sub-id rg-name Microsoft.EventGrid work-ml default NaN NaN
1 sub-id2 rg-name2 Microsoft.Sql NaN NaN name-sqlserver name-BD
CodePudding user response:
As an alternative to the other answer by @Corralien, you could do this using split
's expand
parameter:
df = df["ResourceID"].str.split("/", expand=True)
out = df[df.columns[2::2]]
out.columns = df.loc[0, df.columns[1::2]]