Split column based on delimiter and considering the content-CodePudding

I need to split the content of a pandas column that contain some 'structured' data in many other columns based on content.

The strucure is "column_name1"/"value1"/"column_name2"/"value2"/...

For instance, the word "subscriptions" will become the name of column, and the "sub-id", "sub-id2" will be the values.

Transform this:

ResourceID
/subscriptions/sub-id/resourceGroups/rg-name/providers/Microsoft.MachineLearningServices/workspaces/work-ml/providers/Microsoft.EventGrid/extensionTopics/default
/subscriptions/sub-id2/resourceGroups/rg-name2/providers/Microsoft.Sql/servers/name-sqlserver/databases/name-BD

Into this:

subscriptions	resourceGroups	providers	workspaces	servers	providers	extensionTopics	databases
sub-id	rg-name	Microsoft.MachineLearningServices	work-ml	NaN	Microsoft.EventGrid	default	NaN
sub-id2	rg-name2	Microsoft.Sql	NaN	name-sqlserver	NaN	NaN	name-BD

Any help would be very apreciate.

CodePudding user response：

Each row looks like: /key1/val1/key2/val2/... so split each part, zip keys and values then create a dict. Finally, use pd.DataFrame.from_records to create your expected dataframe:

data = df['ResourceID'].str.strip('/').str.split('/') \
                       .apply(lambda x: dict(zip(x[::2], x[1::2])))
out = pd.DataFrame.from_records(data)
print(out)

# Output
  subscriptions resourceGroups            providers workspaces extensionTopics         servers databases
0        sub-id        rg-name  Microsoft.EventGrid    work-ml         default             NaN       NaN
1       sub-id2       rg-name2        Microsoft.Sql        NaN             NaN  name-sqlserver   name-BD

CodePudding user response：

As an alternative to the other answer by @Corralien, you could do this using split's expand parameter:

df = df["ResourceID"].str.split("/", expand=True)
out = df[df.columns[2::2]]
out.columns = df.loc[0, df.columns[1::2]]