Python pivot column with comma and bracket separated-CodePudding

I am working with a large dataframe that currently looks like this:

ID	Region	Noun
1	North America	[('Everything', 'NP'), ('the product','NP'), ('it','NP')]
2	North America	[('beautiful product','NP')]

The third column contains noun phrase(s) and each phrases have labels (two capital letters). All of these are in quotation marks and then coupled in brackets, and then put in square brackets which make it difficult for me to separate and pivot.

I would like to pivot the last column only, so the final output will look like this:

ID	Region	Noun	Type
1	North America	everything	NP
1	North America	the product	NP
1	North America	it	NP
2	North America	beautiful product	NP

The annoying part is that some rows have more brackets than others. Is there any way I could make this happen on Python?

CodePudding user response：

I'm assuming that you have strings in column Noun. You can apply ast.literal_eval on them to convert them to Python lists:

from ast import literal_eval

# apply if Noun are strings, skip otherwise
df["Noun"] = df["Noun"].apply(literal_eval)

df = df.explode("Noun")
df[["Noun", "Type"]] = df["Noun"].apply(pd.Series)
print(df)

Prints:

   ID         Region               Noun Type
0   1  North America         Everything   NP
0   1  North America        the product   NP
0   1  North America                 it   NP
1   2  North America  beautiful product   NP

CodePudding user response：

Here is another way:

df2 = df.explode('Noun').reset_index(drop=True)
df2[['ID','Region']].join(pd.DataFrame(df2['Noun'].tolist(),columns = ['Noun','Type']))

Output:

   ID         Region               Noun Type
0   1  North America         Everything   NP
1   1  North America        the product   NP
2   1  North America                 it   NP
3   2  North America  beautiful product   NP