Home > Enterprise >  Python pivot column with comma and bracket separated
Python pivot column with comma and bracket separated

Time:04-12

I am working with a large dataframe that currently looks like this:

ID Region Noun
1 North America [('Everything', 'NP'), ('the product','NP'), ('it','NP')]
2 North America [('beautiful product','NP')]

The third column contains noun phrase(s) and each phrases have labels (two capital letters). All of these are in quotation marks and then coupled in brackets, and then put in square brackets which make it difficult for me to separate and pivot.

I would like to pivot the last column only, so the final output will look like this:

ID Region Noun Type
1 North America everything NP
1 North America the product NP
1 North America it NP
2 North America beautiful product NP

The annoying part is that some rows have more brackets than others. Is there any way I could make this happen on Python?

CodePudding user response:

I'm assuming that you have strings in column Noun. You can apply ast.literal_eval on them to convert them to Python lists:

from ast import literal_eval

# apply if Noun are strings, skip otherwise
df["Noun"] = df["Noun"].apply(literal_eval)

df = df.explode("Noun")
df[["Noun", "Type"]] = df["Noun"].apply(pd.Series)
print(df)

Prints:

   ID         Region               Noun Type
0   1  North America         Everything   NP
0   1  North America        the product   NP
0   1  North America                 it   NP
1   2  North America  beautiful product   NP

CodePudding user response:

Here is another way:

df2 = df.explode('Noun').reset_index(drop=True)
df2[['ID','Region']].join(pd.DataFrame(df2['Noun'].tolist(),columns = ['Noun','Type']))

Output:

   ID         Region               Noun Type
0   1  North America         Everything   NP
1   1  North America        the product   NP
2   1  North America                 it   NP
3   2  North America  beautiful product   NP
  • Related