I am working with a large dataframe that currently looks like this:
ID | Region | Noun |
---|---|---|
1 | North America | [('Everything', 'NP'), ('the product','NP'), ('it','NP')] |
2 | North America | [('beautiful product','NP')] |
The third column contains noun phrase(s) and each phrases have labels (two capital letters). All of these are in quotation marks and then coupled in brackets, and then put in square brackets which make it difficult for me to separate and pivot.
I would like to pivot the last column only, so the final output will look like this:
ID | Region | Noun | Type |
---|---|---|---|
1 | North America | everything | NP |
1 | North America | the product | NP |
1 | North America | it | NP |
2 | North America | beautiful product | NP |
The annoying part is that some rows have more brackets than others. Is there any way I could make this happen on Python?
CodePudding user response:
I'm assuming that you have strings in column Noun
. You can apply ast.literal_eval
on them to convert them to Python lists:
from ast import literal_eval
# apply if Noun are strings, skip otherwise
df["Noun"] = df["Noun"].apply(literal_eval)
df = df.explode("Noun")
df[["Noun", "Type"]] = df["Noun"].apply(pd.Series)
print(df)
Prints:
ID Region Noun Type
0 1 North America Everything NP
0 1 North America the product NP
0 1 North America it NP
1 2 North America beautiful product NP
CodePudding user response:
Here is another way:
df2 = df.explode('Noun').reset_index(drop=True)
df2[['ID','Region']].join(pd.DataFrame(df2['Noun'].tolist(),columns = ['Noun','Type']))
Output:
ID Region Noun Type
0 1 North America Everything NP
1 1 North America the product NP
2 1 North America it NP
3 2 North America beautiful product NP