Hello this is my first Question and I tried everything.
I'm new to Dataframe and SQL and I have an excel table with multiple Movies. It has the Columns Movie Name, Year ,...,and Actors. In the Column Actors are multiple Actors listet, which are separated by , or by /. I created a Dataframe Movies which have a UUID, and all the information I should store for a Movie. I also filtered all Actors which are in all my Movies and created a DataFrame called Actors with also a Unique ID and Name and stored them individually.
DF_MOVIE:
Movie ID | Actors |
---|---|
x | Person1,Person2,Person5 |
y | Person1 |
DF_ACTORS:
ActorID | Actor |
---|---|
a | Person1 |
b | Person 5 |
What I want to create is a new Dataframe which shows the Relationship between them.
For Example:
DF_ACTORS_MOVIE:
MOVIEID | ACTORID |
---|---|
x | a |
x | b |
The Problem is: there Actors in my Excel table (and Bookdf). And I want to take both IDS.
Does anybody have an Idea? would appreciate, Thanks..:
I tried iterating both the Dataframe books and tried to create a new Entry on the DF_ACTORS_MOVIE Dataframe with both UUIDS. The Problem is, that there are multiple Actors and they are divided by , and by /.
CodePudding user response:
You can first explode
the Actors into a new column:
out = df_actors.merge(df_movie.assign(Actor=df_movie['Actors'].str.split(','))
.explode('Actor').drop(columns='Actors')
)
Output:
ActorID Actor Movie ID
0 a Person1 x
1 a Person1 y
2 b Person5 x