Home > Back-end >  Groupby multiple row and keep two columns in pandas
Groupby multiple row and keep two columns in pandas

Time:06-15

I have dataframe in format name, actor_link, movie_link and I want to group it by actor link to format name, actor_link, list_of_movies_where_actor_appears

example input

Tom Hanks,https://www.csfd.cz/tvurce/330-tom-hanks/,https://www.csfd.cz/film/10135-forrest-gump/
Tom Hanks,https://www.csfd.cz/tvurce/330-tom-hanks/,https://www.csfd.cz/film/2292-zelena-mile/

example output

Tom Hanks,https://www.csfd.cz/tvurce/330-tom-hanks/,[https://www.csfd.cz/film/10135-forrest-gump/,https://www.csfd.cz/film/2292-zelena-mile/]

my code is giving output

Tom Hanks,https://www.csfd.cz/tvurce/330-tom-hanks/,"['Tom Hanks', 'https://www.csfd.cz/film/10135-forrest-gump/', 'Tom Hanks', 'https://www.csfd.cz/film/2292-zelena-mile/']

code

original_actor_df = pd.read_csv('actors.csv')
actor_movies_df = original_actor_df.set_index("actor_link").stack().groupby(level=0).apply(list).reset_index(name="movies")
original_actor_df.drop(['movie_link'], axis=1, inplace=True)
original_actor_df.drop_duplicates(inplace=True)
actor_df = pd.merge(original_actor_df, actor_movies_df, on="actor_link")

How can I get rid of actor name in column with movie links? Can I simplify this process with dataframe without so many steps? I think it could be possible with adjusting

original_actor_df.set_index("actor_link").stack().groupby(level=0).apply(list).reset_index(name="movies")

I tried, I failed ...

CodePudding user response:

Hey there fellow Czech programmer! <3 This might get the job done, i think...

import pandas as pd

df = pd.read_csv('MOCK_DATA.csv')

# group by actor_link and count the number of movies
g = df.groupby(['actor','actor_link'])['movie'].apply(list).reset_index(name='list_of_movies')
# output res to a csv file
g.to_csv('actor_movies.csv', header=False, index=False)

Tested on this file:

actor,actor_link,movie
Yardley McGilroy,www.csfd/Yardley McGilroy.cz,Sharknado 2: The Second One
Woodman Meese,www.csfd/Woodman Meese.cz,The Amazing Catfish
Marijo Thorburn,www.csfd/Marijo Thorburn.cz,"Soldier, The"
Arron Rosenfield,www.csfd/Arron Rosenfield.cz,Barcelona
Olga Wainscot,www.csfd/Olga Wainscot.cz,Addicted
Bradan Ivashev,www.csfd/Bradan Ivashev.cz,Fury
Lu Manjin,www.csfd/Lu Manjin.cz,"Sea Inside, The (Mar adentro)"
Shelby Kitt,www.csfd/Shelby Kitt.cz,"Set-Up, The"
Barbaraanne Yakushkin,www.csfd/Barbaraanne Yakushkin.cz,"Princess Blade, The (Shura Yukihime)"
Abbey Munkton,www.csfd/Abbey Munkton.cz,Twin Peaks: Fire Walk with Me
Vittoria Clayal,www.csfd/Vittoria Clayal.cz,"Cider House Rules, The"
Vickie Ormrod,www.csfd/Vickie Ormrod.cz,Hustler White
Artemas Solomonides,www.csfd/Artemas Solomonides.cz,Satyricon
Lucita Whittick,www.csfd/Lucita Whittick.cz,"Exterminator, The"
Cullen Kear,www.csfd/Cullen Kear.cz,Non-Stop
Tine Slaney,www.csfd/Tine Slaney.cz,Measuring the World (Die Vermessung der Welt)
Allister Caulcott,www.csfd/Allister Caulcott.cz,Beat the Devil
Dannie Sheara,www.csfd/Dannie Sheara.cz,Casanova's Big Night
Montague Casetti,www.csfd/Montague Casetti.cz,Someone Like Him (Einer wie Bruno)
Dara French,www.csfd/Dara French.cz,Gozu (Gokudô kyôfu dai-gekijô: Gozu)
Debby Winterson,www.csfd/Debby Winterson.cz,Dr. Jekyll and Mr. Hyde
Phaedra Eneas,www.csfd/Phaedra Eneas.cz,Edges of the Lord
Obidiah Bastiman,www.csfd/Obidiah Bastiman.cz,Rendezvous
Lindy Lilbourne,www.csfd/Lindy Lilbourne.cz,Night Moves
Devon Obert,www.csfd/Devon Obert.cz,Rabbit Without Ears 2 (Zweiohrküken)
Conrade Urrey,www.csfd/Conrade Urrey.cz,Heavens Fall
Jaine Chasson,www.csfd/Jaine Chasson.cz,Jimi Hendrix: Hear My Train A Comin'
Filberte Southerton,www.csfd/Filberte Southerton.cz,"Capture of Bigfoot, The"
Ulric Hargitt,www.csfd/Ulric Hargitt.cz,Pilgrimage
Gal Pavia,www.csfd/Gal Pavia.cz,Big Hero 6
Niels Dannell,www.csfd/Niels Dannell.cz,More Than a Game
Kari Jobe,www.csfd/Kari Jobe.cz,Jimi: All Is by My Side
Tonia Hatton,www.csfd/Tonia Hatton.cz,Star Trek: Nemesis
Rozele Kaas,www.csfd/Rozele Kaas.cz,Captive (Cautiva) 
Urson Bourdel,www.csfd/Urson Bourdel.cz,RKO 281
Teddi Mohammed,www.csfd/Teddi Mohammed.cz,Canvas
Blair Mosedale,www.csfd/Blair Mosedale.cz,Escape from Fort Bravo
Cleon Sloley,www.csfd/Cleon Sloley.cz,Survival Quest
Yasmin Snap,www.csfd/Yasmin Snap.cz,Strictly Sexual
Audy Rubinfeld,www.csfd/Audy Rubinfeld.cz,Queen of Montreuil
Shepperd Matusiak,www.csfd/Shepperd Matusiak.cz,"Dark Side of the Heart, The (Lado oscuro del corazón, El)"
Storm Harrowing,www.csfd/Storm Harrowing.cz,"Artist, The"
Vlad Geare,www.csfd/Vlad Geare.cz,Pleasure at Her Majesty's
Stacey Kiff,www.csfd/Stacey Kiff.cz,Marilyn in Manhattan
Darla Dongall,www.csfd/Darla Dongall.cz,Hometown Legend
Nathan Lythgoe,www.csfd/Nathan Lythgoe.cz,Tales of Terror
Krishnah Bernet,www.csfd/Krishnah Bernet.cz,Circus of Horrors
Elnore Haggett,www.csfd/Elnore Haggett.cz,"Thing About My Folks, The"

Wasn't able to get rid of the quotation marks because that would break the csv parsing, hope that's not too big of a problem...

CodePudding user response:

I think you can get desired output using group by aggregation as follows.

import pandas as pd
import io   

string = """name,actor_link,movie_link
Tom Hanks,https://www.csfd.cz/tvurce/330-tom-hanks/,https://www.csfd.cz/film/10135-forrest-gump/
Tom Hanks,https://www.csfd.cz/tvurce/330-tom-hanks/,https://www.csfd.cz/film/2292-zelena-mile/"""

df = pd.read_csv(io.StringIO(string), sep=",")
df = df.groupby(['name','actor_link']).agg(list_of_movies={'movie_link':list}).reset_index()
  • Related