Iterate values over a new column

Time:10-06

I have several web pages whose tables I would like to extract, storing their values in separate columns. I also want to extract the movie title from each page's URL and add it as a new column, repeated on every row that came from that movie's page.

For example (expected output):

                                    Location Name   Latitude   Longitude  \
0                1117 Broadway (Gil's Music Shop)  47.252495 -122.439644   
1   2715 North Junett St (Kat and Bianca's House)  47.272591 -122.474480   
2                                   Aurora Bridge  47.646713 -122.347435   
3                        Buckaroo Tavern (closed)  47.657841 -122.350327    
                                       movie  
0   10-things-i-hate-about-you-locations-250  
1   10-things-i-hate-about-you-locations-250  
2   10-things-i-hate-about-you-locations-250  
3   10-things-i-hate-about-you-locations-250  
...
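The movie value in the expected output is the last path segment of each page URL. The question uses split('/')[-1] for this. Below is a slightly more defensive sketch that also copes with a trailing slash or a query string. The helper name movie_slug and the "?page=2" URL are hypothetical, for illustration only.

```python
from urllib.parse import urlparse


def movie_slug(url: str) -> str:
    """Return the last non-empty path segment of url (hypothetical helper)."""
    path = urlparse(url).path            # drops scheme, host, query and fragment
    return path.rstrip('/').rsplit('/', 1)[-1]


urls = [
    'https://www.latlong.net/location/10-things-i-hate-about-you-locations-250',
    # hypothetical variant with a trailing slash and a query string
    'https://www.latlong.net/location/12-angry-men-locations-818/?page=2',
]
for u in urls:
    print(movie_slug(u))
```

For the plain URLs in the question, this returns the same value as split('/')[-1].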

What I have tried:

import pandas as pd

test = ['https://www.latlong.net/location/10-cloverfield-lane-locations-553',
 'https://www.latlong.net/location/10-things-i-hate-about-you-locations-250',
 'https://www.latlong.net/location/12-angry-men-locations-818']

url_test = []
for i in range(0, len(test), 1):
    # read the first table on the page and tag each row with the URL slug
    df = pd.read_html(test[i])[0]
    df['movie'] = test[i].split('/')[-1]

print(df)

However, this only gives the following output:

                Location Name   Latitude  Longitude  \
0               New York City  40.742298 -73.982559   
1  New York County Courthouse  40.714310 -74.001930   

                        movie  
0  12-angry-men-locations-818  
1  12-angry-men-locations-818 

This contains only the rows for the last URL. The results for the other movies are missing.

I suspected this was because the data is split across separate pandas DataFrames, so I tried merging before adding the column:

url_test = []
for i in range(0, len(test), 1):
    df = pd.read_html(test[i])[0]
    df = pd.merge(df, how='inner')
    df['movie'] = test[i].split('/')[-1]

But I get the following error:

TypeError: merge() missing 1 required positional argument: 'right'
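The error comes from pd.merge itself. It joins two DataFrames on their shared columns, so it always needs both a left and a right frame. Even with two frames, it matches rows rather than appending them. The sketch below uses two tiny hypothetical tables (values copied from the outputs above) to contrast merge with pd.concat, which stacks rows.

```python
import pandas as pd

# Two tiny hypothetical tables; the values are copied from the outputs above.
a = pd.DataFrame({'Location Name': ['Aurora Bridge'],
                  'Latitude': [47.646713], 'Longitude': [-122.347435]})
b = pd.DataFrame({'Location Name': ['New York City'],
                  'Latitude': [40.742298], 'Longitude': [-73.982559]})

# merge() needs a left AND a right frame; passing only one raises TypeError.
try:
    pd.merge(a, how='inner')
except TypeError as exc:
    print('TypeError:', exc)

# Even with both frames, an inner merge keeps only rows whose shared columns
# match, so locations from two different movies give an empty result.
merged = pd.merge(a, b, how='inner')
print(merged.empty)

# concat() appends rows instead, which is what combining the pages needs.
stacked = pd.concat([a, b], ignore_index=True)
print(stacked)
```

The exact wording of the TypeError message may differ between pandas versions.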

Answer:

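The loop reassigns df on every pass, so only the last page's table is left when it finishes, and url_test is never filled. pd.merge cannot help here. It joins two DataFrames and requires both a left and a right frame, which is why calling it with a single frame raises the TypeError. Instead, append each DataFrame to the list inside the loop and concatenate the list once after the loop: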

import pandas as pd

test = ['https://www.latlong.net/location/10-cloverfield-lane-locations-553',
 'https://www.latlong.net/location/10-things-i-hate-about-you-locations-250',
 'https://www.latlong.net/location/12-angry-men-locations-818']

url_test = []
for url in test:
    # first table on the page, tagged with the movie slug from the URL
    df = pd.read_html(url)[0]
    df['movie'] = url.split('/')[-1]
    # keep every page's table instead of overwriting df
    url_test.append(df)

# stack all tables into one DataFrame with a fresh 0..n-1 index
final_df = pd.concat(url_test, ignore_index=True)
print(final_df)
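To try the collect-then-concat pattern without hitting latlong.net, the sketch below swaps pd.read_html(url)[0] for a hypothetical offline helper, read_first_table, backed by a FAKE_PAGES dict. Its rows are copied from the outputs shown in the question. It also shows a compact equivalent of the loop that uses DataFrame.assign inside a generator passed to pd.concat.

```python
import pandas as pd

# Hypothetical offline stand-in for pd.read_html(url)[0]: each URL maps to a
# small table whose rows are copied from the outputs shown in the question.
FAKE_PAGES = {
    'https://www.latlong.net/location/10-things-i-hate-about-you-locations-250':
        pd.DataFrame({'Location Name': ['Aurora Bridge', 'Buckaroo Tavern (closed)'],
                      'Latitude': [47.646713, 47.657841],
                      'Longitude': [-122.347435, -122.350327]}),
    'https://www.latlong.net/location/12-angry-men-locations-818':
        pd.DataFrame({'Location Name': ['New York City', 'New York County Courthouse'],
                      'Latitude': [40.742298, 40.714310],
                      'Longitude': [-73.982559, -74.001930]}),
}


def read_first_table(url):
    """Hypothetical replacement for pd.read_html(url)[0] that needs no network."""
    return FAKE_PAGES[url].copy()


# Same idea as the loop above: one frame per page, tagged with its slug,
# then a single concat at the end. assign() returns a new frame with the column.
final_df = pd.concat(
    (read_first_table(url).assign(movie=url.split('/')[-1]) for url in FAKE_PAGES),
    ignore_index=True,
)
print(final_df)
```

Calling pd.concat once on the collected frames, rather than inside the loop, avoids re-copying the growing result on every iteration.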