Scraping free proxy list with pandas

Time:02-21

I'm using pandas and requests to scrape IPs from https://free-proxy-list.net/, but how do I change this code

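import pandas as pd
import requests

# Download the page and parse its first HTML table into a DataFrame.
resp = requests.get('https://free-proxy-list.net/')
df = pd.read_html(resp.text)[0]

# Keep only the rows whose anonymity level is "elite proxy".
df = df[df['Anonymity'] == 'elite proxy']

print(df.to_string(index=False))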

so that the output is a list of IPs and nothing else? I managed to remove the index and keep only the elite proxies, but I can't create a variable that is a list containing only the IPs, without the index.

CodePudding user response:

To get the contents of the 'IP Address' column, select that column and call .to_list() on it.

Here's how:

print(df['IP Address'].to_list())
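
As a minimal offline sketch of this answer, the snippet below builds a small stand-in DataFrame instead of scraping the site. The column names come from the question's code; the addresses are made-up values from reserved documentation ranges, not real proxies. It stores the result in a variable (here called ip_list, a name chosen for illustration) rather than only printing it.

```python
import pandas as pd

# Hypothetical stand-in for the scraped table; only the two columns
# used in the question are included.
df = pd.DataFrame({
    "IP Address": ["192.0.2.10", "198.51.100.7", "203.0.113.5"],
    "Anonymity": ["elite proxy", "anonymous", "elite proxy"],
})

# Keep only the elite proxies, as in the question.
elite = df[df["Anonymity"] == "elite proxy"]

# Series.to_list() returns a plain Python list with no index attached.
ip_list = elite["IP Address"].to_list()
print(ip_list)
```

Because ip_list is an ordinary list, it can be iterated over or passed to other code without any pandas-specific handling.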

CodePudding user response:

You can use loc to select the column directly for the matching rows, and to_list to convert the result to a list:

df.loc[df['Anonymity'].eq('elite proxy'), 'IP Address'].to_list()

output: ['134.119.xxx.xxx', '173.249.xxx.xxx'...]
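
To make the two parts of the loc call explicit, here is a small sketch on made-up data (the addresses are placeholders from reserved documentation ranges, and the variable names mask and elite_ips are chosen for illustration). The first argument to loc is a boolean mask selecting rows; the second is the column label.

```python
import pandas as pd

# Hypothetical sample rows standing in for the scraped table.
df = pd.DataFrame({
    "IP Address": ["192.0.2.10", "198.51.100.7", "203.0.113.5"],
    "Anonymity": ["elite proxy", "anonymous", "elite proxy"],
})

# .eq("elite proxy") is the method form of == "elite proxy";
# it yields one True/False value per row.
mask = df["Anonymity"].eq("elite proxy")

# Row selection and column selection happen in a single step.
elite_ips = df.loc[mask, "IP Address"].to_list()
print(elite_ips)
```

This leaves the original df unfiltered, which is convenient if the other rows are still needed later.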

CodePudding user response:

It looks like you are trying to accomplish something like the following:

print(df['IP Address'].to_string(index=False))

It would also be a good idea to reset the DataFrame's index after filtering, like this:

df = df.reset_index(drop=True)
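
A short sketch of what the reset does, using made-up rows (placeholder addresses, illustrative variable names): filtering keeps the original row labels, so the index has gaps until it is reset.

```python
import pandas as pd

# Hypothetical sample rows standing in for the scraped table.
df = pd.DataFrame({
    "IP Address": ["192.0.2.10", "198.51.100.7", "203.0.113.5"],
    "Anonymity": ["elite proxy", "anonymous", "elite proxy"],
})

filtered = df[df["Anonymity"] == "elite proxy"]
# The surviving rows keep their original labels.
print(filtered.index.to_list())

# drop=True discards the old labels instead of saving them
# in a new column named "index".
renumbered = filtered.reset_index(drop=True)
print(renumbered.index.to_list())
```

The reset matters mainly for positional lookups such as renumbered.loc[0]; it does not change what to_list() returns for the column.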

So the code snippet would be something like this:

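import pandas as pd
import requests

# Download the page and parse its first HTML table into a DataFrame.
resp = requests.get('https://free-proxy-list.net/')
df = pd.read_html(resp.text)[0]

# Keep only the elite proxies, then renumber the rows from 0.
df = df[df['Anonymity'] == 'elite proxy']
df = df.reset_index(drop=True)
print(df['IP Address'].to_string(index=False))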
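
Note that to_string returns one formatted string, not a list. If a list variable is the goal, one possible arrangement is sketched below. The function names fetch_proxy_table and elite_ips and the 10-second timeout are assumptions made for illustration; the URL and column names come from the question. Wrapping the HTML in io.StringIO is there because recent pandas versions warn when read_html is given a literal HTML string.

```python
import io

import pandas as pd
import requests

URL = "https://free-proxy-list.net/"  # taken from the question


def fetch_proxy_table(url: str = URL) -> pd.DataFrame:
    """Download the page and return its first HTML table (hypothetical helper)."""
    resp = requests.get(url, timeout=10)  # timeout value is an assumption
    resp.raise_for_status()  # raise on HTTP error status codes
    # StringIO avoids the deprecation warning for literal HTML strings.
    return pd.read_html(io.StringIO(resp.text))[0]


def elite_ips(df: pd.DataFrame) -> list[str]:
    """Return the IPs of rows marked 'elite proxy' as a plain list."""
    mask = df["Anonymity"].eq("elite proxy")
    return df.loc[mask, "IP Address"].to_list()
```

Usage would then be ips = elite_ips(fetch_proxy_table()). Keeping the download separate from the filtering means the filtering step can be checked against a hand-built DataFrame without touching the network.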