I'm using pandas and requests to scrape IPs from https://free-proxy-list.net/, but how do I convert this code
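import pandas as pd
import requests
resp = requests.get('https://free-proxy-list.net/')
# read_html returns a list of tables; the first one holds the proxies
df = pd.read_html(resp.text)[0]
# keep only the rows marked as elite proxies
df = df[df['Anonymity'] == 'elite proxy']
print(df.to_string(index=False))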
so that the output is a list of IPs and nothing else? I managed to drop the index and keep only the elite proxies, but I can't build a variable that is a list containing only the IPs, without the index.
CodePudding user response:
To get the contents of the 'IP Address' column, select that column and call .to_list(). Here's how:
print(df['IP Address'].to_list())
CodePudding user response:
You can use loc to select the column directly for the matching rows, and to_list to convert the result to a list:
df.loc[df['Anonymity'].eq('elite proxy'), 'IP Address'].to_list()
output: ['134.119.xxx.xxx', '173.249.xxx.xxx'...]
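To try the loc approach without hitting the site, here is a minimal sketch on a hand-made frame. The rows are made up (the addresses come from documentation-only ranges) and only the 'IP Address' and 'Anonymity' column names are taken from the question; mask and elite_ips are illustrative names.

```python
import pandas as pd

# Hypothetical sample rows standing in for the scraped table
df = pd.DataFrame({
    "IP Address": ["192.0.2.10", "198.51.100.7", "203.0.113.25"],
    "Port": [8080, 3128, 80],
    "Anonymity": ["elite proxy", "anonymous", "elite proxy"],
})

# Boolean mask: True for the rows whose Anonymity is 'elite proxy'
mask = df["Anonymity"].eq("elite proxy")

# Select the matching rows and a single column, then convert to a plain list
elite_ips = df.loc[mask, "IP Address"].to_list()

print(elite_ips)  # ['192.0.2.10', '203.0.113.25']
```

The result is an ordinary Python list, so the index of the dataframe never appears in it.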
CodePudding user response:
It looks like you are trying to accomplish something like below:
print(df['IP Address'].to_string(index=False))
It is also a good idea to reset the index after filtering your dataframe, like this:
df = df.reset_index(drop=True)
The full snippet would then look something like this:
import pandas as pd
import requests
resp = requests.get('https://free-proxy-list.net/')
df = pd.read_html(resp.text)[0]
# keep only the elite proxies, then renumber the rows from 0
df = df[df['Anonymity'] == 'elite proxy']
df = df.reset_index(drop=True)
print(df['IP Address'].to_string(index=False))
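One thing to keep in mind: to_string gives you a single formatted string meant for printing, not a list, so if you need a variable holding the IPs you would still want to_list. A small sketch of the difference, again on made-up rows rather than the real table (elite, text and ips are illustrative names):

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the question
df = pd.DataFrame({
    "IP Address": ["192.0.2.10", "198.51.100.7", "203.0.113.25"],
    "Anonymity": ["elite proxy", "anonymous", "elite proxy"],
})

elite = df[df["Anonymity"] == "elite proxy"]
# After filtering, the original row labels (0 and 2) are kept;
# reset_index(drop=True) renumbers them from 0 and discards the old labels
elite = elite.reset_index(drop=True)

# One string with an IP per line, handy for printing
text = elite["IP Address"].to_string(index=False)

# A real Python list, handy for further processing
ips = elite["IP Address"].to_list()

print(text)
print(ips)
```

So to_string(index=False) is fine when the goal is only clean console output, while to_list is the one to use when the IPs are passed on to other code.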