I got the following data frame:
rowid url text domain_id domain_id_label ... punsafe pwatermark aesthetic hash __index_level_0__
0 1 https://cdn.idahopotato.com/cache/4075b86c99bc... Fattoush Salad with Roasted Potatoes 1 cdn.idahopotato.com ... 0.000019 0.042545 6.098400 -7769833748550554891 113
1 2 https://lh3.googleusercontent.com/-Gw5LBM0zYU8... an analysis of self portrayal in novels by vir... 2 lh3.googleusercontent.com ... 0.000002 0.405680 6.109017 8675719636262469033 877
2 3 https://www.mediaplaynews.com/wp-content/uploa... Christmas Comes Early to U.K. Weekly Home Ente... 3 www.mediaplaynews.com ... 0.023502 0.408992 6.023093 -510709293545570516 952
3 4 https://statesofincarceration.org/sites/defaul... Amy Garcia Wikipedia a legacy of reform: dorot... 4 statesofincarceration.org ... 0.000006 0.155641 6.431951 7982521258241828259 1163
4 5 https://cdn.shopify.com/s/files/1/0094/8653/26... 3D Metal Cornish Harbour Painting 5 cdn.shopify.com ... 0.000008 0.109816 6.167709 -2541341491343729392 1431
.. ... ... ... ... ... ... ... ... ... ... ...
995 996 https://i.pinimg.com/736x/c6/35/8e/c6358ecfe2e... Fashion Photography vs Amazing Interiors // Mo... 24 i.pinimg.com ... 0.777287 0.157218 6.396332 -9073600318089725879 171799
996 997 https://www.twi-ny.com//wp-content/uploads/201... Takashi Miike riffs on multiple genres in the ... 594 www.twi-ny.com ... 0.015503 0.081062 6.120159 4126112080526841162 172272
997 998 https://us.123rf.com/450wm/nyul/nyul1405/nyul1... Portrait of happy casual caucasian married cou... 16 us.123rf.com ... 0.881655 0.343428 6.009459 9208056874965420704 172375
998 999 https://t3.ftcdn.net/jpg/00/65/41/20/240_F_654... Idyllic summer landscape with mountain lake an... 64 t3.ftcdn.net ... 0.000010 1.000000 6.374364 4701612357070778743 173088
999 1000 https://i.pinimg.com/736x/8b/5f/56/8b5f565710c... Beards change everything. Colin Morgan is not ... 24 i.pinimg.com ... 0.020406 0.222567 6.241051 -8544261063483623093 173506
[1000 rows x 13 columns]
The url column contains the URLs of images I want to download. This is my code:
import pandas as pd
import requests

counter = 0
data = pd.read_csv('data.csv')
df = pd.DataFrame(data)
urls = df['url'].tolist()
print(urls)
for i in urls:
    img_data = requests.get(i).content
    with open('image_' + str(counter) + '.jpg', 'wb') as handler:
        handler.write(img_data)
    # advance the counter so each image gets its own file name
    counter += 1
Right now, this code converts df['url'] to a list and downloads every image in that list.
What I want to do instead is:
- Iterate through every entry of df['url']
- Download the image from that URL
- Rename the image to image_i.jpg
- Replace the corresponding df['url'] entry with the path of that image (they'll be in the same folder, so just the image name)
How can I go about doing it this way?
CodePudding user response:
You can write a custom function for this and call it with df.apply. The following is a working example with dummy data:
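import requests

def download_url(row):
    # Fetch the image bytes for this row's URL
    img_data = requests.get(row["url"]).content
    # row.name is the row's index label, so files are named image_0.jpg, image_1.jpg, ...
    with open(f"/content/sample_data/tmp/image_{row.name}.jpg", "wb") as handler:
        handler.write(img_data)
    # Return the file name that replaces the URL in the DataFrame
    return f"image_{row.name}.jpg"

df["url"] = df.apply(download_url, axis=1)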
[Out]:
rowid url
0 1 image_0.jpg
1 2 image_1.jpg
2 3 image_2.jpg
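The file names above come from row.name, which under axis=1 is the index label of the row being processed. A small sketch with a made-up frame (the index values 10, 20, 30 and the placeholder strings are assumptions for illustration only) shows that the names follow the index labels, not the row position:

```python
import pandas as pd

# Hypothetical frame with a non-default index to make the effect visible
demo = pd.DataFrame({"url": ["a", "b", "c"]}, index=[10, 20, 30])

# With axis=1 each row is passed as a Series whose .name is its index label
names = demo.apply(lambda row: f"image_{row.name}.jpg", axis=1)
print(names.tolist())  # ['image_10.jpg', 'image_20.jpg', 'image_30.jpg']
```

If the DataFrame has been filtered or re-sorted, calling df.reset_index(drop=True) first would give consecutive numbers again.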
Dummy dataset used:
import pandas as pd

data = [
    [1, "https://www.python.org/static/community_logos/python-logo.png"],
    [2, "https://www.python.org/static/community_logos/python-powered-w-100x40.png"],
    [3, "https://www.python.org/static/community_logos/python-powered-h-50x65.png"],
]
columns = ["rowid", "url"]
df = pd.DataFrame(data=data, columns=columns)
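With 1000 real URLs, some requests will probably fail or hang. The following is only a sketch of one way to handle that, not part of the answer above: fetch_bytes and localize_images are hypothetical helper names, the 10-second timeout is an arbitrary choice, and rows that fail keep their original URL so they can be retried.

```python
from pathlib import Path

import pandas as pd
import requests


def fetch_bytes(url, timeout=10):
    """Download url and return the body; raises on network or HTTP errors."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.content


def localize_images(df, out_dir=".", fetch=fetch_bytes):
    """Save each df['url'] as image_<index>.jpg and return the new column values."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    results = []
    for idx, url in df["url"].items():
        name = f"image_{idx}.jpg"
        try:
            (out_dir / name).write_bytes(fetch(url))
            results.append(name)
        except (requests.RequestException, OSError):
            # Assumption: on failure, keep the original URL for a later retry
            results.append(url)
    return pd.Series(results, index=df.index)
```

Usage would be df["url"] = localize_images(df, "images"). The fetch parameter exists only so the download step can be swapped out, for example in a test that should not touch the network.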