How to iterate and edit values of a column in pandas dataframe

Time:10-28

I have the following DataFrame:

     rowid                                                url                                               text  domain_id            domain_id_label  ...   punsafe  pwatermark  aesthetic                 hash  __index_level_0__
0        1  https://cdn.idahopotato.com/cache/4075b86c99bc...               Fattoush Salad with Roasted Potatoes          1        cdn.idahopotato.com  ...  0.000019    0.042545   6.098400 -7769833748550554891                113
1        2  https://lh3.googleusercontent.com/-Gw5LBM0zYU8...  an analysis of self portrayal in novels by vir...          2  lh3.googleusercontent.com  ...  0.000002    0.405680   6.109017  8675719636262469033                877
2        3  https://www.mediaplaynews.com/wp-content/uploa...  Christmas Comes Early to U.K. Weekly Home Ente...          3      www.mediaplaynews.com  ...  0.023502    0.408992   6.023093  -510709293545570516                952
3        4  https://statesofincarceration.org/sites/defaul...  Amy Garcia Wikipedia a legacy of reform: dorot...          4  statesofincarceration.org  ...  0.000006    0.155641   6.431951  7982521258241828259               1163
4        5  https://cdn.shopify.com/s/files/1/0094/8653/26...                  3D Metal Cornish Harbour Painting          5            cdn.shopify.com  ...  0.000008    0.109816   6.167709 -2541341491343729392               1431
..     ...                                                ...                                                ...        ...                        ...  ...       ...         ...        ...                  ...                ...
995    996  https://i.pinimg.com/736x/c6/35/8e/c6358ecfe2e...  Fashion Photography vs Amazing Interiors // Mo...         24               i.pinimg.com  ...  0.777287    0.157218   6.396332 -9073600318089725879             171799
996    997  https://www.twi-ny.com//wp-content/uploads/201...  Takashi Miike riffs on multiple genres in the ...        594             www.twi-ny.com  ...  0.015503    0.081062   6.120159  4126112080526841162             172272
997    998  https://us.123rf.com/450wm/nyul/nyul1405/nyul1...  Portrait of happy casual caucasian married cou...         16               us.123rf.com  ...  0.881655    0.343428   6.009459  9208056874965420704             172375
998    999  https://t3.ftcdn.net/jpg/00/65/41/20/240_F_654...  Idyllic summer landscape with mountain lake an...         64               t3.ftcdn.net  ...  0.000010    1.000000   6.374364  4701612357070778743             173088
999   1000  https://i.pinimg.com/736x/8b/5f/56/8b5f565710c...  Beards change everything. Colin Morgan is not ...         24               i.pinimg.com  ...  0.020406    0.222567   6.241051 -8544261063483623093             173506

[1000 rows x 13 columns]

The url column contains the URLs of the images I want to download. This is my code:

import pandas as pd
import requests

counter = 0

# read_csv already returns a DataFrame
df = pd.read_csv('data.csv')

urls = df['url'].tolist()
print(urls)

for i in urls:
    img_data = requests.get(i).content

    with open('image_' + str(counter) + '.jpg', 'wb') as handler:
        handler.write(img_data)
    counter += 1  # otherwise every download overwrites image_0.jpg

Right now, this code converts df['url'] to a list and downloads the image at every URL in that list.

What I want to do instead is:

  • Iterate through every entry of df['url']
  • Download the image from that URL
  • Save the image as image_i.jpg
  • Replace the corresponding df['url'] value with the path to that image (they'll be in the same folder, so just the image file name)
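A minimal sketch of those four steps as a plain loop, in case df.apply is not a requirement. It walks df['url'].items(), which yields (index label, URL) pairs, so each file name can reuse the row's index. The function name download_and_rename and its fetch parameter are hypothetical: fetch is assumed to be any callable that returns the image bytes for a URL, which keeps the function usable without network access.

```python
from pathlib import Path
from typing import Callable

import pandas as pd


def download_and_rename(df: pd.DataFrame, fetch: Callable[[str], bytes],
                        out_dir: str = ".") -> pd.DataFrame:
    """Hypothetical helper: save each URL in df['url'] as image_<index>.jpg
    in out_dir and return a copy of df whose 'url' column holds those names."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    new_names = []
    for idx, url in df["url"].items():  # (index label, URL) pairs
        name = f"image_{idx}.jpg"
        (out / name).write_bytes(fetch(url))  # download and save under the new name
        new_names.append(name)
    result = df.copy()
    result["url"] = new_names  # replace URLs with file names, in index order
    return result
```

With requests, a real call could look like `df = download_and_rename(df, lambda u: requests.get(u, timeout=30).content)`. Collecting the names first and assigning the column once avoids modifying df while iterating over it.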

How can I go about doing it this way?

Answer:

You can write a custom function for this and call it with df.apply(..., axis=1).

The following is a working example with dummy data:

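import requests

def download_url(row):
  # Fetch the image bytes (timeout so a dead host can't hang forever)
  img_data = requests.get(row["url"], timeout=30).content
  # row.name is the row's index label, e.g. 0, 1, 2, ...
  with open(f"/content/sample_data/tmp/image_{row.name}.jpg", "wb") as handler:
    handler.write(img_data)
  # The return value becomes this row's new "url" entry
  return f"image_{row.name}.jpg"

# axis=1 calls download_url once per row
df["url"] = df.apply(download_url, axis=1)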

[Out]:
   rowid          url
0      1  image_0.jpg
1      2  image_1.jpg
2      3  image_2.jpg
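The names in that output come from row.name, which for df.apply(..., axis=1) is the row's index label, not its position or its rowid. A small network-free check (with placeholder strings instead of real URLs) shows what happens after filtering, when the index is no longer 0..n-1:

```python
import pandas as pd

df = pd.DataFrame({"rowid": [1, 2, 3], "url": ["u1", "u2", "u3"]})
subset = df[df["rowid"] != 2]  # keeps index labels 0 and 2

# row.name is the index label of each row passed in by apply(axis=1)
names = subset.apply(lambda row: f"image_{row.name}.jpg", axis=1)
print(names.tolist())  # ['image_0.jpg', 'image_2.jpg']

# reset_index(drop=True) renumbers rows 0..n-1 if consecutive names are wanted
renumbered = subset.reset_index(drop=True).apply(lambda row: f"image_{row.name}.jpg", axis=1)
print(renumbered.tolist())  # ['image_0.jpg', 'image_1.jpg']
```

If the file names should follow the rowid column instead, `f"image_{row['rowid']}.jpg"` would do that.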

Dummy dataset used:

import pandas as pd

data = [
  [1, "https://www.python.org/static/community_logos/python-logo.png"],
  [2, "https://www.python.org/static/community_logos/python-powered-w-100x40.png"],
  [3, "https://www.python.org/static/community_logos/python-powered-h-50x65.png"],
]

columns = ["rowid", "url"]

df = pd.DataFrame(data=data, columns=columns)
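One caveat with download_url as written: if a single request raises (for example a connection error), the exception propagates out of df.apply, so df["url"] is never reassigned even though earlier images were already saved. Also, requests does not raise on a 404 by itself, so an error page would be saved as a .jpg. A hedged variant, with the hypothetical names fetch_bytes and download_or_keep, calls raise_for_status() and catches failures per row, keeping that row's original URL so it can be retried. It relies on requests.RequestException being a subclass of OSError, which also covers file-write errors.

```python
from pathlib import Path
from typing import Callable

import pandas as pd


def fetch_bytes(url: str) -> bytes:
    """Hypothetical helper: download url with requests, raising on HTTP errors."""
    import requests  # imported lazily; only needed when this default is used
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # 4xx/5xx raise requests.HTTPError
    return response.content


def download_or_keep(row: pd.Series, out_dir: str = ".",
                     fetch: Callable[[str], bytes] = fetch_bytes) -> str:
    """Save row['url'] as image_<row.name>.jpg in out_dir (which must exist)
    and return the file name; on failure, return the original URL instead."""
    name = f"image_{row.name}.jpg"
    try:
        Path(out_dir, name).write_bytes(fetch(row["url"]))
    except OSError:  # requests.RequestException subclasses OSError
        return row["url"]
    return name
```

Extra keyword arguments to df.apply are forwarded to the function, so a call could look like `df["url"] = df.apply(download_or_keep, axis=1, out_dir="/content/sample_data/tmp")`. Rows that still start with "http" afterwards are the ones that failed.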