I have data like this. What I am trying to do is to create a rule, based on domain names for my project. I want to create a new column named new_url based on domains. If it contains .cdn. it will take the string before .cdn. , otherwise it will call url parser library and parse url in another way. The problem is that in the csv file I created (cleanurl.csv) , there is no new_url column created. When I print parsed urls in code, I can see them. If and else condition are working. Could you help me please ?
import pandas as pd
import url_parser
from url_parser import parse_url,get_url,get_base_url
import numpy as np
df = pd.read_csv("C:\\Users\\myuser\\Desktop\\raw_data.csv", sep=';')
i=-1
for x in df['domain']:
i=i 1
print("*",x,"*")
if '.cdn.' in x:
parsed_url=x.split('.cdn')[0]
print(parsed_url)
df.iloc[i]['new_url']=parsed_url
else:
parsed_url=get_url(x).domain '.' get_url(x).top_domain
print(parsed_url)
df.iloc[i]['new_url']=parsed_url
df.to_csv("C:\\Users\\myuser\\Desktop\\cleanurl.csv", sep=';')
CodePudding user response:
Use .loc[row, 'column']
to create new column
for idx, x in df['domain'].items():
if '.cdn.' in x:
df.loc[idx, 'new_url'] = parsed_url
else:
df.loc[idx, 'new_url'] = parsed_url