I have a dataframe with several columns, two of which contain URI strings ending in a fragment, such as:
http://company.com/information#name
http://company.com/information#Company
where I need to keep only the "name" and "Company" fragments and remove everything up to and including the pound sign (#).
I have written the following function to do this on a passed dataframe, also passing a list of column names to act on and the string to remove from each of them:
def uri_fragment(DF: pd.DataFrame, COLUMN_LIST: list, URI_STRING: str) -> pd.DataFrame:
    for DF_COLUMN in COLUMN_LIST:
        DF['DF_COLUMN'] = DF['DF_COLUMN'].map(lambda x: x.replace(URI_STRING,''))
    return DF
which I invoke as:
my_df = uri_fragment(my_df, ['class', 'type'], "http://company.com/information#")
so that the "class" and "type" columns are stripped of the passed URI string, but I get the following error:
KeyError: 'DF_COLUMN'
What am I overlooking/misunderstanding? Thank you
Answer:
You're passing in the string 'DF_COLUMN' as a key, rather than the variable DF_COLUMN from your loop. Since there is no column named 'DF_COLUMN', pandas is throwing a KeyError.
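A minimal corrected version of the question's function, keeping the same signature; the only change is indexing with the loop variable instead of the string literal. The sample frame below is hypothetical, built from the two example URIs.

```python
import pandas as pd


def uri_fragment(DF: pd.DataFrame, COLUMN_LIST: list, URI_STRING: str) -> pd.DataFrame:
    for DF_COLUMN in COLUMN_LIST:
        # Index with the loop variable DF_COLUMN, not the literal 'DF_COLUMN'
        DF[DF_COLUMN] = DF[DF_COLUMN].map(lambda x: x.replace(URI_STRING, ''))
    return DF


# Hypothetical sample data shaped like the question's columns
my_df = pd.DataFrame({
    'class': ['http://company.com/information#name'],
    'type': ['http://company.com/information#Company'],
})
my_df = uri_fragment(my_df, ['class', 'type'], "http://company.com/information#")
print(my_df)
```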
Answer:
You are using a literal string in your function. You should remove the quotes:
DF[DF_COLUMN] = DF[DF_COLUMN].…
That said, a simpler method would be to use a regex with the .str accessor. map with a Python lambda is called once per element and can be quite slow on large frames:
for col in ['col', 'col2']:
    # here extracting any terminal fragment. You could also use
    # f'{re.escape(URI_STRING)}([^#]+)$' for limited matching
    # (re.escape because the URI contains '.', a regex metacharacter)
    df[col] = df[col].str.extract('#([^#]+)$', expand=False)
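A self-contained run of the regex approach on the question's example URIs. The column names 'class' and 'type' come from the question; the 'no-fragment-here' value is a hypothetical row added to show that entries without a '#' become NaN instead of raising.

```python
import re

import pandas as pd

URI_STRING = "http://company.com/information#"

df = pd.DataFrame({
    'class': ['http://company.com/information#name', 'no-fragment-here'],
    'type': ['http://company.com/information#Company',
             'http://company.com/information#Company'],
})

for col in ['class', 'type']:
    # Capture everything after the final '#'; rows without '#' become NaN
    df[col] = df[col].str.extract(r'#([^#]+)$', expand=False)

# Stricter variant: only match values that start with the known prefix.
# re.escape keeps '.' in the URI from matching any character.
prefixed = pd.Series(['http://company.com/information#name'])
strict = prefixed.str.extract(f'^{re.escape(URI_STRING)}([^#]+)$', expand=False)

print(df)
print(strict)
```

With a single capture group and expand=False, str.extract returns a Series, so it can be assigned straight back to the column.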
Another critique of your code: you are both returning DF and modifying it in place. You should do only one of the two.
Either return nothing and modify in place, or return a new dataframe. For the second option, make a copy of DF by adding DF = DF.copy() at the beginning of the function.
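A sketch of the two options described above. The function names uri_fragment_inplace and uri_fragment_copy are hypothetical, chosen only to tell the variants apart, and .str.replace(..., regex=False) is used as a vectorized stand-in for the question's per-element x.replace.

```python
import pandas as pd


def uri_fragment_inplace(DF: pd.DataFrame, COLUMN_LIST: list, URI_STRING: str) -> None:
    """Option 1: mutate the caller's DataFrame and return nothing."""
    for DF_COLUMN in COLUMN_LIST:
        DF[DF_COLUMN] = DF[DF_COLUMN].str.replace(URI_STRING, '', regex=False)


def uri_fragment_copy(DF: pd.DataFrame, COLUMN_LIST: list, URI_STRING: str) -> pd.DataFrame:
    """Option 2: leave the caller's DataFrame untouched and return a new one."""
    DF = DF.copy()
    for DF_COLUMN in COLUMN_LIST:
        DF[DF_COLUMN] = DF[DF_COLUMN].str.replace(URI_STRING, '', regex=False)
    return DF


uri = "http://company.com/information#"
original = pd.DataFrame({'class': [uri + 'name'], 'type': [uri + 'Company']})

cleaned = uri_fragment_copy(original, ['class', 'type'], uri)
# The copy protected the caller: original still holds the full URI
class_after_copy = original.loc[0, 'class']

uri_fragment_inplace(original, ['class', 'type'], uri)
# Now original itself has been modified
```

Returning None from the in-place variant makes it hard to accidentally write my_df = f(my_df) and assume a copy was made.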