I have several GB of CSV files where the values in one of the columns are URLs wrapped in [' and '], which is a consequence of this:
urls.append(re.findall(r'http\S+', hashtags_rem))  # findall returns a list, even for a single match
...
merger = {'Content': clean, 'AttrURL': urls}
cleandf = pd.DataFrame(merger)
...
df.insert(3, "AssocURL", cleandf['AttrURL'])
It took me a while to generate these files and, looking back, I'd certainly write this part differently, but doing it all again would be a time-consuming and unnecessary endeavour.
Is there another efficient way to remove [' and '] from this column using pandas or csv?
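For context, a minimal sketch of how this happens (made-up text, not my actual data): re.findall returns a list even when there is only one match, and writing a list-valued column to CSV stores each list as its string representation.

import io
import re
import pandas as pd

urls = []
for text in ["check this out https://example.com/page #promo"]:  # hypothetical input text
    urls.append(re.findall(r'http\S+', text))  # appends a list, e.g. ['https://example.com/page']

df = pd.DataFrame({'AssocURL': urls})

# Round-trip through CSV: the list cell comes back as its string repr
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
print(pd.read_csv(buf)['AssocURL'].iloc[0])  # prints ['https://example.com/page'] (now a plain string, not a list)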
CodePudding user response:
You can use pandas.Series.apply on that column to remove the square brackets and quotes. It should be something like this:
df['AssocURL'] = df['AssocURL'].apply(lambda s: s[2:-2])
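A quick check of that slicing on made-up data, assuming every value has exactly the leading [' and trailing '] (hypothetical URLs):

import pandas as pd

df = pd.DataFrame({'AssocURL': ["['https://example.com/a']", "['https://example.com/b']"]})
df['AssocURL'] = df['AssocURL'].apply(lambda s: s[2:-2])  # drop the first two and last two characters
print(df['AssocURL'].tolist())  # ['https://example.com/a', 'https://example.com/b']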
CodePudding user response:
Not a very attractive answer, but how about just with .str.replace?
df['AssocURL'] = df['AssocURL'].str.replace("'", "", regex=False).str.replace("[", "", regex=False).str.replace("]", "", regex=False)
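For reference, the same chain on made-up data; regex=False is passed so that [ and ] are treated as literal characters rather than regex metacharacters (the default has changed between pandas versions). If the brackets and quotes only ever appear at the ends of the value, df['AssocURL'].str.strip("[]'") would be a shorter equivalent.

import pandas as pd

df = pd.DataFrame({'AssocURL': ["['https://example.com/a']", "['https://example.com/b']"]})
df['AssocURL'] = (
    df['AssocURL']
    .str.replace("'", "", regex=False)
    .str.replace("[", "", regex=False)
    .str.replace("]", "", regex=False)
)
print(df['AssocURL'].tolist())  # ['https://example.com/a', 'https://example.com/b']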
CodePudding user response:
From the question it's unclear whether the column holds plain strings or elements which are themselves a list containing a single string; re.findall returns the second option. If it is the second option, e.g.
df = pd.DataFrame({'AssocURL': [['link1'], ['link2']]})
# AssocURL
# 0 [link1]
# 1 [link2]
you can use explode:
df['AssocURL'] = df['AssocURL'].explode()
# AssocURL
# 0 link1
# 1 link2
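If it turns out to be the first option instead, i.e. the CSV was already written and the column holds plain strings like "['link1']", explode alone won't flatten anything. A sketch under that assumption (and assuming one URL per row) is to parse each string back into a real list with ast.literal_eval and then explode:

import ast
import pandas as pd

df = pd.DataFrame({'AssocURL': ["['link1']", "['link2']"]})  # strings as read back from CSV
df['AssocURL'] = df['AssocURL'].apply(ast.literal_eval).explode()
#   AssocURL
# 0    link1
# 1    link2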
CodePudding user response:
Super simple, just do:
df['AssocURL'] = df['AssocURL'].replace(r"\['|'\]", "", regex=True)
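A quick check on made-up data; regex=True is what makes Series.replace substitute substrings rather than only whole cell values:

import pandas as pd

df = pd.DataFrame({'AssocURL': ["['https://example.com/a']"]})
df['AssocURL'] = df['AssocURL'].replace(r"\['|'\]", "", regex=True)
print(df['AssocURL'].iloc[0])  # https://example.com/a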