I need help with data cleaning. How do I make the language column(D) into form in column(F) Basically just get rid of the brackets and apostrophe, and save the comma between each language. It can be done either using python or excel itself Thanks!
I tried to google but it didn't work
CodePudding user response:
Based on the image, the data type in column 'D' seems to be a list of elements, hence when written to the cells, the square brackets are included.
Their is a simple trick to handle this: While you iterate over the column and the row (basically the cell), simply run a list comprehension preceded by a join, something like the following
''.join[_val for _val in str(_cell_val) if str(_val) not in ['[', ']', '\'']
search for list to str conversion using list comprehension and you should have enough samples.
CodePudding user response:
You could use pandas to read the csv into a dataframe then "apply" a function to the column that did something like this:
def clean(value: str) -> str:
for c in "[']":
value = value.replace(c, "")
return value
It's worth noting if you remove the brackets you'll have an "embedded" delimiter meaning you wont be able to save this as a "csv" without a few headaches.
If you want to clean it up in Excel you could, similarly to my formula above, add a formula that replaces all of the unwanted characters with an empty string with something like this:
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(D2,"'",""),"[",""),"]","")
*where D2 is the first language cell
FWIW, I'd take a few minutes to play with this in pandas too -- it's always good to pick up a new skill