Each row in DataFrame column is a list. How to remove leading whitespace from second to end entries

Time:08-28

I have a dataset with a "tags" column in which each row is a list of tags stored as a string. For example, the first entry looks something like this:

df['tags'][0]

result = "[' Leisure Trip ', ' Couple ', ' Duplex Double Room ', ' Stayed 6 nights ']"

I have been able to remove the trailing whitespace from all elements and only the leading whitespace from the first element (so I get something like the below).

['Leisure trip', ' Couple', ' Duplex Double Room', ' Stayed 6 nights']

Does anyone know how to remove the leading whitespace from all but the first element in these lists? The lists are not of uniform length. Below is the code I used to get the result above:

clean_tags_list = []
for item in reviews['Tags']:
    string = item.replace("[", "")
    string2 = string.replace("'", "")
    string3 = string2.replace("]", "")
    string4 = string3.replace(",", "")
    string5 = string4.strip()
    string6 = string5.lstrip()
    #clean_tags_list.append(string4.split(" "))
    clean_tags_list.append(string6.split("  "))
clean_tags_list[0]


['Leisure trip', ' Couple', ' Duplex Double Room', ' Stayed 6 nights']
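A short sketch of why the leading space survives, assuming the raw value has the shape shown in the question: once the brackets, quotes, and commas are removed, neighbouring tags are separated by three spaces, so splitting on two spaces leaves one behind. Splitting on any run of two or more whitespace characters with re.split is one possible fix; the variable names here are illustrative only.

```python
import re

# Shortened version of the raw string from the question.
raw = "[' Leisure Trip ', ' Couple ']"

# Same replacements as in the loop above, chained together.
flat = raw.replace("[", "").replace("'", "").replace("]", "").replace(",", "").strip()
print(repr(flat))  # 'Leisure Trip   Couple' (three spaces between tags)

# Splitting on exactly two spaces leaves the third space on the next tag.
two_space_split = flat.split("  ")
print(two_space_split)  # ['Leisure Trip', ' Couple']

# Splitting on any run of two or more whitespace characters removes it.
regex_split = re.split(r"\s{2,}", flat)
print(regex_split)  # ['Leisure Trip', 'Couple']
```

This only works while no tag contains two consecutive spaces internally; parsing the string as a real list avoids that assumption.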

CodePudding user response:

IIUC you want to apply strip for the first element and right strip for the other ones. Then, first convert your 'string list' to an actual list with ast.literal_eval and apply strip and rstrip:

from ast import literal_eval
df.tags.agg(literal_eval).apply(lambda x: [item.strip() if i == 0 else item.rstrip() for i, item in enumerate(x)])
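If the goal is instead to remove whitespace on both sides of every tag, which is what the question appears to ask for, a minimal sketch could parse each string with ast.literal_eval and call strip on every element. The helper name clean_tags is hypothetical, and the sample string is copied from the question.

```python
from ast import literal_eval

def clean_tags(raw):
    """Parse a stringified list of tags and strip whitespace from every tag."""
    return [tag.strip() for tag in literal_eval(raw)]

raw = "[' Leisure Trip ', ' Couple ', ' Duplex Double Room ', ' Stayed 6 nights ']"
cleaned = clean_tags(raw)
print(cleaned)  # ['Leisure Trip', 'Couple', 'Duplex Double Room', 'Stayed 6 nights']
```

Assuming the column holds strings of this shape, the same helper could be applied with df['tags'].apply(clean_tags).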

CodePudding user response:

If I understand correctly, you can use the code below:

import pandas as pd

df = pd.DataFrame({'tags': [[' Leisure Trip ', ' Couple ', ' Duplex Double Room ', ' Stayed 6 nights ']]})

df['tags'] = df['tags'].apply(lambda x: [x[0].strip()] + [e.rstrip() for e in x[1:]])

>>> print(df)


CodePudding user response:

I was also able to figure it out with the below code. (I know that this isn't very efficient but it worked).

will_clean_tag_list = []
for row in clean_tags_list:
    for col in range(len(row)):
        row[col] = row[col].strip()
    will_clean_tag_list.append(row)
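The same result can probably be written more compactly as a nested list comprehension, which also avoids modifying the rows of clean_tags_list in place. This is only a sketch; the sample data is the intermediate result shown earlier in the question.

```python
# Intermediate result from the question: later tags still carry a leading space.
clean_tags_list = [['Leisure trip', ' Couple', ' Duplex Double Room', ' Stayed 6 nights']]

# Strip every tag in every row, building new lists instead of mutating the old ones.
will_clean_tag_list = [[tag.strip() for tag in row] for row in clean_tags_list]
print(will_clean_tag_list[0])  # ['Leisure trip', 'Couple', 'Duplex Double Room', 'Stayed 6 nights']
```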

Thank you all for the insight! This has been my first post and I really appreciate the help.
