Removing spaces from a nested list of objects with pandas-CodePudding

I have a dataframe with the following structure:

[
    {
        "name": "Book1",
        "details": [
            {
                "id": 30278752,
                "isbn": " 1594634025",
                "isbn13": "9781594634024    ",
                "text_reviews_count": 417,
                "work_reviews_count": 3313007,
                "work_text_reviews_count": 109912,
                "average_rating": " 3.92"
            }
        ]
    },
    {
        "name": "Book2",
        "details": [
            {
                "id": 34006942,
                "isbn": "  1501173219",
                "isbn13": " 9781501173219   ",
                "text_reviews_count": 565,
                "work_reviews_count": 2142280,
                "work_text_reviews_count": 75053,
                "average_rating": "4.33 "
            }
        ]
    }
]

How can i remove whitespaces (there are spaces after and before the string) please!

CodePudding user response：

names = ["isbn","isbn13" ,"average_rating"]
for i in v:
    for j in names:
        i["details"][0][j] = i["details"][0][j].replace(" ","")

You can try to replace them this way.

Out[17]: 
[{'name': 'Book1',
  'details': [{'id': 30278752,
    'isbn': '1594634025',
    'isbn13': '9781594634024',
    'text_reviews_count': 417,
    'work_reviews_count': 3313007,
    'work_text_reviews_count': 109912,
    'average_rating': '3.92'}]},
 {'name': 'Book2',
  'details': [{'id': 34006942,
    'isbn': '1501173219',
    'isbn13': '9781501173219',
    'text_reviews_count': 565,
    'work_reviews_count': 2142280,
    'work_text_reviews_count': 75053,
    'average_rating': '4.33'}]}]

here is the output.V is the list you have mentioned

CodePudding user response：

You should place json data into columns. Here is my approach:

# Pull first element from list
df.loc[:, 'spread_details'] = df.details.map(lambda x: x[0])
# Spread JSON into columns
df1 = pd.json_normalize(df['spread_details'])


merged_df = pd.concat([df, df1], axis=1)
merged_df = merged_df.drop(columns=['details', 'spread_details'])

# Trim required columns
cols_to_be_trimmed = ["isbn", "isbn13"]
merged_df[cols] = merged_df[cols_to_be_trimmed].apply(lambda x: x.str.strip())

merged_df.head(2)

CodePudding user response：

We can create a lambda function to strip the spaces from string values in dictionary, then map this function on the details column of dataframe:

strip = lambda d: {k: v.strip() if isinstance(v, str) else v for k, v in d.items()}
df['details'] = df['details'].map(lambda L: [strip(d) for d in L])

Result

>>> df.to_dict('r')

[{'name': 'Book1',
  'details': [{'id': 30278752,
    'isbn': '1594634025',
    'isbn13': '9781594634024',
    'text_reviews_count': 417,
    'work_reviews_count': 3313007,
    'work_text_reviews_count': 109912,
    'average_rating': '3.92'}]},
 {'name': 'Book2',
  'details': [{'id': 34006942,
    'isbn': '1501173219',
    'isbn13': '9781501173219',
    'text_reviews_count': 565,
    'work_reviews_count': 2142280,
    'work_text_reviews_count': 75053,
    'average_rating': '4.33'}]}]