Convert json file to dataframe and remove whitespaces and newlines from value-CodePudding

I have a json file a.json that has the following structure:

[
    { "name":"\n     John\n        ", "age":  "30  \n ","car":"   Bmw   \n   \n" },
    { "name":"\n     Joe\n        ", "age":  "20  \n ","car":"    mercedes   \n   \n" },
    { "name":"\n     Alex\n        ", "age":  "18  \n ","car":"      tesla   \n   \n" }
]

I want to strip off all the whitespaces and newlines for every value. Here is my code:

df = pd.read_json('a.json')
df= df.replace(r'\n','',regex=True)

i removed the the newline but not the whitespaces even though that i write

df.columns=df.columns.str.replace(' ','')
df.columns=df.columns.str.strip()
df.columns=df.columns.str.lstrip()

My output:

                name  age                 car
0       John           30           Bmw
1        Joe           20      mercedes
2       Alex           18         tesla

How can i do that please?

CodePudding user response：

@chitown88's answer is probably faster, but if you want to do it using regex you can do it like that:

df.replace('(^\s |\s $)', '', regex=True, inplace=True)

Output:

   name  age       car
0  John   30       Bmw
1   Joe   20  mercedes
2  Alex   18     tesla

CodePudding user response：

You can use pandas applymap function to iterate all value

import pandas as pd

df = pd.read_json('a.json')
df = df.applymap(lambda x: x.strip() if isinstance(x, str) else x)
print(df)

Output :

   name  age       car
0  John   30       Bmw
1   Joe   20  mercedes
2  Alex   18     tesla

CodePudding user response：

An other quite similar but bit more compact way would be:

import pandas as pd

df = pd.read_json("a.json")

df_obj = df.select_dtypes(['object'])
df[df_obj.columns] = df_obj.apply(lambda x: x.str.strip())

print(df)

Output:

   name  age       car
0  John   30       Bmw
1   Joe   20  mercedes
2  Alex   18     tesla

CodePudding user response：

One option is to use list and dictionary comprehension to clean up the json itself:

import pandas as pd

data = [
    { "name":"\n     John\n        ", "age":  "30  \n ","car":"   Bmw   \n   \n" },
    { "name":"\n     Joe\n        ", "age":  "20  \n ","car":"    mercedes   \n   \n" },
    { "name":"\n     Alex\n        ", "age":  "18  \n ","car":"      tesla   \n   \n" }
]


data = [{k:v.strip() for k,v in each.items()} for each in data]
df = pd.DataFrame(data)

Or you could just iterate through each column:

data = [
    { "name":"\n     John\n        ", "age":  "30  \n ","car":"   Bmw   \n   \n" },
    { "name":"\n     Joe\n        ", "age":  "20  \n ","car":"    mercedes   \n   \n" },
    { "name":"\n     Alex\n        ", "age":  "18  \n ","car":"      tesla   \n   \n" }
]


df = pd.DataFrame(data)
for col in df.columns:
    df[col] = df[col].str.strip()

Output:

print(df)
   name age       car
0  John  30       Bmw
1   Joe  20  mercedes
2  Alex  18     tesla