I have a json file a.json that has the following structure:
[
{ "name":"\n John\n ", "age": "30 \n ","car":" Bmw \n \n" },
{ "name":"\n Joe\n ", "age": "20 \n ","car":" mercedes \n \n" },
{ "name":"\n Alex\n ", "age": "18 \n ","car":" tesla \n \n" }
]
I want to strip off all the whitespaces and newlines for every value. Here is my code:
df = pd.read_json('a.json')
df= df.replace(r'\n','',regex=True)
i removed the the newline but not the whitespaces even though that i write
df.columns=df.columns.str.replace(' ','')
df.columns=df.columns.str.strip()
df.columns=df.columns.str.lstrip()
My output:
name age car
0 John 30 Bmw
1 Joe 20 mercedes
2 Alex 18 tesla
How can i do that please?
CodePudding user response:
@chitown88's answer is probably faster, but if you want to do it using regex you can do it like that:
df.replace('(^\s |\s $)', '', regex=True, inplace=True)
Output:
name age car
0 John 30 Bmw
1 Joe 20 mercedes
2 Alex 18 tesla
CodePudding user response:
You can use pandas applymap function to iterate all value
import pandas as pd
df = pd.read_json('a.json')
df = df.applymap(lambda x: x.strip() if isinstance(x, str) else x)
print(df)
Output :
name age car
0 John 30 Bmw
1 Joe 20 mercedes
2 Alex 18 tesla
CodePudding user response:
An other quite similar but bit more compact way would be:
import pandas as pd
df = pd.read_json("a.json")
df_obj = df.select_dtypes(['object'])
df[df_obj.columns] = df_obj.apply(lambda x: x.str.strip())
print(df)
Output:
name age car
0 John 30 Bmw
1 Joe 20 mercedes
2 Alex 18 tesla
CodePudding user response:
One option is to use list and dictionary comprehension to clean up the json itself:
import pandas as pd
data = [
{ "name":"\n John\n ", "age": "30 \n ","car":" Bmw \n \n" },
{ "name":"\n Joe\n ", "age": "20 \n ","car":" mercedes \n \n" },
{ "name":"\n Alex\n ", "age": "18 \n ","car":" tesla \n \n" }
]
data = [{k:v.strip() for k,v in each.items()} for each in data]
df = pd.DataFrame(data)
Or you could just iterate through each column:
data = [
{ "name":"\n John\n ", "age": "30 \n ","car":" Bmw \n \n" },
{ "name":"\n Joe\n ", "age": "20 \n ","car":" mercedes \n \n" },
{ "name":"\n Alex\n ", "age": "18 \n ","car":" tesla \n \n" }
]
df = pd.DataFrame(data)
for col in df.columns:
df[col] = df[col].str.strip()
Output:
print(df)
name age car
0 John 30 Bmw
1 Joe 20 mercedes
2 Alex 18 tesla