I am working on code like the below, which slices the address column. For this I converted the DataFrame to a list of dictionaries and created an empty list, final,
to collect the preprocessed values. See the code:
import pandas as pd

data = {'id': ['001', '002', '003'],
        'address': ["William J. Clare\\n290 Valley Dr.\\nCasper, WY 82604\\nUSA",
                    "1180 Shelard Tower\\nMinneapolis, MN 55426\\nUSA",
                    "William N. Barnard\\n145 S. Durbin\\nCasper, WY 82601\\nUSA"]}
df = pd.DataFrame(data)
df_dict = df.to_dict('records')  # list of dicts, one per row
final = []
for row in df_dict:
    add = row["address"]
    # print(add.split("\\n") , len(add.split("\\n")))
    if len(add.split("\\n")) > 3:
        # keep only the last three parts of the address
        target = add.split("\\n")
        target = target[-3:]
        target = '\\n'.join(target)
    else:
        target = add.split("\\n")
        target = '\\n'.join(target)
    final.append(target)
    print(target)
After preprocessing, I append each result to the list final. Now I want to update df_dict
with the values in final and convert df_dict back to a pandas DataFrame.
Sample output:
id  address
1   290 Valley Dr.\\nCasper, WY 82604\\nUSA
2   1180 Shelard Tower\\nMinneapolis, MN 55426\\nUSA
3   145 S. Durbin\\nCasper, WY 82601\\nUSA
Your help will be greatly appreciated.
Thanks in advance
CodePudding user response:
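If you want to write the values back into df_dict, remember that it is a list of records (one dictionary per row), so df_dict['address'] = final would raise a TypeError. Update each record instead, then rebuild the DataFrame:
for row, value in zip(df_dict, final):
    row['address'] = value
df = pd.DataFrame(df_dict)
But you can also assign the list directly to the DataFrame:
df['address'] = final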
You could even use .apply(function)
to do it without creating the dictionary or the list at all:
def preprocess(value):
    parts = value.split("\\n")
    if len(parts) > 3:
        # keep only the last three parts
        parts = parts[-3:]
    value = '\\n'.join(parts)
    print(value)
    return value

df['address'] = df['address'].apply(preprocess)
Full working code:
import pandas as pd

data = {'id': ['001', '002', '003'],
        'address': ["William J. Clare\\n290 Valley Dr.\\nCasper, WY 82604\\nUSA",
                    "1180 Shelard Tower\\nMinneapolis, MN 55426\\nUSA",
                    "William N. Barnard\\n145 S. Durbin\\nCasper, WY 82601\\nUSA"]}
df = pd.DataFrame(data)
print(df)

def preprocess(value):
    parts = value.split("\\n")
    if len(parts) > 3:
        parts = parts[-3:]
    value = '\\n'.join(parts)
    print(value)
    return value

df['address'] = df['address'].apply(preprocess)
print(df)
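A side note on the slicing in preprocess: a negative slice such as parts[-3:] returns the whole list when it has fewer than three items, so the length check is optional. Below is a minimal sketch in plain Python, without pandas, so the slicing can be checked on its own. The helper name last_three_lines and its sep parameter are hypothetical and not part of the code above.

```python
def last_three_lines(value, sep="\\n"):
    """Keep at most the last three sep-separated parts of value."""
    # A negative slice never raises: with fewer than three parts it
    # simply returns all of them, so no explicit length check is needed.
    return sep.join(value.split(sep)[-3:])


# Two of the sample addresses from the question, using the same separator.
addresses = [
    "William J. Clare\\n290 Valley Dr.\\nCasper, WY 82604\\nUSA",
    "1180 Shelard Tower\\nMinneapolis, MN 55426\\nUSA",
]
cleaned = [last_three_lines(a) for a in addresses]
print(cleaned)
```

If that suits your data, the same helper should work with the DataFrame above as df['address'] = df['address'].apply(last_three_lines).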