Append list to dictionary

I am working on code like the one below, which slices the address column. For this I converted the DataFrame with to_dict and created an empty list, final, to collect the results of the preprocessing. See the code:

import pandas as pd

data = {'id': ['001', '002', '003'],
        'address': ["William J. Clare\\n290 Valley Dr.\\nCasper, WY 82604\\nUSA",
                    "1180 Shelard Tower\\nMinneapolis, MN 55426\\nUSA",
                    "William N. Barnard\\n145 S. Durbin\\nCasper, WY 82601\\nUSA"]}

df = pd.DataFrame(data)
df_dict = df.to_dict('records')

final = []
for row in df_dict:
    add = row["address"]
    # print(add.split("\\n") , len(add.split("\\n")))
    if len(add.split("\\n")) > 3:
        target = add.split("\\n")
        target = target[-3:]
        target = '\\n'.join(target)
    else:
        target = add.split("\\n")
        target = '\\n'.join(target)
    final.append(target)
    print(target)

After preprocessing, I append each result to the list final. Now I want to update df_dict with the final list and convert df_dict back to a pandas DataFrame.

Sample output:

id  address
1   290 Valley Dr.\\nCasper, WY 82604\\nUSA
2   1180 Shelard Tower\\nMinneapolis, MN 55426\\nUSA
3   145 S. Durbin\\nCasper, WY 82601\\nUSA

Your help will be greatly appreciated.

Thanks in advance

Answer:

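If you want to assign to the dictionary, keep in mind that df_dict is a list with one dictionary per row, so each value has to go into its own row. Then build the DataFrame from the updated records:

for row, address in zip(df_dict, final):
    row['address'] = address

df = pd.DataFrame(df_dict)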
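To see why a single assignment like df_dict['address'] = final cannot work, here is a quick check of what to_dict('records') returns for the sample data from the question. It is a sketch for illustration; the variable name failed is only used to record the outcome.

```python
import pandas as pd

data = {'id': ['001', '002', '003'],
        'address': ["William J. Clare\\n290 Valley Dr.\\nCasper, WY 82604\\nUSA",
                    "1180 Shelard Tower\\nMinneapolis, MN 55426\\nUSA",
                    "William N. Barnard\\n145 S. Durbin\\nCasper, WY 82601\\nUSA"]}
df = pd.DataFrame(data)
df_dict = df.to_dict('records')

# The 'records' orientation gives a list holding one dict per row
print(type(df_dict).__name__, len(df_dict))  # list 3
print(df_dict[0]['id'])                      # 001

# A list can only be indexed by integers or slices, not by a column name
try:
    df_dict['address'] = ['x', 'y', 'z']
    failed = False
except TypeError as exc:
    failed = True
    print('TypeError:', exc)
```

So the values have to be written row by row (or assigned to the DataFrame column instead).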

But you can also assign the list directly to the DataFrame column:

df['address'] = final
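A minimal sketch of the direct assignment, assuming final holds exactly one value per row in the same order as df. Here final is rebuilt with a list comprehension that does the same thing as the loop in the question (slicing with [-3:] already leaves shorter addresses unchanged). A list of the wrong length is rejected by pandas with a ValueError rather than being padded or truncated.

```python
import pandas as pd

data = {'id': ['001', '002', '003'],
        'address': ["William J. Clare\\n290 Valley Dr.\\nCasper, WY 82604\\nUSA",
                    "1180 Shelard Tower\\nMinneapolis, MN 55426\\nUSA",
                    "William N. Barnard\\n145 S. Durbin\\nCasper, WY 82601\\nUSA"]}
df = pd.DataFrame(data)

# Same preprocessing as the question's loop: keep at most the last three lines
final = ['\\n'.join(a.split('\\n')[-3:]) for a in df['address']]

# A plain list is assigned position by position, one value per row
df['address'] = final
assigned = list(df['address'])

# A list whose length differs from the number of rows raises ValueError
try:
    df['address'] = final[:2]
    length_error = False
except ValueError as exc:
    length_error = True
    print('ValueError:', exc)
```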

You could even use .apply(function) to do it without creating the dictionary or the final list at all:

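def preprocess(value):
    # split the address into its individual lines
    parts = value.split("\\n")
    # keep only the last three lines, which drops a leading name line
    if len(parts) > 3:
        parts = parts[-3:]
    value = '\\n'.join(parts)
    print(value)
    return value

df['address'] = df['address'].apply(preprocess)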
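Another option, not used in the answer above, is to chain pandas string methods instead of calling a Python function per row. This sketch assumes pandas 1.4 or newer for the regex parameter of str.split. Passing regex=False matters here: the separator is two characters long (a backslash followed by n), and pandas would otherwise treat it as a regular expression in which \n matches a real newline, so nothing would be split.

```python
import pandas as pd

data = {'id': ['001', '002', '003'],
        'address': ["William J. Clare\\n290 Valley Dr.\\nCasper, WY 82604\\nUSA",
                    "1180 Shelard Tower\\nMinneapolis, MN 55426\\nUSA",
                    "William N. Barnard\\n145 S. Durbin\\nCasper, WY 82601\\nUSA"]}
df = pd.DataFrame(data)

# Split on the literal two-character separator, keep at most the last
# three pieces of each list, then join them back with the same separator
df['address'] = (df['address']
                 .str.split('\\n', regex=False)
                 .str[-3:]
                 .str.join('\\n'))
print(df)
```

For a DataFrame this small the difference from apply is negligible; the chained form mainly avoids writing a separate function.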

Full working code:

import pandas as pd

data = {'id':  ['001', '002', '003'],
        'address': ["William J. Clare\\n290 Valley Dr.\\nCasper, WY 82604\\nUSA",
                    "1180 Shelard Tower\\nMinneapolis, MN 55426\\nUSA",
                    "William N. Barnard\\n145 S. Durbin\\nCasper, WY 82601\\nUSA"]}

df = pd.DataFrame(data)
print(df)

def preprocess(value):
    parts = value.split("\\n") 
    if len(parts) > 3:
        parts = parts[-3:]
    value = '\\n'.join(parts)
    print(value)
    return value
    
df['address'] = df['address'].apply(preprocess)

print(df)