Pandas replacing string in one column leads to other column disappearing

Time:04-19

New to Pandas, and I'm doing something wrong. When I run the code below to replace cells in column "data" that don't contain the string "field" with empty strings, instead of getting two columns (id, data), the whole id column disappears and every row starts with a delimiter. My guess is that this happens because, when I write the chunk back to CSV, I am only writing chunk_results, which does not include "id". The problem is I don't know how to solve it.

import pandas as pd
in_csv= "out.csv"
out_csv= "out_1.csv"
reader = pd.read_csv(in_csv, chunksize=100, sep='|', header=None, names=['id', 'data'], encoding='utf-8')
for chunk_df in reader:
    chunk_results = chunk_df['data'].astype(str).str.replace('^((?!field).)*$','', regex=True)
    chunk_results.to_csv(out_csv, mode='a', sep='|', encoding='utf-8', header=None, index=False)

What I have tried: I guessed that I needed to create chunk_id = chunk_df['id'] and concat it with chunk_results before calling to_csv, but that just gave me an error. Any idea what I'm doing wrong?

CodePudding user response:

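You need to assign the results back to the dataframe chunk's column. When you assign to chunk_results, you're setting it to a Series that holds only the data column, so the id column is never written. Assign to chunk_df['data'] instead and write the whole chunk: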

chunk_df['data'] = chunk_df['data'].astype(str).str.replace('^((?!field).)*$','', regex=True)
chunk_df.to_csv(out_csv, mode='a', sep='|', encoding='utf-8', header=None, index=False)
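
As a possible alternative to the negative-lookahead regex, the same blanking can be sketched with str.contains and Series.where, which some may find easier to read. This is only a minimal sketch: the helper name blank_rows_without and the in-memory sample data are made up for illustration, and it assumes "field" is meant as a literal substring rather than a pattern.

```python
import io
import pandas as pd

def blank_rows_without(chunk_df, needle="field"):
    """Hypothetical helper: blank 'data' where it lacks `needle`, keep 'id'."""
    out = chunk_df.copy()
    data = out["data"].astype(str)
    # True where the literal substring is present
    keep = data.str.contains(needle, regex=False)
    # Keep matching values, replace the rest with an empty string
    out["data"] = data.where(keep, "")
    return out

# Made-up sample standing in for out.csv
sample = "1|has field here\n2|nothing\n3|another field\n"
reader = pd.read_csv(io.StringIO(sample), chunksize=2, sep="|",
                     header=None, names=["id", "data"])

buf = io.StringIO()  # stands in for out_1.csv
for chunk_df in reader:
    # Write the whole chunk, so both columns reach the output
    blank_rows_without(chunk_df).to_csv(buf, sep="|", header=False, index=False)

result = buf.getvalue()
print(result)
```

Because the whole chunk is written rather than a single column, the id values survive in the output. Note that with mode='a' on a real file, rerunning the script appends to whatever out_1.csv already contains.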