Home > Software engineering >  How to remove newline characters from string in pandas python
How to remove newline characters from string in pandas python

Time:11-12

For some reason, none of the solutions previously posted about this seem to answer my question.

I am reading in an excel page with 150 sheets. I am looping through them and preparing the data to be concatenated together. (doing things like deleting unneeded/blank columns, and transforming some data) However, for some reason, I cannot get rid of any of the newline characters, no matter what I try. Here are some variations that I've tried so you can see what DIDN'T work.

import pandas as pd
import os

os.chdir(r'C:\Users\agray\Downloads')

sheets_dict = pd.read_excel('2022_Advanced_Control.xlsx', sheet_name=None)
df_list = list(sheets_dict.values())
df_list_clean = []

The top part stays the same, this loop portion is what changes.

for df in df_list:
    df.columns = [c.replace(' ', '_') for c in df.columns]

    df.drop(df.columns.difference(['Prescription_Drug_Name','Drug_Tier', 'Drug_Notes']), 1, inplace=True)
    df.drop(df.tail(3).index, inplace=True)
    df.loc[:, 'Prescription_Drug_Name'] = df.loc[:, 'Prescription_Drug_Name'].replace("\n", "", inplace=True)
    df_list_clean.append(df)

This gives me a column that has nothing but blank values.

Here's another way I tried

for df in df_list:
    df.columns = [c.replace(' ', '_') for c in df.columns]

    df.drop(df.columns.difference(['Prescription_Drug_Name','Drug_Tier', 'Drug_Notes']), 1, inplace=True)
    df.drop(df.tail(3).index, inplace=True)
    df['Prescription_Drug_Name'] = df['Prescription_Drug_Name'].replace(r'\n','', regex=True, inplace=True)
    df_list_clean.append(df)

This version is only applying to a copy, so none of the changes it says it's making are actually being made to my df. Any ideas how to get rid of all these "/n" characters in my column? Thanks!

CodePudding user response:

Use str.strip():

df['Prescription_Drug_Name'] = df['Prescription_Drug_Name'].str.replace(r'\n', '')

CodePudding user response:

I always advise against inplace=True. Make an explicit copy where you mean to. This version is only applying to a copy, so none of the changes... being made to my df. Why don't you clone your data like this:

for df in df_list:
    clean = df.copy()
    clean.columns = [c.replace(' ', '_') for c in df.columns]

    clean = clean.drop(df.columns.difference(['Prescription_Drug_Name','Drug_Tier', 'Drug_Notes']), 1)

    # drop last three rows
    clean = clean.iloc[:-3]

    # modify column, remove `inplace` here
    clean['Prescription_Drug_Name'] = clean['Prescription_Drug_Name'].replace(r'\n','', regex=True)
    df_list_clean.append(clean)

That being said, all of the above can be chained, so you can do something like this:

for df in df_list
    clean = (df.rename(columns=lambda x: x.replace(' ', '_'))
               .reindex(['Prescription_Drug_Name','Drug_Tier', 'Drug_Notes'], axis=1).dropna(axis=0, how='all')   # select only the columns
               .iloc[:-3]
               .assign(Prescription_Drug_Name=lambda x: x.replace(r'\n', '', regex=True)
             )
    df_list_clean.append(clean)
  • Related