For some reason, none of the solutions previously posted about this seem to answer my question.
I am reading in an excel page with 150 sheets. I am looping through them and preparing the data to be concatenated together. (doing things like deleting unneeded/blank columns, and transforming some data) However, for some reason, I cannot get rid of any of the newline characters, no matter what I try. Here are some variations that I've tried so you can see what DIDN'T work.
import pandas as pd
import os
os.chdir(r'C:\Users\agray\Downloads')
sheets_dict = pd.read_excel('2022_Advanced_Control.xlsx', sheet_name=None)
df_list = list(sheets_dict.values())
df_list_clean = []
The top part stays the same, this loop portion is what changes.
for df in df_list:
df.columns = [c.replace(' ', '_') for c in df.columns]
df.drop(df.columns.difference(['Prescription_Drug_Name','Drug_Tier', 'Drug_Notes']), 1, inplace=True)
df.drop(df.tail(3).index, inplace=True)
df.loc[:, 'Prescription_Drug_Name'] = df.loc[:, 'Prescription_Drug_Name'].replace("\n", "", inplace=True)
df_list_clean.append(df)
This gives me a column that has nothing but blank values.
Here's another way I tried
for df in df_list:
df.columns = [c.replace(' ', '_') for c in df.columns]
df.drop(df.columns.difference(['Prescription_Drug_Name','Drug_Tier', 'Drug_Notes']), 1, inplace=True)
df.drop(df.tail(3).index, inplace=True)
df['Prescription_Drug_Name'] = df['Prescription_Drug_Name'].replace(r'\n','', regex=True, inplace=True)
df_list_clean.append(df)
This version is only applying to a copy, so none of the changes it says it's making are actually being made to my df. Any ideas how to get rid of all these "/n" characters in my column? Thanks!
CodePudding user response:
Use str.strip()
:
df['Prescription_Drug_Name'] = df['Prescription_Drug_Name'].str.replace(r'\n', '')
CodePudding user response:
I always advise against inplace=True
. Make an explicit copy where you mean to.
This version is only applying to a copy, so none of the changes... being made to my df. Why don't you clone your data like this:
for df in df_list:
clean = df.copy()
clean.columns = [c.replace(' ', '_') for c in df.columns]
clean = clean.drop(df.columns.difference(['Prescription_Drug_Name','Drug_Tier', 'Drug_Notes']), 1)
# drop last three rows
clean = clean.iloc[:-3]
# modify column, remove `inplace` here
clean['Prescription_Drug_Name'] = clean['Prescription_Drug_Name'].replace(r'\n','', regex=True)
df_list_clean.append(clean)
That being said, all of the above can be chained, so you can do something like this:
for df in df_list
clean = (df.rename(columns=lambda x: x.replace(' ', '_'))
.reindex(['Prescription_Drug_Name','Drug_Tier', 'Drug_Notes'], axis=1).dropna(axis=0, how='all') # select only the columns
.iloc[:-3]
.assign(Prescription_Drug_Name=lambda x: x.replace(r'\n', '', regex=True)
)
df_list_clean.append(clean)