Home > Back-end >  Treat everything as raw string (even formulas) when reading into pandas from excel
Treat everything as raw string (even formulas) when reading into pandas from excel

Time:02-02

So, I am actually handling text responses from surveys, and it is common to have responses that starts with -, an example is: -I am sad today.

Excel would interpret it as #NAMES? So when I import the excel file into pandas using read_excel, it would show NAN.

Now is there any method to force excel to retain as raw strings instead interpret it at formula level?

I created a vba and assigning the entire column with text to click through all the cells in the column, which is slow if there is ten thousand data.

I was hoping it can do it at python level instead, any idea?

CodePudding user response:

I hope, it works for your solution, use openpyxl to extract excel data and then convert it into a pandas dataframe

from openpyxl import load_workbook
import pandas as pd
wb = load_workbook(filename = './formula_contains_raw.xlsx', ).active
print(wb.values)
# sheet_names = wb.get_sheet_names()[0]
# sheet_ranges = wb[name]
df = pd.DataFrame(list(wb.values)[1:], columns=list(wb.values)[0])
df.head()

CodePudding user response:

It works for me using a CSV instead of excel file. In the CSV file (opened in excel) I need to select the option Formulas/Show Formulas, then save the file.

pd.read_csv('draft.csv')

Output:

        Col1
0       hello
1       =-hello
  • Related