Python, using columns without header and index

I have a table like this: [image]

I want only the date column and the units column (columns 1 and 5), but with the date in another format. I used code like this:

`import pandas as pd

customer_calls = pd.read_excel("sales.xlsx", usecols=[0, 4])



customer_calls["OrderDate"] = pd.to_datetime(customer_calls["OrderDate"]).dt.strftime("%Y%m%d" + "00")
customer_calls.to_excel("sales_YYYYMMDD.xlsx")

print(customer_calls)`

It gives me what I wanted: [image]

I need it without header and index. But when I use header=0 or header=None, this line fails:

`customer_calls["OrderDate"] = pd.to_datetime(customer_calls["OrderDate"]).dt.strftime("%Y%m%d" + "00")`

because there is no column named "OrderDate" anymore. I tried using 0 instead of the name and all kinds of other things, but it always raises an error. How can I remove the header and index but still select the date column afterwards?

I've read dozens of examples here, but nothing solved this. Or I just can't see it.

CodePudding user response：

If you want to remove the headers and index, then essentially you are seeking only the values. If so, extract the values and call the tolist() method on them.

Here is an example of this:

import pandas as pd

# example dataframe
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=['A', 'B', 'C'])

# extract values only
data = df.values.tolist()

print(data)

Here is the result of the above:

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

The values are now just a list of lists.
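Applied to the question's data, the same idea could look like the sketch below. The sample rows are invented for illustration (the real data lives in sales.xlsx), and the date column is picked by position through df.columns[0], so the code does not depend on the header name.

```python
import pandas as pd

# Invented sample rows standing in for the two columns read from sales.xlsx
df = pd.DataFrame({
    "OrderDate": ["2020-01-06", "2020-02-09", "2020-03-15"],
    "Units": [95, 50, 36],
})

# Look up the first column's label by position instead of hard-coding the name
first_col = df.columns[0]

# Reformat the dates as YYYYMMDD followed by a literal "00"
df[first_col] = pd.to_datetime(df[first_col]).dt.strftime("%Y%m%d00")

# Extract the values only: no header, no index
rows = df.values.tolist()
print(rows)
```

Here rows is a plain list of lists, so it carries neither column labels nor an index.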

CodePudding user response：

I've done it! Posting it for future similar questions. It can be done really easily in pandas, with just two more lines.

import pandas as pd


# Read the file, keeping only the first two columns
customer_calls = pd.read_excel("sales.xlsx", usecols=[0, 1])


# Convert the dates to the YYYYMMDD00 format and write the result
customer_calls["OrderDate"] = pd.to_datetime(customer_calls["OrderDate"]).dt.strftime("%Y%m%d" + "00")
customer_calls.to_excel("sales_YYYYMMDD.xlsx")


# Use the first data row as the column labels
customer_calls.columns = customer_calls.iloc[0]
# Remove that first row from the DataFrame rows
customer_calls = customer_calls[1:]
# Display
print(customer_calls)

It gives output like this:

0   2020010600     East
1   2020020900  Central
2   2020031500     West
3   2020040100     East
4   2020050500  Central
5   2020060800     East
6   2020071200     East
7   2020081500     East
8   2020090100  Central
9   2020100500  Central
10  2020110800     East
11  2020121200  Central

The date format is changed and the header is gone.
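One thing to keep in mind: the iloc step only changes what print shows (the first data row becomes the column labels), and the file is written before that step. If the goal is an output file without header and index, a possible alternative is to leave the DataFrame alone and pass header=False and index=False when writing. The sketch below assumes invented sample rows and writes CSV into an in-memory buffer so it runs without an Excel engine; to_excel accepts the same two parameters.

```python
import io

import pandas as pd

# Invented sample rows standing in for the two columns read from sales.xlsx
customer_calls = pd.DataFrame({
    "OrderDate": ["2020-01-06", "2020-02-09", "2020-03-15"],
    "Region": ["East", "Central", "West"],
})

# Format the date while the column still has its name
customer_calls["OrderDate"] = (
    pd.to_datetime(customer_calls["OrderDate"]).dt.strftime("%Y%m%d00")
)

# Drop the header and the index only at output time
buffer = io.StringIO()
customer_calls.to_csv(buffer, header=False, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# For Excel output the same two parameters apply:
# customer_calls.to_excel("sales_YYYYMMDD.xlsx", header=False, index=False)
```

This way the column can still be selected by name during processing, and the header and index are simply left out of the written file.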