I have a table like this: [image]
I want only the date column and the units column (columns 1 and 5), but with the date in another format. I used code like this:
`import pandas as pd
customer_calls = pd.read_excel("sales.xlsx", usecols=[0, 4])
customer_calls["OrderDate"] = pd.to_datetime(customer_calls["OrderDate"]).dt.strftime("%Y%m%d" "00")
customer_calls.to_excel("sales_YYYYMMDD.xlsx")
print(customer_calls)`
It gives me what I wanted: [image]
I need it without the header and index. But when I pass header=0 or header=None, this line can no longer be run:
`customer_calls["OrderDate"] = pd.to_datetime(customer_calls["OrderDate"]).dt.strftime("%Y%m%d" "00")`
because there is no column named "OrderDate" anymore. I tried using 0 instead of the name and all kinds of other things, but I always get an error. How can I remove the header and index but still select the date column afterwards?
I've read dozens of examples here, but nothing solved this. Or I just can't see it.
CodePudding user response:
If you want to remove the headers and index, then essentially you are seeking only the values. If so, you can extract the values and use the tolist() method.
Here is an example of this:
import pandas as pd
# example dataframe
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=['A', 'B', 'C'])
# extract values only
data = df.values.tolist()
print(data)
Here is the result of the above:
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
The values are now just a list of lists.
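Applied to the question's case, this could look like the sketch below. The two sample rows are made up and stand in for the Excel file; the column names "OrderDate" and "Units" are assumed from the question. The date is reformatted first, while the column can still be selected by name, and only then are the bare values extracted.

```python
import pandas as pd

# Made-up rows standing in for pd.read_excel("sales.xlsx", usecols=[0, 4])
df = pd.DataFrame({"OrderDate": ["2020-01-06", "2020-02-09"], "Units": [95, 50]})

# Reformat the date while the column can still be addressed by name
df["OrderDate"] = pd.to_datetime(df["OrderDate"]).dt.strftime("%Y%m%d00")

# Plain Python rows: no header, no index
rows = df.values.tolist()
print(rows)
```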
CodePudding user response:
I've done it! Posting it for future similar questions. It can be done really easily in pandas, with just two more lines.
import pandas as pd
# Read the file, keeping only the first two columns
customer_calls = pd.read_excel("sales.xlsx", usecols=[0, 1])
# Convert the dates to strings in YYYYMMDD00 format
customer_calls["OrderDate"] = pd.to_datetime(customer_calls["OrderDate"]).dt.strftime("%Y%m%d" "00")
customer_calls.to_excel("sales_YYYYMMDD.xlsx")
# use the first data row as the column labels
customer_calls.columns = customer_calls.iloc[0]
# drop that first row from the dataframe rows
customer_calls = customer_calls[1:]
#display
print(customer_calls)
It gives output like this:
0 2020010600 East
1 2020020900 Central
2 2020031500 West
3 2020040100 East
4 2020050500 Central
5 2020060800 East
6 2020071200 East
7 2020081500 East
8 2020090100 Central
9 2020100500 Central
10 2020110800 East
11 2020121200 Central
The date format is changed and the original header is gone.
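One caveat: `customer_calls.columns = customer_calls.iloc[0]` followed by `customer_calls[1:]` turns the first data row into the column labels, so the first line of the printout is a header row rather than data, and the file is written before those two lines run. A minimal sketch of an alternative, assuming the goal is an output file with no header row and no index column: keep the header while reading so that "OrderDate" can still be selected by name, and drop it only when writing, via the `header=False` and `index=False` arguments of `to_excel`. The sample data and the helper names `format_dates` and `save_without_header` are made up for illustration.

```python
import pandas as pd


def format_dates(df: pd.DataFrame, column: str = "OrderDate") -> pd.DataFrame:
    """Return a copy with `column` rendered as YYYYMMDD00 strings."""
    out = df.copy()
    out[column] = pd.to_datetime(out[column]).dt.strftime("%Y%m%d00")
    return out


def save_without_header(df: pd.DataFrame, path: str) -> None:
    """Write only the cell values: no header row, no index column."""
    df.to_excel(path, header=False, index=False)


# Made-up rows standing in for pd.read_excel("sales.xlsx", usecols=[0, 4])
sample = pd.DataFrame(
    {"OrderDate": ["2020-01-06", "2020-02-09"], "Units": [95, 50]}
)
formatted = format_dates(sample)

# to_string takes the same header/index switches, which is handy for checking
print(formatted.to_string(header=False, index=False))

# Writing the file needs an Excel engine such as openpyxl to be installed:
# save_without_header(formatted, "sales_YYYYMMDD.xlsx")
```

With this approach no data row is lost, because the header is never replaced in the DataFrame itself; it is simply left out of the output.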