I have a pandas dataframe with date values. I need to convert them from dates to the text General format used in Excel (not to a date string) in order to match primary key values in SQL, which are, unfortunately, recorded in General format. Is it possible to do this in Python, or is the only way to convert this column to General format in Excel?

Here is what the dataframe's column looks like:

   ID         Desired Output
1/1/2022        44562
7/21/2024       45494
1/1/1931        11324

CodePudding user response：

Yes, it's possible. Excel's General format shows a date as its serial number, which counts days with 1900-1-1 as day 1.

You can calculate a time delta between the dates in ID and 1900-1-1.

Inspired by this post you could do...

import pandas as pd

from datetime import date

# create a data frame
data = pd.DataFrame({'ID': ['1/1/2022','7/21/2024','1/1/1931']})

# convert the strings in ID to a datetime, then into a series with squeeze and then to a date format. The date format is helpful when calculating time deltas.

sr = pd.to_datetime(data['ID'], format= '%m/%d/%Y').squeeze().dt.date

# Calculate the time deltas by subtracting 1900-1-1 from date in sr and store it in the General format column of data.

data['General format'] = sr.apply(lambda x: (x - date(1900, 1, 1)).days + 2)

print(data)

          ID  General format
0   1/1/2022           44562
1  7/21/2024           45494
2   1/1/1931           11324

Here is a less condensed version:

import pandas as pd

from datetime import date

data = pd.DataFrame({'ID': ['1/1/2022','7/21/2024','1/1/1931']})

ID_to_datetime = pd.to_datetime(data['ID'], format= '%m/%d/%Y')

ID_to_datetime_to_series = ID_to_datetime.squeeze() 

ID_to_datetime_to_series_to_date = ID_to_datetime_to_series.dt.date 

General_format = []

for a_date in ID_to_datetime_to_series_to_date:
   
   timedelta = a_date - date(1900, 1, 1) 
   
   General_format.append(timedelta.days + 2)

data['General format'] =  General_format

print(data)

          ID  General format
0   1/1/2022           44562
1  7/21/2024           45494
2   1/1/1931           11324

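The plus 2 is not about leap years in general. One day comes from Excel numbering 1900-1-1 as 1 rather than 0, and the other from Excel treating 1900 as a leap year, so it counts a 29 February 1900 that never existed. The offset therefore holds for any date from 1900-3-1 onward, which covers the dates you provided.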
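As a quick check of where the 2 comes from, here is a minimal sketch using only the standard library. The helper name excel_serial is made up for illustration, and the formula is only meant for dates on or after 1900-3-1.

```python
from datetime import date

def excel_serial(d: date) -> int:
    """Excel serial number for a date on or after 1900-03-01 (hypothetical helper)."""
    # +1: Excel numbers 1900-01-01 as day 1, not day 0.
    # +1: Excel counts 1900-02-29, a day that never existed.
    return (d - date(1900, 1, 1)).days + 2

# The same numbers come out when counting from 1899-12-30 directly.
for d in (date(2022, 1, 1), date(2024, 7, 21), date(1931, 1, 1)):
    assert excel_serial(d) == (d - date(1899, 12, 30)).days

print(excel_serial(date(2022, 1, 1)))  # 44562
```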

EDIT

Using pandas only as per suggestion by MrFuppes

import pandas as pd

data = pd.DataFrame({'ID': ['1/1/2022','7/21/2024','1/1/1931']})
data['General format'] = (pd.to_datetime(data['ID'], format='%m/%d/%Y') - pd.Timestamp('1899-12-30')).dt.days
print(data)

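pandas does ordinary calendar arithmetic, so real leap years are handled. Counting from 1899-12-30 instead of 1900-1-1 builds the offset of 2 into the reference date.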
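Since the question asks for text that can be matched against keys, the integer serials can be turned into strings afterwards. This is a minimal sketch, assuming the SQL keys are plain digit strings such as '44562'; the column name key_text is made up here.

```python
import pandas as pd

data = pd.DataFrame({'ID': ['1/1/2022', '7/21/2024', '1/1/1931']})

# Days since 1899-12-30 match Excel's serial number for dates from 1900-03-01 on.
serial = (pd.to_datetime(data['ID'], format='%m/%d/%Y') - pd.Timestamp('1899-12-30')).dt.days

# Convert the integers to text so they compare equal to string keys.
data['key_text'] = serial.astype(str)
print(data['key_text'].tolist())  # ['44562', '45494', '11324']
```

If the keys turn out to be numeric in SQL, the astype(str) step can simply be dropped.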

CodePudding user response：

Excel stores dates as sequential serial numbers so that they can be used in calculations. By default, January 1, 1900 is serial number 1, and January 1, 2008 is serial number 39448 because it is 39,447 days after January 1, 1900.
-Microsoft's documentation

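So you can just calculate (difference between your date and January 1, 1900) + 1. For dates from March 1, 1900 onward, add one more day, because Excel also counts a February 29, 1900 that never existed; that gives the + 2 needed for the values in the question.

See How to calculate number of days between two given dates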
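A rough sketch of that calculation in plain Python, checked against the serial number quoted from Microsoft's documentation. The function name to_excel_serial is hypothetical, and the sketch assumes Excel's default 1900 date system.

```python
from datetime import date

def to_excel_serial(d: date) -> int:
    """Serial number in Excel's 1900 date system (hypothetical helper)."""
    # Counting from 1899-12-31 makes 1900-01-01 come out as day 1.
    serial = (d - date(1899, 12, 31)).days
    # Excel inserts a nonexistent 1900-02-29, so later dates shift by one.
    if d >= date(1900, 3, 1):
        serial += 1
    return serial

print(to_excel_serial(date(1900, 1, 1)))  # 1
print(to_excel_serial(date(2008, 1, 1)))  # 39448
```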

CodePudding user response：

First, determine the column's datatype (for example with data['ID'].dtype). Then you will have something more to work with. You could use '.astype()' to change the type of the data, an iterator to remove the '/' marks, or other methods to change it.
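As a rough sketch of that first step, the column's dtype can be inspected and the column parsed only when it is not already a datetime. The sample frame below is an assumption taken from the question; is_datetime64_any_dtype is an existing helper in pandas.api.types.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

data = pd.DataFrame({'ID': ['1/1/2022', '7/21/2024', '1/1/1931']})

# Strings typically show up as object (or a dedicated string dtype).
print(data['ID'].dtype)

# Parse only if the column is not already datetime64.
if not is_datetime64_any_dtype(data['ID']):
    data['ID'] = pd.to_datetime(data['ID'], format='%m/%d/%Y')

print(is_datetime64_any_dtype(data['ID']))  # True
```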