I have a pandas dataframe with date values. I need to convert them to the numbers Excel shows in General format (not to date strings) so that they match primary key values in SQL, which are, unfortunately, recorded in General format. Is it possible to do this in Python, or is converting the column to General format in Excel the only way?
Here is what the dataframe's column looks like:
ID Desired Output
1/1/2022 44562
7/21/2024 45494
1/1/1931 11324
CodePudding user response:
Yes, it's possible. The general format in Excel starts counting the days from the date 1900-1-1.
You can calculate a time delta between the dates in ID and 1900-1-1.
Inspired by this post you could do...
import pandas as pd
from datetime import date
# create a data frame
data = pd.DataFrame({'ID': ['1/1/2022','7/21/2024','1/1/1931']})
# convert the strings in ID to a datetime, then into a series with squeeze and then to a date format. The date format is helpful when calculating time deltas.
sr = pd.to_datetime(data['ID'], format= '%m/%d/%Y').squeeze().dt.date
# Calculate the time deltas by subtracting 1900-1-1 from date in sr and store it in the General format column of data.
data['General format'] = sr.apply(lambda x: (x - date(1900, 1, 1)).days + 2)
print(data)
ID General format
0 1/1/2022 44562
1 7/21/2024 45494
2 1/1/1931 11324
Here is a less condensed version...
import pandas as pd
from datetime import date
data = pd.DataFrame({'ID': ['1/1/2022','7/21/2024','1/1/1931']})
ID_to_datetime = pd.to_datetime(data['ID'], format= '%m/%d/%Y')
ID_to_datetime_to_series = ID_to_datetime.squeeze()
ID_to_datetime_to_series_to_date = ID_to_datetime_to_series.dt.date
General_format = []
for a_date in ID_to_datetime_to_series_to_date:
timedelta = a_date - date(1900, 1, 1)
    General_format.append(timedelta.days + 2)
data['General format'] = General_format
print(data)
ID General format
0 1/1/2022 44562
1 7/21/2024 45494
2 1/1/1931 11324
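The + 2 is needed for two reasons: Excel numbers 1900-01-01 as day 1 rather than day 0, and it counts a nonexistent 29 February 1900 because it treats 1900 as a leap year. The offset is therefore correct for any date from 1 March 1900 onward, which covers the dates you provided; earlier dates in 1900 need + 1 instead.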
EDIT
Using pandas only as per suggestion by MrFuppes
data = pd.DataFrame({'ID': ['1/1/2022','7/21/2024','1/1/1931']})
data['General format'] = (pd.to_datetime(data["ID"])-pd.Timestamp("1899-12-30")).dt.days
print(data)
Here the offset of 2 is built into the reference date: 1899-12-30 is two days before 1900-01-01, so no separate + 2 is needed.
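For reuse, the pandas-only approach could be wrapped in a pair of small helpers. This is only a sketch: the names to_excel_serial, from_excel_serial and EXCEL_EPOCH are made up for illustration, and it assumes the ID column holds month/day/year strings dated on or after 1 March 1900.

```python
import pandas as pd

# Reference date that reproduces Excel's serial numbers for dates
# from 1900-03-01 onward (hypothetical constant name).
EXCEL_EPOCH = pd.Timestamp("1899-12-30")

def to_excel_serial(dates: pd.Series) -> pd.Series:
    """Convert m/d/Y date strings to Excel-style serial numbers."""
    parsed = pd.to_datetime(dates, format="%m/%d/%Y")
    return (parsed - EXCEL_EPOCH).dt.days

def from_excel_serial(serials: pd.Series) -> pd.Series:
    """Convert Excel-style serial numbers back to timestamps."""
    return pd.to_datetime(serials, unit="D", origin="1899-12-30")

data = pd.DataFrame({"ID": ["1/1/2022", "7/21/2024", "1/1/1931"]})
data["General format"] = to_excel_serial(data["ID"])
round_trip = from_excel_serial(data["General format"])
print(data)
print(round_trip)
```

The round trip is a cheap way to check that the serial numbers map back to the original dates before joining against the SQL keys.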
CodePudding user response:
Excel stores dates as sequential serial numbers so that they can be used in calculations. By default, January 1, 1900 is serial number 1, and January 1, 2008 is serial number 39448 because it is 39,447 days after January 1, 1900.
-Microsoft's documentation
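So you can calculate (difference between your date and January 1, 1900) + 1. Note that Excel also counts a nonexistent February 29, 1900, so for dates from March 1, 1900 onward the offset from the true calendar difference is + 2.
See How to calculate number of days between two given dates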
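As a rough illustration of that calculation with only the standard library, the sketch below checks it against the 2008 example from the quoted documentation. The function name excel_serial is hypothetical, and the extra day for dates from March 1, 1900 onward accounts for Excel treating 1900 as a leap year.

```python
from datetime import date

def excel_serial(d: date) -> int:
    """Return the Excel (1900 date system) serial number for a date."""
    days_since_1900 = (d - date(1900, 1, 1)).days
    # +1 because 1900-01-01 is serial 1, not 0.
    # +1 more from 1900-03-01 on, because Excel counts a
    # nonexistent 1900-02-29.
    offset = 2 if d >= date(1900, 3, 1) else 1
    return days_since_1900 + offset

print(excel_serial(date(2008, 1, 1)))   # value quoted in the documentation: 39448
print(excel_serial(date(2022, 1, 1)))   # first row of the question: 44562
```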
CodePudding user response:
First, determine the datatype. Then, you will have something more to work with. You could use '.astype()' to change the type of the data, an iterator to remove the '/' marks, or other methods to change it.
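A possible way to follow that advice is sketched below: inspect the dtype, parse only if the column is not already datetime, and use '.astype(str)' at the end so the result is text. The column name key_text is invented here, and the assumption that the SQL keys are stored as text is taken from the question rather than verified.

```python
import pandas as pd

data = pd.DataFrame({"ID": ["1/1/2022", "7/21/2024", "1/1/1931"]})

# Inspect the column type first; plain strings are not datetime64.
print(data["ID"].dtype)

# Parse only if the column is not already a datetime column.
if pd.api.types.is_datetime64_any_dtype(data["ID"]):
    parsed = data["ID"]
else:
    parsed = pd.to_datetime(data["ID"], format="%m/%d/%Y")

# Days since 1899-12-30 reproduce Excel's serial numbers
# for dates from 1900-03-01 onward.
serial = (parsed - pd.Timestamp("1899-12-30")).dt.days

# Store as text so the values can be compared with text keys.
data["key_text"] = serial.astype(str)
print(data)
```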