I have written this function:
import pandas as pd
from datetime import timedelta

def time_to_unix(df, dateToday):
    '''This function creates the timestamp column for the dataframe. It also gets today's date (e.g. 2022-8-8 0:0:0)
    and then adds the seconds that were originally in the timestamp column.
    input: dataframe, dateToday (type: pandas.core.series.Series)
    output: list of times
    '''
    dateTime = dateToday[0]
    times = []
    for i in range(0, len(df['timestamp'])):
        dateAndTime = dateTime + timedelta(seconds=float(df['timestamp'][i]))
        unix = pd.to_datetime([dateAndTime]).astype(int) / 10**9
        times.append(unix[0])
    return times
So it takes a dataframe, gets today's date, takes the value of the timestamp column (which is in seconds, like 10, 20, ...), applies the function, and returns the times in Unix time.
However, because I have approximately 2 million rows in my dataframe, this code takes a long time to run.
How can I use a lambda function, or something else, to speed this up?
Something along the lines of:
df['unix'] = df.apply(lambda row: something in here, axis=1)
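For reference, a row-wise apply version of that idea might look like the sketch below (assuming dateToday[0] is a datetime and df['timestamp'] holds offsets in seconds; base is just a local name introduced here). Note that apply with axis=1 still runs Python code per row, so it will not be much faster than the original loop.

from datetime import timedelta
import pandas as pd

# Sketch only: same logic as time_to_unix, expressed with apply.
base = dateToday[0]
df['unix'] = df.apply(
    lambda row: pd.Timestamp(base + timedelta(seconds=float(row['timestamp']))).value / 10**9,
    axis=1,
)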
CodePudding user response:
What I think you'll find is that most of the time is spent creating and manipulating the datetime / timestamp objects in the dataframe (see here for more info). I also try to avoid lambdas like this on large dataframes, since they apply the function row by row, which should be avoided. What I've done when dealing with datetimes / timestamps / timezone changes in the past is to build a dictionary of the possible datetime combinations and then use map to apply them. Something like this:
import datetime as dt
import pandas as pd

# Make a time key column out of your date and timestamp fields
df['time_key'] = df['date'].astype(str) + '@' + df['timestamp'].astype(str)

# Build a dictionary from the unique time keys in the dataframe
time_dict = dict()
for time_key in df['time_key'].unique():
    time_split = time_key.split('@')
    # Create the Unix timestamp based on the values in the key; store it in the dictionary so it can be mapped later
    time_dict[time_key] = (pd.to_datetime(time_split[0]) + dt.timedelta(seconds=float(time_split[1]))).value / 10**9

# Now map the time_key to the unix column in the dataframe from the dictionary
df['unix'] = df['time_key'].map(time_dict)
Note if all the datetime combinations are unique in the dataframe, this likely won't help.
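As a quick sanity check, here is a toy run of the key-and-map idea on a tiny dataframe (the column names date and timestamp follow the snippet above; the values are made up):

import datetime as dt
import pandas as pd

# Hypothetical toy data: one date repeated, timestamps as offsets in seconds
df = pd.DataFrame({'date': ['2022-08-08', '2022-08-08', '2022-08-08'],
                   'timestamp': [10, 20, 30]})

df['time_key'] = df['date'].astype(str) + '@' + df['timestamp'].astype(str)
time_dict = {}
for time_key in df['time_key'].unique():
    date_part, sec_part = time_key.split('@')
    time_dict[time_key] = (pd.to_datetime(date_part) + dt.timedelta(seconds=float(sec_part))).value / 10**9

df['unix'] = df['time_key'].map(time_dict)
print(df[['timestamp', 'unix']])
# unix is 1659916810.0, 1659916820.0, 1659916830.0 (seconds since the epoch)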
CodePudding user response:
I'm not exactly sure what type dateToday[0] has, but you could try a more vectorized approach:
import pandas as pd

df["unix"] = (
    (pd.Timestamp(dateToday[0]) + pd.to_timedelta(df["timestamp"], unit="seconds"))
    .astype("int").div(10**9)
)
or
df["unix"] = (
(dateTime[0] pd.to_timedelta(df["timestamp"], unit="seconds"))
.astype("int").div(10**9)
)