Converting different columns to a datetime-CodePudding

The time in my csv file is divided into 4 columns, (year, julian day, hour/minut(utc) and second), and I wanted to convert to a single column so that it looks like this: 14/11/2017 00:16:00.

Is there a easy way to do this?

A sample of the code is

cols = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]
D14 = pd.read_csv(r'C:\Users\William Jacondino\Desktop\DadosTimeSeries\PIRATA-PROFILE\Dados FLUXO\Dados_brutos_copy-20220804T151840Z-002\Dados_brutos_copy\tm_data_2017_11_14_0016.dat', header=None, usecols=cols, names=["Year","Julian day", "Hour/minut (UTC)", "Second", "Bateria (V)", "PTemp (°C)", "Latitude", "Longitude", "Magnectic_Variation (arb)", "Altitude (m)", "Course (º)", "WS", "Nmbr_of_Satellites (arb)", "RAD", "Tar", "UR", "slp",], sep=',')
D14 = D14.loc[:, ["Year","Julian day", "Hour/minut (UTC)", "Second", "Latitude", "Longitude","WS", "RAD", "Tar", "UR", "slp"]]

My array looks like that:

CodePudding user response：

You can create a new column which also adds the data from other columns.

For example, if you have a dataframe like so:

df = pd.DataFrame(dict)

# Print df:

   year month  day  a  b  c
0  2010   jan    1  1  4  7
1  2010   feb    2  2  5  8
2  2020   mar    3  3  6  9

You can add a new column field on the DataFrame, with the values extracted from the Year Month and Date columns.

df['newColumn'] = df.year.astype(str) '-' df.month '-' df.day.astype(str)

Edit: In your situation instead of using df.month use df['Julian Day'] since the column name is different. To understand more on why this is, read here

The data in the new column will be as string with the way you like to format it. You can also substitute the dash '-' with a slash '/' or however you need to format the outcome. You just need to convert the integers into strings with .astype(str)

Output:

   year month  day  a  b  c   newColumn
0  2010   jan    1  1  4  7  2010-jan-1
1  2010   feb    2  2  5  8  2010-feb-2
2  2020   mar    3  3  6  9  2020-mar-3

After that you can do anything as you would on a dataframe object.

If you only need it for data analysis you can do it with the function .groupBy() which groups the data fields and performs the analysis. source

CodePudding user response：

If your dataframe looks like

import pandas as pd

df = pd.DataFrame({
    "year": [2017, 2017], "julian day": [318, 318], "hour/minut(utc)": [16, 16],
    "second": [0, 30],
})

   year  julian day  hour/minut(utc)  second
0  2017         318               16       0
1  2017         318               16      30

then you could use pd.to_datetime() and pd.to_timedelta() to do

df["datetime"] = (
    pd.to_datetime(df["year"].astype("str"), format="%Y")
      pd.to_timedelta(df["julian day"] - 1, unit="days")
      pd.to_timedelta(df["hour/minut(utc)"], unit="minutes")
      pd.to_timedelta(df["second"], unit="seconds")
).dt.strftime("%d/%m/%Y %H:%M:%S")

and get

   year  julian day  hour/minut(utc)  second             datetime
0  2017         318               16       0  14/11/2017 00:16:00
1  2017         318               16      30  14/11/2017 00:16:30

The column datetime now contains strings. Remove the .dt.strftime("%d/%m/%Y %H:%M:%S") part at the end, if you want datetimes instead.