I have multiple dataframes stored in a dictionary.
Each dataframe has 3 columns as shown below
exceldata_1['Sheet1']
0 1 2
0 Sv2.55 Fv2.04R02[2022-01-01T00 16 29.464Z]
1 - SC OK NaN
2 - PC1 Number 1 NaN
3 - PC1 Main Status OK NaN
4 - PC1 PV 4294954868 NaN
... ... ... ...
1046 - C Temperature 17�C NaN
1047 Sv2.55 Fv2.04R02[2022-01-01T23 16 30.782Z]
1048 - Level SS High NaN
1049 Sv2.55 Fv2.04R02[2022-01-01T23 16 34.235Z]
1050 Sv2.55 Fv2.04R02[2022-01-01T23 16 38.657Z]
1051 rows × 3 columns
I want to do this : Search each row of dataframe if it contains "Sv2." and change that row as follows Remove "Sv2.55 Fv2.04R02[" this part and combine the remaining data to form the date and time correctly in each column....Showing the desired outpu below...The last column can be deleted as it will not contain any data after performing this operation.
0 1 2
0 2022-01-01 00:16:29 NaN
1 - SC OK NaN
2 - PC1 Number 1 NaN
3 - PC1 Main Status OK NaN
4 - PC1 PV 4294954868 NaN
... ... ... ...
1046 - C Temperature 17�C NaN
1047 2022-01-01 23:16:30 NaN
1048 - Level SS High NaN
1049 2022-01-01 23:16:34 NaN
1050 2022-01-01 23:16:38 NaN
1051 rows × 3 columns
How can I achieve this?
CodePudding user response:
Using regular expressions should work
for i in range(len(df)):
text=df['0'][i]
if re.search('Sv',text)!=None:
item_list=re.split('\[|T|\s\s|Z',text[:-1])
df.iloc[i,0]=item_list[1]
df.iloc[i,1]=item_list[2] ':' item_list[3] ':' item_list[4]
CodePudding user response:
With df
one of your dataframes you could try the following:
m = df[0].str.contains("Sv2.")
ser = df.loc[m, 0] " " df.loc[m, 1] " " df.loc[m, 2]
datetime = pd.to_datetime(
ser.str.extract(r"Sv2\..*?\[(.*?)\]")[0].str.replace(r"\s ", " ", regex=True),
format="%Y-%m-%dT%H %M %S.%fZ"
)
df.loc[m, 0] = datetime.dt.strftime("%Y-%m-%d")
df.loc[m, 1] = datetime.dt.strftime("%H:%M:%S")
df.loc[m, 2] = np.NaN
- First build a mask
m
that selects the rows that contain a"Sv2."
in the first column. - Based on that build a series
ser
with the relevant strings, added together with a blank inbetween. - Use
.str.extract
to fetch the datetime-part via the capture group of a regex: Look for the"Sv2."
-part, then go forward until the opening bracket"["
, and then catch all until the closing bracket"]"
. - Convert those strings with
pd.to_datetime
to datetimes (see here for the format codes). - Extract the required parts with
.dt.strftime
into the resp. columns.
Alternative approach without real datetimes:
m = df[0].str.contains("Sv2.")
ser = df.loc[m, 0] " " df.loc[m, 1] " " df.loc[m, 2]
datetime = ser.str.extract(
r"Sv2\..*?\[(\d{4}-\d{2}-\d{2}).*?(\d{2}\s \d{2}\s \d{2})\."
)
datetime[1] = datetime[1].str.replace(r"\s ", ":", regex=True)
df.loc[m, [0, 1]] = datetime
df.loc[m, 2] = np.NaN
Result for the following sample df
(taken from your example)
0 1 2
0 Sv2.55 Fv2.04R02[2022-01-01T00 16 29.464Z]
1 - SC Ok NaN
2 - PC Number 1 NaN
3 - PC MS Ok NaN
4 - PC PValue 8 NaN
5 - Level SS High NaN
6 Sv2.55 Fv2.04R02[2022-01-01T23 16 34.235Z]
7 Sv2.55 Fv2.04R02[2022-01-01T23 16 38.657Z]
is
0 1 2
0 2022-01-01 00:16:29 NaN
1 - SC Ok NaN
2 - PC Number 1 NaN
3 - PC MS Ok NaN
4 - PC PValue 8 NaN
5 - Level SS High NaN
6 2022-01-01 23:16:34 NaN
7 2022-01-01 23:16:38 NaN
CodePudding user response:
Thanks for the idea on how to proceed @Irsyaduddin ..With some modifications to his answer, I was able to achieve it.
Make sure all the data types in your dataframe are strings
import re for i in range(len(df1)): text= (df1[0][i]) df1[1][i] (df1[2][i]) #combining data from all cols if re.search('Sv',text)!=None: item_list=re.split('\[|T|Z',text) df1.iloc[i,0]=item_list[1] df1.iloc[i,1]=item_list[2][:2] ":" item_list[2] [2:4] ":" item_list[2][4:6] df1.iloc[i,2]='NaN' df1
Result:
0 1 2
0 2022-01-01 00:16:29 NaN
1 - Server Connection OK nan
2 - PC1 Number 1 nan
3 - PC1 MS OK nan
4 - PC1 PV 4294954868 nan
... ... ... ...
1046 - C Temperature 17�C nan
1047 2022-01-01 23:16:30 NaN
1048 - Level Sensor Status High nan
1049 2022-01-01 23:16:34 NaN
1050 2022-01-01 23:16:38 NaN
1051 rows × 3 columns
Result of Split:
item_list
['Sv2.55 Fv2.04R02', '2022-01-01', '001629.464', '] ']