Home > Blockchain >  Edit row of DataFrame IF row contains specific string
Edit row of DataFrame IF row contains specific string

Time:08-26

I have multiple dataframes stored in a dictionary.

Each dataframe has 3 columns as shown below

exceldata_1['Sheet1']

                      0                     1       2

0   Sv2.55 Fv2.04R02[2022-01-01T00         16      29.464Z]
1   - SC                                   OK       NaN
2   - PC1 Number                            1       NaN
3   - PC1 Main Status                      OK       NaN
4   - PC1 PV                       4294954868       NaN
... ... ... ...
1046    - C Temperature                  17�C        NaN
1047    Sv2.55 Fv2.04R02[2022-01-01T23     16       30.782Z]
1048    - Level SS             High      NaN
1049    Sv2.55 Fv2.04R02[2022-01-01T23  16  34.235Z]
1050    Sv2.55 Fv2.04R02[2022-01-01T23  16  38.657Z]
1051 rows × 3 columns

I want to do this : Search each row of dataframe if it contains "Sv2." and change that row as follows Remove "Sv2.55 Fv2.04R02[" this part and combine the remaining data to form the date and time correctly in each column....Showing the desired outpu below...The last column can be deleted as it will not contain any data after performing this operation.

           0                    1                 2

0     2022-01-01               00:16:29          NaN
1   - SC                          OK             NaN
2   - PC1 Number                  1              NaN
3   - PC1 Main Status            OK              NaN
4   - PC1 PV                   4294954868        NaN
... ... ... ...
1046    - C Temperature           17�C           NaN
1047    2022-01-01             23:16:30          NaN
1048    - Level SS               High            NaN
1049    2022-01-01             23:16:34          NaN
1050    2022-01-01             23:16:38          NaN
1051 rows × 3 columns

How can I achieve this?

CodePudding user response:

Using regular expressions should work

for i in range(len(df)):
    text=df['0'][i]
    if re.search('Sv',text)!=None:
        item_list=re.split('\[|T|\s\s|Z',text[:-1])
        df.iloc[i,0]=item_list[1]
        df.iloc[i,1]=item_list[2] ':' item_list[3] ':' item_list[4]

CodePudding user response:

With df one of your dataframes you could try the following:

m = df[0].str.contains("Sv2.")
ser = df.loc[m, 0]   " "   df.loc[m, 1]   " "   df.loc[m, 2]
datetime = pd.to_datetime(
    ser.str.extract(r"Sv2\..*?\[(.*?)\]")[0].str.replace(r"\s ", " ", regex=True),
    format="%Y-%m-%dT%H %M %S.%fZ"
)
df.loc[m, 0] = datetime.dt.strftime("%Y-%m-%d")
df.loc[m, 1] = datetime.dt.strftime("%H:%M:%S")
df.loc[m, 2] = np.NaN
  • First build a mask m that selects the rows that contain a "Sv2." in the first column.
  • Based on that build a series ser with the relevant strings, added together with a blank inbetween.
  • Use .str.extract to fetch the datetime-part via the capture group of a regex: Look for the "Sv2."-part, then go forward until the opening bracket "[", and then catch all until the closing bracket "]".
  • Convert those strings with pd.to_datetime to datetimes (see here for the format codes).
  • Extract the required parts with .dt.strftime into the resp. columns.

Alternative approach without real datetimes:

m = df[0].str.contains("Sv2.")
ser = df.loc[m, 0]   " "   df.loc[m, 1]   " "   df.loc[m, 2]
datetime = ser.str.extract(
    r"Sv2\..*?\[(\d{4}-\d{2}-\d{2}).*?(\d{2}\s \d{2}\s \d{2})\."
)
datetime[1] = datetime[1].str.replace(r"\s ", ":", regex=True)
df.loc[m, [0, 1]] = datetime
df.loc[m, 2] = np.NaN

Result for the following sample df (taken from your example)

                                0     1         2
0  Sv2.55 Fv2.04R02[2022-01-01T00    16  29.464Z]
1                            - SC    Ok       NaN
2                     - PC Number     1       NaN
3                         - PC MS    Ok       NaN
4                     - PC PValue     8       NaN
5                      - Level SS  High       NaN
6  Sv2.55 Fv2.04R02[2022-01-01T23    16  34.235Z]
7  Sv2.55 Fv2.04R02[2022-01-01T23    16  38.657Z]

is

             0         1   2
0   2022-01-01  00:16:29 NaN
1         - SC        Ok NaN
2  - PC Number         1 NaN
3      - PC MS        Ok NaN
4  - PC PValue         8 NaN
5   - Level SS      High NaN
6   2022-01-01  23:16:34 NaN
7   2022-01-01  23:16:38 NaN

CodePudding user response:

Thanks for the idea on how to proceed @Irsyaduddin ..With some modifications to his answer, I was able to achieve it.

  • Make sure all the data types in your dataframe are strings

    import re
    for i in range(len(df1)):
        text= (df1[0][i]) df1[1][i] (df1[2][i]) #combining data from all cols
        if re.search('Sv',text)!=None:
            item_list=re.split('\[|T|Z',text)
            df1.iloc[i,0]=item_list[1]
            df1.iloc[i,1]=item_list[2][:2] ":" item_list[2] 
                            [2:4] ":" item_list[2][4:6]
            df1.iloc[i,2]='NaN'
      df1
    

Result:

                0                 1            2
    0   2022-01-01             00:16:29        NaN
    1   - Server Connection       OK           nan
    2   - PC1 Number               1           nan
    3   - PC1 MS                  OK           nan
    4   - PC1 PV               4294954868      nan
    ... ... ... ...
    1046    - C Temperature       17�C         nan
    1047    2022-01-01             23:16:30    NaN
    1048    - Level Sensor Status   High       nan
    1049    2022-01-01             23:16:34    NaN
    1050    2022-01-01             23:16:38    NaN
    1051 rows × 3 columns

Result of Split:

item_list
['Sv2.55 Fv2.04R02', '2022-01-01', '001629.464', '] ']
  • Related