Python df.loc with regex

Time:07-14

I have a DataFrame in which I am changing the values of rows based on conditions.

Current DataFrame:

import pandas as pd
import re
data = [['ACK_ID','TEXT',30],
        ['TOT_ACTIVE_PARTCP_CNT','NUMERIC'],
        ['ADMIN_SIGNED_DATE', "TEXT", 30],
        ['BENEF_RCVG_BNFT_CNT','NUMERIC'],
        ['SPONS_SIGNED_DATE','TEXT',30]]
df = pd.DataFrame(data, columns=['FIELD_NAME', 'TYPE','SIZE (only for text fields)'])

#Change all "NUMERIC" to "FLOAT" in ['TYPE'] column.
df.loc[df["TYPE"] == "NUMERIC", "TYPE"] = "FLOAT"

I also want to set ['TYPE'] to "DATE" for every row whose ['FIELD_NAME'] entry contains 'DATE', and I want to use a regex to match 'DATE'.

Code attempt with regex:

df.loc[df["FIELD_NAME"] == r'^.*DATE$', "TYPE"] = "DATE"

This code does not change the DataFrame at all.
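The attempt has no effect because == does an element-wise exact string comparison: each FIELD_NAME is compared with the literal text of the pattern, so no row is selected. The minimal sketch below, which rebuilds only the FIELD_NAME column from the question, contrasts that with .str.contains, which interprets the same text as a regular expression (regex=True is its default).

```python
import pandas as pd

# FIELD_NAME values copied from the question's DataFrame
field_name = pd.Series(['ACK_ID', 'TOT_ACTIVE_PARTCP_CNT', 'ADMIN_SIGNED_DATE',
                        'BENEF_RCVG_BNFT_CNT', 'SPONS_SIGNED_DATE'])
pattern = r'^.*DATE$'

# == compares each value with the literal string '^.*DATE$'; no regex is applied
eq_mask = field_name == pattern
# .str.contains treats the same string as a regular expression
regex_mask = field_name.str.contains(pattern)

print(eq_mask.tolist())     # [False, False, False, False, False]
print(regex_mask.tolist())  # [False, False, True, False, True]
```

Because the == mask is all False, df.loc[...] selects no rows and the assignment is a no-op.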

The desired output is:

data = [['ACK_ID','TEXT',30],
        ['TOT_ACTIVE_PARTCP_CNT','FLOAT'],
        ['ADMIN_SIGNED_DATE', "DATE", 30],
        ['BENEF_RCVG_BNFT_CNT','FLOAT'],
        ['SPONS_SIGNED_DATE','DATE',30]]
df = pd.DataFrame(data, columns=['FIELD_NAME', 'TYPE','SIZE (only for text fields)'])

Answer:

You can use a simple .str.contains, which checks whether each value contains the given pattern:

df.loc[df["FIELD_NAME"].str.contains("DATE"), "TYPE"] = "DATE"
print(df)

Prints:

              FIELD_NAME   TYPE  SIZE (only for text fields)
0                 ACK_ID   TEXT                         30.0
1  TOT_ACTIVE_PARTCP_CNT  FLOAT                          NaN
2      ADMIN_SIGNED_DATE   DATE                         30.0
3    BENEF_RCVG_BNFT_CNT  FLOAT                          NaN
4      SPONS_SIGNED_DATE   DATE                         30.0
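As an alternative to two separate .loc assignments, both rules can be expressed in one chained expression with Series.mask, which replaces values wherever the condition is True. This is a swapped-in technique, not the answer's method; the sketch assumes the same two rules as the question (NUMERIC becomes FLOAT, and any FIELD_NAME containing DATE gets TYPE DATE) and rebuilds only the two columns involved.

```python
import pandas as pd

# Only the columns the rules touch, copied from the question
df = pd.DataFrame({
    'FIELD_NAME': ['ACK_ID', 'TOT_ACTIVE_PARTCP_CNT', 'ADMIN_SIGNED_DATE',
                   'BENEF_RCVG_BNFT_CNT', 'SPONS_SIGNED_DATE'],
    'TYPE': ['TEXT', 'NUMERIC', 'TEXT', 'NUMERIC', 'TEXT'],
})

# Each .mask(cond, value) replaces TYPE only where cond is True;
# the later mask wins if both conditions hit the same row
df['TYPE'] = (df['TYPE']
              .mask(df['TYPE'] == 'NUMERIC', 'FLOAT')
              .mask(df['FIELD_NAME'].str.contains('DATE'), 'DATE'))

print(df['TYPE'].tolist())  # ['TEXT', 'FLOAT', 'DATE', 'FLOAT', 'DATE']
```

The order of the .mask calls matters only when a row satisfies both conditions; with the question's data no row does.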

Answer:

You can also use str.contains with a regular expression (the pattern is interpreted as a regex by default):

df.loc[df['FIELD_NAME'].str.contains(r'^.*DATE$'), 'TYPE'] = 'DATE'
print(df)

              FIELD_NAME   TYPE  SIZE (only for text fields)
0                 ACK_ID   TEXT                         30.0
1  TOT_ACTIVE_PARTCP_CNT  FLOAT                          NaN
2      ADMIN_SIGNED_DATE   DATE                         30.0
3    BENEF_RCVG_BNFT_CNT  FLOAT                          NaN
4      SPONS_SIGNED_DATE   DATE                         30.0
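Since str.contains searches for the pattern anywhere in the string (re.search semantics), the leading ^.* in the regex is redundant; DATE$ alone already anchors the match at the end. The sketch below, using the question's FIELD_NAME values, checks that these forms select the same rows, along with str.fullmatch (available in pandas 1.1 and later), which requires the pattern to cover the whole string.

```python
import pandas as pd

field_name = pd.Series(['ACK_ID', 'TOT_ACTIVE_PARTCP_CNT', 'ADMIN_SIGNED_DATE',
                        'BENEF_RCVG_BNFT_CNT', 'SPONS_SIGNED_DATE'])

# Pattern from the answer: '^.*' matches any prefix, '$' anchors at the end
with_prefix = field_name.str.contains(r'^.*DATE$')
# Equivalent for contains: only the end anchor matters
end_only = field_name.str.contains(r'DATE$')
# fullmatch: the regex must match the entire string
full = field_name.str.fullmatch(r'.*DATE')

print(with_prefix.tolist())  # [False, False, True, False, True]
print(with_prefix.tolist() == end_only.tolist() == full.tolist())  # True
```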

If DATE always appears at the end of the name, you could also just use str.endswith:

df.loc[df['FIELD_NAME'].str.endswith('DATE'), 'TYPE'] = 'DATE'
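The choice between str.contains and str.endswith matters only when DATE can appear elsewhere in a name. The sketch below uses a hypothetical field name, DATE_RCVD_CNT, and a missing value (neither is in the question's data) to show the difference. It also passes na=False, which both methods accept, so that missing names yield False instead of NaN; a boolean mask containing NaN cannot be used with .loc.

```python
import pandas as pd

# Hypothetical data: 'DATE_RCVD_CNT' and the None entry are illustrative only
df = pd.DataFrame({'FIELD_NAME': ['ADMIN_SIGNED_DATE', 'DATE_RCVD_CNT', None],
                   'TYPE': ['TEXT', 'NUMERIC', 'TEXT']})

# na=False turns missing names into False so the masks are purely boolean
contains = df['FIELD_NAME'].str.contains('DATE', na=False)
endswith = df['FIELD_NAME'].str.endswith('DATE', na=False)
print(contains.tolist())  # [True, True, False]
print(endswith.tolist())  # [True, False, False]

# Only names that end in DATE are retyped
df.loc[endswith, 'TYPE'] = 'DATE'
print(df['TYPE'].tolist())  # ['DATE', 'NUMERIC', 'TEXT']
```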