Home > Software design >  Identify pandas dataframe columns containing both numeric and string
Identify pandas dataframe columns containing both numeric and string

Time:04-07

I have created the following dataframe (called df):

d = {'ltv': [1, 22,45,78], 'age': [33, 43,54,65],'job': ['Salaried','Salaried','Salaried','Owner'], 'UniqueID' : ['A1','A2','A3','A4'] }
df = pd.DataFrame(data=d)

which looks like this:

print(df)

   ltv  age       job UniqueID
     1   33  Salaried       A1
    22   43  Salaried       A2
    45   54  Salaried       A3
    78   65     Owner       A4

I have checked its columns types:

print(df.info())

 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   ltv       4 non-null      int64 
 1   age       4 non-null      int64 
 2   job       4 non-null      object
 3   UniqueID  4 non-null      object

I only focus on the two object columns which are job and UniqueID. As you can see:

  • job contains only strings
  • UniqueID contains both strings and numbers

I want to be able to identify the column (in this case UniqueID) that contains both strings and numbers.

If I use the following code for UniqueID:

print(df['UniqueID'].str.isalnum())

0    True
1    True
2    True
3    True

I see that it returns True for all records, which is great. Now, if I use the same code for job, I get the same results:

print(df['job'].str.isalnum())

    0    True
    1    True
    2    True
    3    True

So, how can I identify in pandas which column that contains both strings and numbers (in this example: UniqueID)?

CodePudding user response:

You can def your own function

def findchrandnum(x):
    try :
        return all(x.str.isalnum() & ~x.str.isalpha() & ~x.str.isdigit())
    except:
        return False
df.apply(findchrandnum)
Out[66]: 
ltv         False
age         False
job         False
UniqueID     True
dtype: bool

CodePudding user response:

You can use the apply method to the column you want to check, to look for digits for each row. The sum will give you the number of values that have a digit in that column:

col = 'UniqueID'
df[col].apply(
    lambda val: any(ch.isdigit() for ch in val)
).sum()

If you know that your values in the columns are consistent, you can also check the first value only.

  • Related