I have created the following dataframe (called df
):
d = {'ltv': [1, 22,45,78], 'age': [33, 43,54,65],'job': ['Salaried','Salaried','Salaried','Owner'], 'UniqueID' : ['A1','A2','A3','A4'] }
df = pd.DataFrame(data=d)
which looks like this:
print(df)
ltv age job UniqueID
1 33 Salaried A1
22 43 Salaried A2
45 54 Salaried A3
78 65 Owner A4
I have checked its columns types:
print(df.info())
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ltv 4 non-null int64
1 age 4 non-null int64
2 job 4 non-null object
3 UniqueID 4 non-null object
I only focus on the two object columns which are job
and UniqueID
.
As you can see:
job
contains only stringsUniqueID
contains both strings and numbers
I want to be able to identify the column (in this case UniqueID
) that contains both strings and numbers.
If I use the following code for UniqueID
:
print(df['UniqueID'].str.isalnum())
0 True
1 True
2 True
3 True
I see that it returns True
for all records, which is great. Now, if I use the same code for job
, I get the same results:
print(df['job'].str.isalnum())
0 True
1 True
2 True
3 True
So, how can I identify in pandas which column that contains both strings and numbers (in this example: UniqueID
)?
CodePudding user response:
You can def your own function
def findchrandnum(x):
try :
return all(x.str.isalnum() & ~x.str.isalpha() & ~x.str.isdigit())
except:
return False
df.apply(findchrandnum)
Out[66]:
ltv False
age False
job False
UniqueID True
dtype: bool
CodePudding user response:
You can use the apply
method to the column you want to check, to look for digits for each row. The sum will give you the number of values that have a digit in that column:
col = 'UniqueID'
df[col].apply(
lambda val: any(ch.isdigit() for ch in val)
).sum()
If you know that your values in the columns are consistent, you can also check the first value only.