I have a dataframe of strings representing numbers (integers and floats).
I want to implement a validation to make sure the strings in certain columns only represent integers.
Here is a dataframe containing two columns, with headers 'str as ints' and 'str as double', representing integers and floats in string format.
# Import pandas library
import pandas as pd
# initialize list elements
data = ['10','20','30','40','50','60']
# Create the pandas DataFrame with the column name provided explicitly
df = pd.DataFrame(data, columns=['str as ints'])
df['str as double'] = ['10.0', '20.0', '30.0', '40.0', '50.0', '60.0']
Here is a function I wrote that checks for a radix point ('.') in the string to determine whether it represents an integer or a float.
def includes_dot(s):
    return '.' in s
Can I use the apply function with this directly on the dataframe, or do I need to write another function that takes the dataframe and a list of column headers and then calls includes_dot, like this:
def check_df(df, lst):
    for val in lst:
        apply(df[val]...?)
    # then print out the results if certain columns fail the check
I am also open to better ways of solving this problem altogether.
The expected output is a list of the column headers that fail the criteria: if I pass the list ['str as ints', 'str as double'], then 'str as double' should be printed because that column does not contain only integers.
CodePudding user response:
for col in df:
    # regex=False matches a literal '.', so no escaping is needed
    if df[col].str.contains('.', regex=False).any():
        print(col, "contains a '.'")
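If you want the failing headers back as a list for a chosen set of columns, something along these lines might work. This is only a sketch: the name non_integer_columns is made up for illustration, and instead of looking for a dot it swaps in a stricter regex check (optional sign, then digits only), so strings like '1e5' or missing values also count as failures. It assumes pandas 1.1 or later, where Series.str.fullmatch is available.

```python
import pandas as pd


def non_integer_columns(df, cols):
    """Return the names in cols whose values are not all integer-like strings.

    Hypothetical helper, not part of pandas.
    """
    failed = []
    for col in cols:
        # Optional sign followed by digits only; na=False makes missing values fail
        is_int = df[col].str.fullmatch(r'[+-]?\d+', na=False)
        if not is_int.all():
            failed.append(col)
    return failed


df = pd.DataFrame({
    'str as ints': ['10', '20', '30', '40', '50', '60'],
    'str as double': ['10.0', '20.0', '30.0', '40.0', '50.0', '60.0'],
})

print(non_integer_columns(df, ['str as ints', 'str as double']))
# prints ['str as double']
```

Returning the list rather than printing inside the loop lets the caller decide whether to print it, raise an error, or log it.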