I have a CSV with a column of mixed data types, as shown in my dummy table below. Using Python and pandas, I am able to add leading zeros to the column using:
import pandas as pd
df = pd.read_csv('test.csv')
df['column_A'] = df['column_A'].str.zfill(10)
I want the leading zeros added only to cells that are purely integers, i.e. row 3. The issue is that row 5 seems to be read as an integer as well, despite containing a '-', which I thought would make it a str. How do I add leading zeros only to integer cells with fewer than 10 digits?
Below is an example of the data in the column and the outcomes I am aiming for.
| Raw Data      | Ideal Outcome | Actual Outcome |
|---------------|---------------|----------------|
| ABC-1234      | ABC-1234      | ABC-1234       |
| 000-1234-000  | 000-1234-000  | 000-1234-000   |
| 1234          | 0000001234    | 0000001234     |
| 1234567890123 | 1234567890123 | 1234567890123  |
| 00-18         | 00-18         | 00000000-18    |
Table 1
CodePudding user response:
You can use DataFrame.applymap to apply a custom function to each cell of the DataFrame (from pandas 2.1 onward, applymap is deprecated in favor of DataFrame.map). Try something like:
def my_func(value):
    # Pad only values made up entirely of digits; return everything else unchanged
    str_value = str(value)
    if str_value.isnumeric():
        return str_value.zfill(10)
    else:
        return value

df = df.applymap(my_func)
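A self-contained version of this idea, restricted to a single column, might look like the sketch below. It assumes the column is named column_A as in the question, reads the sample values from an in-memory CSV rather than test.csv, and passes dtype=str so that pandas keeps every cell as text. The helper name pad_if_digits is hypothetical.

```python
import io

import pandas as pd

# Sample values from the question, read from memory instead of test.csv.
csv_text = "column_A\nABC-1234\n000-1234-000\n1234\n1234567890123\n00-18\n"
df = pd.read_csv(io.StringIO(csv_text), dtype=str)


def pad_if_digits(value):
    """Left-pad to 10 characters only when the cell consists solely of digits."""
    if isinstance(value, str) and value.isdigit():
        # zfill has no effect on strings that already have 10 or more characters.
        return value.zfill(10)
    return value


# Series.map touches only this column and leaves the rest of the frame alone.
df["column_A"] = df["column_A"].map(pad_if_digits)
print(df["column_A"].tolist())
```

Applying the function to one column avoids padding digit-only cells elsewhere in the frame, which applying it to every cell would do.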
CodePudding user response:
To update the 'Raw Data' column in place, use loc:
# where the value in 'Raw Data' is numeric, left-pad it with zeros to 10 characters
df.loc[df['Raw Data'].str.isnumeric(), 'Raw Data'] = df['Raw Data'].astype(str).str.zfill(10)
df
        Raw Data  Ideal Outcome  Actual Outcome
0       ABC-1234       ABC-1234        ABC-1234
1   000-1234-000   000-1234-000    000-1234-000
2     0000001234     0000001234      0000001234
3  1234567890123  1234567890123   1234567890123
4          00-18          00-18     00000000-18
OR
using mask:
# create a new column 'outcome': where 'Raw Data' is numeric, use the zero-padded value; otherwise keep the original
df['outcome'] = df['Raw Data'].mask(df['Raw Data'].str.isnumeric(), df['Raw Data'].astype(str).str.zfill(10))
df
        Raw Data  Ideal Outcome  Actual Outcome        outcome
0       ABC-1234       ABC-1234        ABC-1234       ABC-1234
1   000-1234-000   000-1234-000    000-1234-000   000-1234-000
2     0000001234     0000001234      0000001234     0000001234
3  1234567890123  1234567890123   1234567890123  1234567890123
4          00-18          00-18     00000000-18          00-18
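If the column may contain missing values, or characters that isnumeric accepts but that are not plain digits (it also returns True for other Unicode numeric characters such as '½'), a stricter variant of the loc approach could look like the sketch below. It is only an illustration: the frame is built by hand from the question's sample values plus one missing cell, and the mask name is_digits is hypothetical.

```python
import pandas as pd

# Sample values from the question, plus one missing cell to show the NaN handling.
df = pd.DataFrame(
    {"Raw Data": ["ABC-1234", "000-1234-000", "1234", "1234567890123", "00-18", None]}
)

# True only for cells made up entirely of the digits 0-9; missing cells count as False.
is_digits = df["Raw Data"].str.fullmatch(r"[0-9]+", na=False)

# Pad only the selected rows; zfill leaves strings of 10 or more characters unchanged.
df.loc[is_digits, "Raw Data"] = df.loc[is_digits, "Raw Data"].str.zfill(10)
print(df)
```

Passing na=False keeps the mask strictly boolean, so loc does not fail on rows where the cell is empty.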