Adding leading zeros to int only in mixed data columns

Time:10-19

I have a CSV with a column of mixed data types, as shown by my dummy table below. Using Python and pandas, I am able to add leading zeros to the column using:

import pandas as pd

df = pd.read_csv('test.csv')
df['column_A'] = df['column_A'].str.zfill(10)
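A minimal sketch of the read step, assuming the column is named column_A as in the snippet and using an in-memory stand-in for test.csv (csv_text is a hypothetical name): passing dtype=str to read_csv keeps every cell as a string. For a small mixed column pandas usually does this anyway, but being explicit guards against a digit-only column, or one chunk of a large file, being parsed as integers, which would drop leading zeros and make the .str methods fail or return NaN.

```python
import io

import pandas as pd

# Hypothetical stand-in for test.csv, built from the sample values in Table 1
csv_text = """column_A
ABC-1234
000-1234-000
1234
1234567890123
00-18
"""

# dtype=str keeps every cell as text, so a digit-only cell such as 1234 is
# never converted to an integer (which would lose leading zeros and break .str)
df = pd.read_csv(io.StringIO(csv_text), dtype=str)
print(df['column_A'].tolist())
```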

I am trying to add leading zeros to the column, but only to cells that contain integers, i.e. row 3. The issue is that it is treating row 5 as an integer as well, despite the '-', which I thought would make it a str. How do I add leading zeros only to integer cells with fewer than 10 digits?

Below is an example of the data present in the column and what outcomes I am aiming for.

| Raw Data      | Ideal Outcome | Actual Outcome |
| ------------- | ------------- | -------------- |
| ABC-1234      | ABC-1234      | ABC-1234       |
| 000-1234-000  | 000-1234-000  | 000-1234-000   |
| 1234          | 0000001234    | 0000001234     |
| 1234567890123 | 1234567890123 | 1234567890123  |
| 00-18         | 00-18         | 00000000-18    |

Table 1
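To see why the hyphen does not protect a value, here is a minimal sketch that rebuilds the Raw Data values from Table 1 as a Series (raw and naive are illustrative names). Series.str.zfill pads any string shorter than the requested width and never checks whether the string is an integer. Run on these exact strings, it also pads 'ABC-1234', which is only 8 characters long.

```python
import pandas as pd

# Raw values from Table 1, kept as strings
raw = pd.Series(['ABC-1234', '000-1234-000', '1234', '1234567890123', '00-18'])

# str.zfill pads every string shorter than 10 characters on the left;
# it does not test whether the string is an integer
naive = raw.str.zfill(10)
print(naive.tolist())
# ['00ABC-1234', '000-1234-000', '0000001234', '1234567890123', '0000000-18']
```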

User response:

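You can use DataFrame.applymap to apply a custom function to each cell of the DataFrame (in pandas 2.1 and later, applymap is deprecated in favour of the equivalent DataFrame.map). Try something like:

import pandas as pd

def my_func(value):
    # Convert to str so non-string cells (ints, NaN) can be checked too
    str_value = str(value)
    # Pad only cells made entirely of digits; 'ABC-1234' and '00-18' fail this test
    if str_value.isnumeric():
        # zfill never truncates, so values already 10+ digits long are unchanged
        return str_value.zfill(10)
    else:
        return value

# applymap runs my_func on every cell of every column
df = df.applymap(my_func)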
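A self-contained run of the same my_func on the Raw Data values from Table 1. One design note, offered as a suggestion rather than a correction: df.applymap touches every column, so if the DataFrame also holds columns that should stay as they are, calling Series.map on just the target column limits the padding to that column.

```python
import pandas as pd


def my_func(value):
    # Pad digit-only cells to 10 characters; leave everything else unchanged
    str_value = str(value)
    if str_value.isnumeric():
        return str_value.zfill(10)
    return value


df = pd.DataFrame({'Raw Data': ['ABC-1234', '000-1234-000', '1234',
                                '1234567890123', '00-18']})

# Series.map applies my_func to this one column only
df['Raw Data'] = df['Raw Data'].map(my_func)
print(df['Raw Data'].tolist())
# ['ABC-1234', '000-1234-000', '0000001234', '1234567890123', '00-18']
```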

User response:

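To update the Raw Data column in place, use loc:

import pandas as pd

# Check whether each Raw Data value is all digits; astype(str) keeps NaN out of the mask
is_int = df['Raw Data'].astype(str).str.isnumeric()
# Prepend zeros (up to 10 characters) only on the rows where the check is True
df.loc[is_int, 'Raw Data'] = df.loc[is_int, 'Raw Data'].astype(str).str.zfill(10)
df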
    Raw Data       Ideal Outcome    Actual Outcome
0   ABC-1234       ABC-1234         ABC-1234
1   000-1234-000   000-1234-000     000-1234-000
2   0000001234     0000001234       0000001234
3   1234567890123  1234567890123    1234567890123
4   00-18          00-18            00000000-18
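A stricter variant of the loc approach, sketched under the assumption that only plain ASCII digits should count as integers: str.isnumeric also accepts characters such as '½' or '²'. The pattern [0-9]{1,9} with Series.str.fullmatch (pandas 1.1 or later) selects exactly the integer cells with fewer than 10 digits that the question asks about. zfill alone already leaves longer values alone, so the {1,9} bound mainly makes that intent explicit; short_int is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'Raw Data': ['ABC-1234', '000-1234-000', '1234',
                                '1234567890123', '00-18']})

# fullmatch requires the whole cell to be 1 to 9 ASCII digits;
# na=False treats any missing cells as "no match" instead of NaN
short_int = df['Raw Data'].str.fullmatch(r'[0-9]{1,9}', na=False)
df.loc[short_int, 'Raw Data'] = df.loc[short_int, 'Raw Data'].str.zfill(10)
print(df['Raw Data'].tolist())
# ['ABC-1234', '000-1234-000', '0000001234', '1234567890123', '00-18']
```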

OR

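using mask, writing the result to a new outcome column:

# Create a new 'outcome' column: digit-only Raw Data values are zero-padded to 10 characters,
# all other values are copied over unchanged
is_int = df['Raw Data'].astype(str).str.isnumeric()
df['outcome'] = df['Raw Data'].mask(is_int, df['Raw Data'].astype(str).str.zfill(10))
df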
    Raw Data        Ideal Outcome   Actual Outcome  outcome
0   ABC-1234        ABC-1234        ABC-1234        ABC-1234
1   000-1234-000    000-1234-000    000-1234-000    000-1234-000
2   0000001234      0000001234      0000001234      0000001234
3   1234567890123   1234567890123   1234567890123   1234567890123
4   00-18           00-18           00000000-18     00-18
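Another option, sketched as an alternative rather than part of the answer above: Series.str.replace accepts a callable replacement when regex=True, so the digit-only check and the padding happen in a single pass. The anchored pattern ^[0-9]+$ matches only cells made entirely of digits, and zfill(10) leaves values that are already 10 or more digits long unchanged.

```python
import pandas as pd

df = pd.DataFrame({'Raw Data': ['ABC-1234', '000-1234-000', '1234',
                                '1234567890123', '00-18']})

# With regex=True, str.replace calls the lambda with each re.Match object
# and uses its return value as the replacement text
df['outcome'] = df['Raw Data'].str.replace(
    r'^[0-9]+$', lambda m: m.group(0).zfill(10), regex=True
)
print(df['outcome'].tolist())
# ['ABC-1234', '000-1234-000', '0000001234', '1234567890123', '00-18']
```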