Efficient way to modify every cell in a dataframe

Time:06-02

I'm working on a Python project and have a DataFrame with multiple columns and rows.

I would like to get rid of everything but the digits in every cell of the dataframe. Is it possible to do this without using loops?

Here is a sample from the data:

         a       b       c       d       e       f        g      h   
1    att-7   att-3  att-10  att-10   att-15  att-11    att-2  att-7  
2    att-9   att-7  att-12   att-4   att-10   att-4   att-13  att-4  
3   att-10   att-6   att-1   att-1   att-13  att-12    att-9  att-6  

I would like to apply something like this:

def modify_string(cell):
    # cell is a whole column (a Series); take the first run of digits from each value
    return cell.str.extract(r'(\d+)', expand=False)

df_modified = df.apply(modify_string)

Is it possible to avoid loops here? What would be the most efficient way since the data is relatively big? How would you solve this problem?

CodePudding user response:

df1
df2 = df1.astype('str').replace('att-', '', regex=True)
df2

Update: if you need to use the values as numbers afterwards, just add the following:

df2 = df2.astype('int64')

Output:

    a  b   c   d   e   f   g  h
1   7  3  10  10  15  11   2  7
2   9  7  12   4  10   4  13  4
3  10  6   1   1  13  12   9  6
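The replace approach above can be sketched end to end as follows. This is a minimal example, not a definitive implementation: the frame is rebuilt from only the first three columns of the sample in the question, and the pattern `\D+` (remove every non-digit character) is an assumed generalization of the literal 'att-' prefix, for data where the prefix varies.

```python
import pandas as pd

# Small stand-in for the question's data (first three columns only).
df1 = pd.DataFrame(
    {
        "a": ["att-7", "att-9", "att-10"],
        "b": ["att-3", "att-7", "att-6"],
        "c": ["att-10", "att-12", "att-1"],
    },
    index=[1, 2, 3],
)

# Remove every run of non-digit characters from every cell.
# \D+ is an assumption; replace it with 'att-' if only that prefix occurs.
df2 = df1.astype(str).replace(r"\D+", "", regex=True)

# Convert the remaining digit strings to integers.
df2 = df2.astype("int64")

print(df2)
```

Note that the integer conversion would fail on any cell that contains no digits at all, since such a cell becomes an empty string.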

CodePudding user response:

The first way uses applymap, which applies the function elementwise. It relies on the number coming after the last '-' in each cell.

df.applymap(lambda x: x.split('-')[-1])

If this is not always the case, you could also use str.extract and extract the numbers.

df.stack().str.extract(r'(\d+)', expand=False).unstack()

Output:

    a  b   c   d   e   f   g  h
1   7  3  10  10  15  11   2  7
2   9  7  12   4  10   4  13  4
3  10  6   1   1  13  12   9  6
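Both variants from this answer can be compared in a short sketch, assuming a two-column stand-in for the question's data. One caveat is that pandas 2.1 deprecated applymap in favor of DataFrame.map, so the example picks whichever method the installed version provides. The `split_last` helper is a hypothetical name used only for illustration.

```python
import pandas as pd

# Small stand-in for the question's data (first two columns only).
df = pd.DataFrame(
    {
        "a": ["att-7", "att-9", "att-10"],
        "b": ["att-3", "att-7", "att-6"],
    },
    index=[1, 2, 3],
)


def split_last(value):
    # Keep whatever follows the last '-' in the cell.
    return value.split("-")[-1]


# Elementwise variant: DataFrame.map in pandas >= 2.1, applymap before that.
if hasattr(pd.DataFrame, "map"):
    by_split = df.map(split_last)
else:
    by_split = df.applymap(split_last)

# Vectorized variant: flatten to one Series, extract the first run of
# digits from each value, then restore the original row/column layout.
by_extract = df.stack().str.extract(r"(\d+)", expand=False).unstack()

print(by_split)
print(by_extract)
```

Both results hold strings, so an astype('int64') call is still needed if numeric values are required.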

CodePudding user response:

I would use pandarallel (https://pypi.org/project/pandarallel/) together with a simple apply function.
