Efficient way to modify every cell in a dataframe

I'm working on a Python project and have a dataframe with multiple columns and rows.

I would like to get rid of everything but the digits in every cell of the dataframe. Is it possible to do this without using loops?

Here is a sample from the data:

         a       b       c       d       e       f        g      h   
1    att-7   att-3  att-10  att-10   att-15  att-11    att-2  att-7  
2    att-9   att-7  att-12   att-4   att-10   att-4   att-13  att-4  
3   att-10   att-6   att-1   att-1   att-13  att-12    att-9  att-6

I would like to apply something like this:

def modify_string(cell):
    return cell.str.extract(r'(\d+)', expand=False)

df_modified = df.apply(lambda x: modify_string(x))

Is it possible to avoid loops here? What would be the most efficient way since the data is relatively big? How would you solve this problem?

CodePudding user response：

df1
df2 = df1.astype('str').replace('att-', '', regex=True)
df2

Update: if you need to use the values as numbers afterwards, just add the following:

df2 = df2.astype('int64')

    a  b   c   d   e   f   g  h
1   7  3  10  10  15  11   2  7
2   9  7  12   4  10   4  13  4
3  10  6   1   1  13  12   9  6
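For reference, here is a minimal self-contained sketch of the replace-based approach. The sample frame is rebuilt from the first three columns of the question's data. The pattern \D+ (strip every run of non-digits) is a swapped-in generalization of the 'att-' literal, and it assumes every cell contains at least one digit, otherwise the integer conversion would fail on an empty string.

```python
import pandas as pd

# Sample data rebuilt from the question (first three columns only).
df1 = pd.DataFrame(
    {
        "a": ["att-7", "att-9", "att-10"],
        "b": ["att-3", "att-7", "att-6"],
        "c": ["att-10", "att-12", "att-1"],
    },
    index=[1, 2, 3],
)

# Strip every run of non-digit characters from every cell in one
# vectorized call, then convert the remaining digit strings to integers.
# Assumption: each cell holds at least one digit.
df2 = df1.astype(str).replace(r"\D+", "", regex=True).astype("int64")

print(df2)
```

No explicit Python loop is written here: replace with regex=True handles the whole frame in one call.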

CodePudding user response：

The first approach uses applymap, which applies a function elementwise. It relies on the number being preceded by a '-'. (In pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map.)

df.applymap(lambda x: x.split('-')[-1])

If this is not always the case, you could instead use str.extract to pull out the numbers.

df.stack().str.extract(r'(\d+)', expand=False).unstack()

Output:

    a  b   c   d   e   f   g  h
1   7  3  10  10  15  11   2  7
2   9  7  12   4  10   4  13  4
3  10  6   1   1  13  12   9  6
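The two approaches in this answer can be tried side by side with the sketch below. The sample frame is rebuilt from the first three columns of the question's data, and the hasattr check is only an assumption-driven convenience so the same snippet runs on pandas versions with or without DataFrame.map.

```python
import pandas as pd

# Sample data rebuilt from the question (first three columns only).
df = pd.DataFrame(
    {
        "a": ["att-7", "att-9", "att-10"],
        "b": ["att-3", "att-7", "att-6"],
        "c": ["att-10", "att-12", "att-1"],
    },
    index=[1, 2, 3],
)

# Approach 1: elementwise split on '-'. Use DataFrame.map where it exists
# (pandas 2.1+), otherwise fall back to the older applymap.
elementwise = df.map if hasattr(df, "map") else df.applymap
by_split = elementwise(lambda x: x.split("-")[-1])

# Approach 2: stack into one long Series, extract the first run of digits
# with a vectorized string method, then restore the original shape.
by_extract = df.stack().str.extract(r"(\d+)", expand=False).unstack()

# Both results hold digit strings; convert to integers if numbers are needed.
by_split_int = by_split.astype("int64")
by_extract_int = by_extract.astype("int64")

print(by_extract_int)
```

The str.extract version does not depend on the '-' separator, so it is the safer choice when the prefix format varies.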

CodePudding user response：

I would use pandarallel (https://pypi.org/project/pandarallel/) together with a simple apply function.