I'm working on a Python project and have a DataFrame with multiple columns and rows.
I would like to strip everything except the digits from every cell of the DataFrame. Is it possible to do this without using loops?
Here is a sample from the data:
a b c d e f g h
1 att-7 att-3 att-10 att-10 att-15 att-11 att-2 att-7
2 att-9 att-7 att-12 att-4 att-10 att-4 att-13 att-4
3 att-10 att-6 att-1 att-1 att-13 att-12 att-9 att-6
I would like to apply something like this:
def modify_string(col):
    # col is one column (a Series); keep the first run of digits in each cell
    return col.str.extract(r'(\d+)', expand=False)

df_modified = df.apply(modify_string)
Is it possible to avoid loops here? What would be the most efficient way since the data is relatively big? How would you solve this problem?
CodePudding user response:
df1
df2 = df1.astype('str').replace('att-', '', regex=True)
df2
Update: if you need to use the values as numbers afterwards, just add the following:
df2 = df2.astype('int64')
index | a | b | c | d | e | f | g | h |
---|---|---|---|---|---|---|---|---|
1 | 7 | 3 | 10 | 10 | 15 | 11 | 2 | 7 |
2 | 9 | 7 | 12 | 4 | 10 | 4 | 13 | 4 |
3 | 10 | 6 | 1 | 1 | 13 | 12 | 9 | 6 |
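For anyone who wants to run this end to end, here is a minimal sketch of the same replace approach. It assumes the cells look like the sample in the question (only the first three columns are rebuilt here). The `\D+` variant is a variation that is not part of the answer above, meant for cases where the prefix is not always `att-`.

```python
import pandas as pd

# First three columns of the sample from the question (rebuilt by hand)
df1 = pd.DataFrame(
    {
        "a": ["att-7", "att-9", "att-10"],
        "b": ["att-3", "att-7", "att-6"],
        "c": ["att-10", "att-12", "att-1"],
    },
    index=[1, 2, 3],
)

# Approach from the answer: remove the known 'att-' prefix, then cast to integers
df2 = df1.astype(str).replace("att-", "", regex=True).astype("int64")

# Variation (assumption: prefixes may differ): drop every non-digit character
df3 = df1.astype(str).replace(r"\D+", "", regex=True).astype("int64")

print(df2)
```

Note that the `\D+` version joins all digits found in a cell, so a value such as 'a1b2' would become 12, and a cell with no digits at all would fail the integer cast.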
CodePudding user response:
The first way, using applymap, applies the function elementwise (in pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map). It relies on the number being preceded by a '-'.
df.applymap(lambda x: x.split('-')[-1])
If this is not always the case, you could also use str.extract and extract the numbers.
df.stack().str.extract(r'(\d+)', expand=False).unstack()
Output:
a b c d e f g h
1 7 3 10 10 15 11 2 7
2 9 7 12 4 10 4 13 4
3 10 6 1 1 13 12 9 6
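A self-contained sketch of the str.extract route, assuming the same sample shape as the question (first three columns only). The extracted values are strings, so the final `astype(int)` is an added step for when numbers are needed.

```python
import pandas as pd

# First three columns of the sample from the question (rebuilt by hand)
df = pd.DataFrame(
    {
        "a": ["att-7", "att-9", "att-10"],
        "b": ["att-3", "att-7", "att-6"],
        "c": ["att-10", "att-12", "att-1"],
    },
    index=[1, 2, 3],
)

# stack() turns the frame into one long Series so the .str accessor is used once;
# extract() keeps the first run of digits per cell; unstack() restores the shape
digits = df.stack().str.extract(r"(\d+)", expand=False).unstack()

# Optional: convert the extracted strings to integers
digits = digits.astype(int)

print(digits)
```

Cells without any digit come back as NaN from extract, in which case the integer cast would fail and the missing values would need handling first.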
CodePudding user response:
I would use pandarallel (https://pypi.org/project/pandarallel/) together with a simple apply function.