Pandas, why can't I assign with df.loc[condition, 'col'] = df.loc[condition, 'co

I'm trying to change a dataframe column using

df.loc[df['xxx'].notna(), 'xxx'] = df.loc[df['xxx'].notna(), 'xxx'].astype(str).str[:10].str.replace('-','')

This does not seem to have any effect on the column's values. Running it without .loc[condition, 'xxx'] does seem to work:

df['xxx'] = df['xxx'].astype(str).str[:10].str.replace('-','')

This challenges my core understanding of pandas, since I always use .loc to change a subset of rows.

I'm using pandas 1.2.4

CodePudding user response：

Your code does take effect in my test, shown below. Note that my pandas version is 1.0.4, not 1.2.4.

import pandas as pd
print(pd.__version__)
df = pd.DataFrame(
    {'xxx': ['AABBCC-DDEEE', 'DIs-sssssssssssP', 'KKK', 'A', 'A'],
     'tmp': [1, 2, 3, 4, 5]})
print(df)
df.loc[df['xxx'].notna(), 'xxx'] = df.loc[df['xxx'].notna(), 'xxx'].astype(str).str[:10].str.replace('-','')
print(df)

The result is below:

1.0.4
                xxx  tmp
0      AABBCC-DDEEE    1
1  DIs-sssssssssssP    2
2               KKK    3
3                 A    4
4                 A    5
         xxx  tmp
0  AABBCCDDE    1
1  DIsssssss    2
2        KKK    3
3          A    4
4          A    5
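
Since the snippet works on plain string data, one thing worth checking is the dtype of your real column. The slice-and-strip pattern looks like date handling, so the following is a guess rather than something the question confirms: if 'xxx' is datetime64 instead of object, a .loc assignment of date-like strings may be cast back to datetimes, which would look like "no effect". A minimal sketch under that assumption (the sample dates are made up), formatting with dt.strftime instead:

```python
import pandas as pd

# Hypothetical data: a datetime64 column with one missing value.
df = pd.DataFrame(
    {'xxx': pd.to_datetime(['2021-03-05 10:00:00', None, '2021-12-31 23:59:59'])})

# Check the dtype first; object and datetime64 columns behave differently.
print(df['xxx'].dtype)

# Replace the whole column with formatted strings; NaT becomes NaN.
df['xxx'] = df['xxx'].dt.strftime('%Y%m%d')
print(df)
```

Assigning the whole column replaces its dtype, so the strings are kept as strings and the missing row stays missing.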

CodePudding user response：

Your solution works correctly for me. Here is an alternative using DataFrame.update:

import numpy as np
import pandas as pd

df = pd.DataFrame({'xxx': ['AABBCC-DDEEE', 'AABBCC-DDEEE', np.nan, np.nan]})
print(df)
            xxx
0  AABBCC-DDEEE
1  AABBCC-DDEEE
2           NaN
3           NaN

df.update(df.loc[df['xxx'].notna(), 'xxx'].astype(str).str[:10].str.replace('-',''))
print(df)
         xxx
0  AABBCCDDE
1  AABBCCDDE
2        NaN
3        NaN
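
Why update leaves the NaN rows alone: DataFrame.update aligns the passed Series on index and column name, and only overwrites with its non-NA values. A self-contained sketch of the same steps. The names mask and cleaned are mine, and regex=False is my addition so that '-' is treated as a literal character:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'xxx': ['AABBCC-DDEEE', 'AABBCC-DDEEE', np.nan, np.nan]})

# Transform only the rows that have a value; the result keeps
# the original index labels (0 and 1) and the Series name 'xxx'.
mask = df['xxx'].notna()
cleaned = df.loc[mask, 'xxx'].astype(str).str[:10].str.replace('-', '', regex=False)

# update() works in place and returns None; rows 2 and 3 are absent
# from `cleaned`, so they are left untouched.
df.update(cleaned)
print(df)
```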

Your second solution converts missing values to the string 'nan':

df['xxx'] = df['xxx'].astype(str).str[:10].str.replace('-','')
print(df)
         xxx
0  AABBCCDDE
1  AABBCCDDE
2        nan
3        nan
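
If you prefer the whole-column form, one way to avoid the 'nan' strings is to record which rows were missing before the conversion and blank them out again with Series.where. This is only a sketch on the same sample data; mask and cleaned are illustrative names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'xxx': ['AABBCC-DDEEE', 'AABBCC-DDEEE', np.nan, np.nan]})

# Remember which rows hold real values before astype(str) hides the NaNs.
mask = df['xxx'].notna()

# Whole-column transformation, as in the second solution.
cleaned = df['xxx'].astype(str).str[:10].str.replace('-', '', regex=False)

# where() keeps values where mask is True and puts NaN elsewhere.
df['xxx'] = cleaned.where(mask)
print(df)
```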