Home > OS >  Manually Edit cells in pandas data frame on a condition
Manually Edit cells in pandas data frame on a condition

Time:10-19

We have the following dummy data frame that collects counts of warnings (based on a reason) for a given date:

temp = pd.DataFrame(np.array([['2022-10-13','one',123],['2022-10-13','two',77123],
                              ['2022-10-13','three',451], ['2022-10-13','three',77]]),
                            columns = ['date','reason','count'])

The problem is in the count column

     date             reason     count
0    2022-10-13       one        123
1    2022-10-13       two        77123
2    2022-10-13       three      451
3    2022-10-13       three      77

The data for reasons one and three needs to be scaled by 100 as it is stored in a minimized way on the database.

Is there a way to traverse the cells and add '00' to the counts or multiply by 100 where reason is not equal to two? And end up with something like this:

     date             reason     count
0    2022-10-13       one        12300
1    2022-10-13       two        77123
2    2022-10-13       three      45100
3    2022-10-13       three      7700

How would this be achieved?

CodePudding user response:

First make sure the column count are integers (or floats), then use .loc[] and select the rows you wish to adapt (temp.reason != 'two') and make the change (temp['count'] * 100):

temp['count'] = temp['count'].astype(int)
temp.loc[temp.reason != 'two', 'count'] = temp['count'] * 100
print(temp)

Output

         date reason  count
0  2022-10-13    one  12300
1  2022-10-13    two  77123
2  2022-10-13  three  45100
3  2022-10-13  three   7700

CodePudding user response:

Use DataFrame.loc with *=100 for multiple values if not matched by Series.ne:

temp['count'] = temp['count'].astype(int)

temp.loc[temp['reason'].ne('two'), 'count'] *= 100
print (temp)

         date reason  count
0  2022-10-13    one  12300
1  2022-10-13    two  77123
2  2022-10-13  three  45100
3  2022-10-13  three   7700

CodePudding user response:

You can use np.where

temp['count'] = temp['count'].astype(int)
temp['count'] = np.where(temp['reason']!='two', temp['count']*100, temp['count'])
  • Related