Change column value based on part of another column using pandas

I want to change the value in one column whenever part of another column matches a condition.

For example, I have the following data frame:

**DATE**    **TIME**    **VALUE**
20060103    02:01:00    54
20060103    03:02:00    12
20060103    05:03:00    21
20060103    08:05:00    54
20060103    06:06:00    87
20060103    02:07:00    79
20060103    02:08:00    46

I want to set VALUE to 30, but only in rows where the hour part of the TIME column is 02.

So the desired data frame would be:

**DATE**    **TIME**    **VALUE**
20060103    02:01:00    30
20060103    03:02:00    12
20060103    05:03:00    21
20060103    08:05:00    54
20060103    06:06:00    87
20060103    02:07:00    30
20060103    02:08:00    30

Notice that in rows 1, 6 and 7 the VALUE changed to 30, because the hour in the TIME column is 02.

I tried to do it the simple way and go over each row and set the value:

import pandas as pd

df = pd.read_csv('file.csv')

# Walk the rows one by one and overwrite VALUE only in the matching row
for i, a in enumerate(df['TIME']):
    if a[:2] == '02':
        df.loc[i, "VALUE"] = 30

df.to_csv("file.csv", index=False)

But unfortunately the file has tens of millions of lines, and this method would take forever. I would appreciate it if anyone has a faster, more efficient method.

Thanks !

CodePudding user response：

Try loc assignment with a boolean mask:

df.loc[pd.to_datetime(df['TIME'], format='%H:%M:%S').dt.hour == 2, 'VALUE'] = 30

Or, comparing the string prefix directly:

df.loc[df['TIME'].str[:2] == '02', 'VALUE'] = 30
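
As a rough check that the two masks pick the same rows, here is a minimal sketch. It assumes the column names from the question (TIME, VALUE) and zero-padded HH:MM:SS strings, and builds the sample frame inline instead of reading file.csv:

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "DATE": [20060103] * 7,
    "TIME": ["02:01:00", "03:02:00", "05:03:00", "08:05:00",
             "06:06:00", "02:07:00", "02:08:00"],
    "VALUE": [54, 12, 21, 54, 87, 79, 46],
})

# Mask 1: parse the strings as times and compare the hour component.
by_hour = pd.to_datetime(df["TIME"], format="%H:%M:%S").dt.hour == 2

# Mask 2: compare the first two characters of the raw string.
by_prefix = df["TIME"].str[:2] == "02"

# Both masks flag the same rows when TIME is zero-padded HH:MM:SS.
assert (by_hour == by_prefix).all()

# Overwrite VALUE only where the mask is True.
df.loc[by_prefix, "VALUE"] = 30
print(df)
```

The string-prefix mask skips the datetime parsing step, so it is probably the cheaper of the two on a very large file, but that is worth measuring on your own data.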

CodePudding user response：

You can try the apply method to iterate over each row:

df['VALUE'] = df.apply(lambda x: 30 if x['TIME'][:2]=='02' else x['VALUE'], axis='columns')
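
Note that apply with axis='columns' still calls the lambda once per row in Python, so on tens of millions of rows it may not be much quicker than the explicit loop. A vectorized equivalent can be sketched with Series.mask, which replaces values where a condition is True. The three-row frame below is only an illustrative subset of the question's data:

```python
import pandas as pd

df = pd.DataFrame({
    "TIME": ["02:01:00", "03:02:00", "02:07:00"],
    "VALUE": [54, 12, 79],
})

# Row-wise version: one Python function call per row.
row_wise = df.apply(
    lambda x: 30 if x["TIME"][:2] == "02" else x["VALUE"], axis="columns"
)

# Vectorized version: mask() swaps in 30 wherever the condition holds.
vectorized = df["VALUE"].mask(df["TIME"].str.startswith("02"), 30)

# Both routes give the same column.
assert row_wise.tolist() == vectorized.tolist()
df["VALUE"] = vectorized
```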

CodePudding user response：

import io
import pandas as pd

data = '''DATE    TIME    VALUE
20060103    02:01:00    54
20060103    03:02:00    12
20060103    05:03:00    21
20060103    08:05:00    54
20060103    06:06:00    87
20060103    02:07:00    79
20060103    02:08:00    46'''
# Columns are separated by runs of whitespace
df = pd.read_csv(io.StringIO(data), sep=r'\s+', engine='python')

df.loc[df['TIME'].str[:2] == '02', 'VALUE'] = 30

CodePudding user response：

You can also do this with np.where(), which may be a bit faster:

import numpy as np

In [67]: df['VALUE'] = np.where(df['TIME'].str[:2]=='02', 30, df['VALUE'])

In [68]: df
Out[68]: 
       DATE      TIME  VALUE
0  20060103  02:01:00     30
1  20060103  03:02:00     12
2  20060103  05:03:00     21
3  20060103  08:05:00     54
4  20060103  06:06:00     87
5  20060103  02:07:00     30
6  20060103  02:08:00     30
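
Whether np.where actually beats loc depends on the data and the pandas version, so it is worth timing both. The sketch below is one way to do that. The synthetic frame and the helper names with_loc and with_where are assumptions made for illustration, not part of the original file:

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data (an assumption, not the asker's file): random HH:00:00 strings.
rng = np.random.default_rng(0)
n = 200_000
hours = rng.integers(0, 24, size=n)
base = pd.DataFrame({
    "TIME": pd.Series(hours).map("{:02d}:00:00".format),
    "VALUE": rng.integers(0, 100, size=n),
})

def with_loc(frame):
    # Boolean-mask assignment on a copy, so each run starts from the same data.
    out = frame.copy()
    out.loc[out["TIME"].str[:2] == "02", "VALUE"] = 30
    return out

def with_where(frame):
    # Rebuild the whole column: 30 where the mask holds, old value elsewhere.
    out = frame.copy()
    out["VALUE"] = np.where(out["TIME"].str[:2] == "02", 30, out["VALUE"])
    return out

# Check that both approaches agree before comparing their speed.
assert with_loc(base)["VALUE"].tolist() == with_where(base)["VALUE"].tolist()

for fn in (with_loc, with_where):
    seconds = timeit.timeit(lambda: fn(base), number=5)
    print(f"{fn.__name__}: {seconds / 5:.4f} s per run")
```

In both versions the string slice that builds the mask is likely to dominate the runtime, so the gap between them may turn out to be small.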