I want to change the value in one column when part of another column matches a condition.
For example, I have the following data frame:
**DATE** **TIME** **VALUE**
20060103 02:01:00 54
20060103 03:02:00 12
20060103 05:03:00 21
20060103 08:05:00 54
20060103 06:06:00 87
20060103 02:07:00 79
20060103 02:08:00 46
I want to set the VALUE column to 30, but only in rows where the hour part of the TIME column is 02.
So the desired data frame would be:
**DATE** **TIME** **VALUE**
20060103 02:01:00 30
20060103 03:02:00 12
20060103 05:03:00 21
20060103 08:05:00 54
20060103 06:06:00 87
20060103 02:07:00 30
20060103 02:08:00 30
Notice how in rows 1, 6 and 7 the VALUE changed to 30, because the hour in the TIME column is 02.
I tried to do it the simple way and go over each row and set the value:
import pandas as pd

df = pd.read_csv('file.csv')
# Go over each row and overwrite VALUE only where the hour part of TIME is 02
for i, a in df['TIME'].items():
    if a[:2] == '02':
        df.at[i, 'VALUE'] = 30
df.to_csv("file.csv", index=False)
But unfortunately this file has tens of millions of lines, and this method would take forever. I would appreciate it if anyone has a more creative and efficient method.
Thanks!
CodePudding user response:
Try loc assignment:
df.loc[pd.to_datetime(df['TIME'], format='%H:%M:%S').dt.hour == 2, 'VALUE'] = 30
Or:
df.loc[df['TIME'].str[:2] == '02', 'VALUE'] = 30
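A minimal self-contained sketch of this mask-then-assign pattern, assuming the column names from the question (TIME, VALUE) and that TIME holds zero-padded HH:MM:SS strings. The sample frame is built inline purely for illustration:

```python
import pandas as pd

# Sample data copied from the question, built inline so the sketch runs on its own.
df = pd.DataFrame({
    "DATE": [20060103] * 7,
    "TIME": ["02:01:00", "03:02:00", "05:03:00", "08:05:00",
             "06:06:00", "02:07:00", "02:08:00"],
    "VALUE": [54, 12, 21, 54, 87, 79, 46],
})

# Boolean mask: True where the hour part of TIME is "02".
# Assumes zero-padded "HH:MM:SS" strings.
mask_str = df["TIME"].str[:2] == "02"

# The same mask obtained by parsing TIME and comparing the hour as a number.
mask_dt = pd.to_datetime(df["TIME"], format="%H:%M:%S").dt.hour == 2

# Both masks should select the same rows.
assert (mask_str == mask_dt).all()

# Assign 30 only to the rows selected by the mask, in one vectorized step.
df.loc[mask_str, "VALUE"] = 30
print(df)
```

The string slice avoids parsing every timestamp, so it is likely the cheaper of the two masks when the column is already text.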
CodePudding user response:
You can try the apply method to go through each row:
df['VALUE'] = df.apply(lambda x: 30 if x['TIME'][:2] == '02' else x['VALUE'], axis='columns')
CodePudding user response:
import io
import pandas as pd
data = '''DATE TIME VALUE
20060103 02:01:00 54
20060103 03:02:00 12
20060103 05:03:00 21
20060103 08:05:00 54
20060103 06:06:00 87
20060103 02:07:00 79
20060103 02:08:00 46'''
# Columns are separated by runs of whitespace
df = pd.read_csv(io.StringIO(data), sep=r'\s+', engine='python')
df.loc[df['TIME'].str[:2] == '02', 'VALUE'] = 30
CodePudding user response:
You can achieve this using np.where(), which should be a bit faster.
import numpy as np
In [67]: df['VALUE'] = np.where(df['TIME'].str[:2]=='02', 30, df['VALUE'])
In [68]: df
Out[68]:
       DATE      TIME  VALUE
0  20060103  02:01:00     30
1  20060103  03:02:00     12
2  20060103  05:03:00     21
3  20060103  08:05:00     54
4  20060103  06:06:00     87
5  20060103  02:07:00     30
6  20060103  02:08:00     30
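One way to check the speed claim on your own machine is to time the approaches on synthetic data. The sketch below is only an illustration: the row count, the random data and the helper names (with_loc, with_where, with_apply) are assumptions, not part of the original question, and the timings will vary with hardware and pandas version.

```python
import time

import numpy as np
import pandas as pd

# Synthetic stand-in for the real file (hypothetical size and contents).
n = 100_000
rng = np.random.default_rng(0)
hours = rng.integers(0, 24, size=n)
base = pd.DataFrame({
    "TIME": [f"{int(h):02d}:00:00" for h in hours],
    "VALUE": rng.integers(0, 100, size=n),
})

def with_loc(df):
    # Boolean mask plus loc assignment.
    df = df.copy()
    df.loc[df["TIME"].str[:2] == "02", "VALUE"] = 30
    return df

def with_where(df):
    # np.where picks 30 where the mask is True, the old value otherwise.
    df = df.copy()
    df["VALUE"] = np.where(df["TIME"].str[:2] == "02", 30, df["VALUE"])
    return df

def with_apply(df):
    # Row-wise apply: calls the lambda once per row.
    df = df.copy()
    df["VALUE"] = df.apply(
        lambda x: 30 if x["TIME"][:2] == "02" else x["VALUE"], axis="columns"
    )
    return df

results = {}
for name, func in [("loc", with_loc), ("np.where", with_where), ("apply", with_apply)]:
    start = time.perf_counter()
    results[name] = func(base)
    print(f"{name}: {time.perf_counter() - start:.3f} s")
```

All three functions should produce the same VALUE column, so the printed timings compare like with like.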