I am new to pandas and I need help. I have a set of data as given:
Index | sensor | timestamp |
---|---|---|
0 | temperature | 10/09/2019 10:49:00 |
1 | humidity | 10/09/2019 10:50:00 |
2 | light | 10/09/2019 10:50:00 |
3 | motion | 10/09/2019 10:50:00 |
4 | temperature | 10/09/2019 11:19:00 |
5 | humidity | 10/09/2019 11:20:00 |
6 | light | 10/09/2019 11:20:00 |
7 | motion | 10/09/2019 11:20:00 |
8 | temperature | 10/09/2019 11:34:00 |
Given data is not quite systematic for me, thus I want to add a new column named temperature
and store its corresponding timestamp
values.
I want to make a new column named Temperature
and store it's corresponding timestamp value. The expected dataframe would be like the figure:
index | sensor | timestamp | temperature |
---|---|---|---|
0 | temperature | 10/09/2019 10:49:00 | 10/09/2019 10:49:00 |
1 | humidity | 10/09/2019 10:50:00 | not related |
2 | light | 10/09/2019 10:50:00 | not related |
3 | motion | 10/09/2019 10:50:00 | not related |
4 | temperature | 10/09/2019 11:19:00 | 10/09/2019 11:19:00 |
5 | humidity | 10/09/2019 11:20:00 | not related |
6 | light | 10/09/2019 11:20:00 | not related |
7 | motion | 10/09/2019 11:20:00 | not related |
8 | temperature | 10/09/2019 11:34:00 | 10/09/2019 11:34:00 |
The idea that I've come out with is that I inspect each row in the sensor
column to either contain temperature
or not. I've created an empty list so that I could append the value and add it to the original dataframe later on.
List = []
If sensor = 'temperature'
then the timestamp value will be stored in the new column and 'not_related' is given when sensor != 'temperature'
. I tried to convert the idea into codes and this is where I am stuck.
for row in df['sensor']:
if row == 'temperature' : List.append(df.loc[df[df['sensor']=='temperature'].index.values , 'timestamp'])
else : List.append('Not related')
The problem with the code is that it stored all of the timestamp value that is equal to temperature
and not its corresponding single value.
Example of what I get when I run these codes:
List[4]
0 2019-10-09 10:49:00
4 2019-10-09 11:19:00
8 2019-10-09 11:34:00
12 2019-10-09 11:49:00
16 2019-10-09 12:04:00
...
86703 2021-03-22 13:29:00
86898 2021-03-25 14:36:00
86903 2021-03-25 14:51:00
86944 2021-03-28 16:52:00
87325 2021-07-19 10:03:00
Name: timestamp, Length: 8236, dtype: datetime64[ns]
List[1]
'Not related'
List[0:5]
[0 2019-10-09 10:49:00
4 2019-10-09 11:19:00
8 2019-10-09 11:34:00
12 2019-10-09 11:49:00
16 2019-10-09 12:04:00
...
86703 2021-03-22 13:29:00
86898 2021-03-25 14:36:00
86903 2021-03-25 14:51:00
86944 2021-03-28 16:52:00
87325 2021-07-19 10:03:00
Name: timestamp, Length: 8236, dtype: datetime64[ns],
'Not related',
'Not related',
'Not related',
0 2019-10-09 10:49:00
4 2019-10-09 11:19:00
8 2019-10-09 11:34:00
12 2019-10-09 11:49:00
16 2019-10-09 12:04:00
...
86703 2021-03-22 13:29:00
86898 2021-03-25 14:36:00
86903 2021-03-25 14:51:00
86944 2021-03-28 16:52:00
87325 2021-07-19 10:03:00
Name: timestamp, Length: 8236, dtype: datetime64[ns]]
The reason for such idea is to ease my calculation between column later on. Any insight or other methods would be much appreciated.
CodePudding user response:
You can use np.where() to give values on a condition. So for example you can use the command as below to say:
1- if df['sensor'] == 'temperature', then get the corresponding value from df['timestamp'].
2- If not, then set the value to 'not related'.
3- Finally, save the result to a new column in the dataframe and call it 'temperature'.
This is done in the following one-liner.
This should work:
df['temperature'] = np.where(df['sensor'] == 'temperature', df['timestamp'], 'not related')
CodePudding user response:
Simply use Series.where
:
df['temperature'] = df['timestamp'].where(df['sensor'].eq('temperature'), 'not related')