Home > Blockchain >  Retrieving a specific value from a column and store it in a new column depending on the conditions t
Retrieving a specific value from a column and store it in a new column depending on the conditions t

Time:11-14

I am new to pandas and I need help. I have a set of data as given:

Index sensor timestamp
0 temperature 10/09/2019 10:49:00
1 humidity 10/09/2019 10:50:00
2 light 10/09/2019 10:50:00
3 motion 10/09/2019 10:50:00
4 temperature 10/09/2019 11:19:00
5 humidity 10/09/2019 11:20:00
6 light 10/09/2019 11:20:00
7 motion 10/09/2019 11:20:00
8 temperature 10/09/2019 11:34:00

Given data is not quite systematic for me, thus I want to add a new column named temperature and store its corresponding timestamp values.

I want to make a new column named Temperature and store it's corresponding timestamp value. The expected dataframe would be like the figure:

index sensor timestamp temperature
0 temperature 10/09/2019 10:49:00 10/09/2019 10:49:00
1 humidity 10/09/2019 10:50:00 not related
2 light 10/09/2019 10:50:00 not related
3 motion 10/09/2019 10:50:00 not related
4 temperature 10/09/2019 11:19:00 10/09/2019 11:19:00
5 humidity 10/09/2019 11:20:00 not related
6 light 10/09/2019 11:20:00 not related
7 motion 10/09/2019 11:20:00 not related
8 temperature 10/09/2019 11:34:00 10/09/2019 11:34:00

The idea that I've come out with is that I inspect each row in the sensor column to either contain temperature or not. I've created an empty list so that I could append the value and add it to the original dataframe later on.

List = []

If sensor = 'temperature' then the timestamp value will be stored in the new column and 'not_related' is given when sensor != 'temperature'. I tried to convert the idea into codes and this is where I am stuck.

for row in df['sensor']:
    if row == 'temperature' : List.append(df.loc[df[df['sensor']=='temperature'].index.values , 'timestamp'])
    else : List.append('Not related')

The problem with the code is that it stored all of the timestamp value that is equal to temperature and not its corresponding single value.

Example of what I get when I run these codes:

List[4] 
0       2019-10-09 10:49:00
4       2019-10-09 11:19:00
8       2019-10-09 11:34:00
12      2019-10-09 11:49:00
16      2019-10-09 12:04:00
                ...        
86703   2021-03-22 13:29:00
86898   2021-03-25 14:36:00
86903   2021-03-25 14:51:00
86944   2021-03-28 16:52:00
87325   2021-07-19 10:03:00
Name: timestamp, Length: 8236, dtype: datetime64[ns]
List[1] 

'Not related'

List[0:5] 
[0       2019-10-09 10:49:00
 4       2019-10-09 11:19:00
 8       2019-10-09 11:34:00
 12      2019-10-09 11:49:00
 16      2019-10-09 12:04:00
                 ...        
 86703   2021-03-22 13:29:00
 86898   2021-03-25 14:36:00
 86903   2021-03-25 14:51:00
 86944   2021-03-28 16:52:00
 87325   2021-07-19 10:03:00
 Name: timestamp, Length: 8236, dtype: datetime64[ns],
 'Not related',
 'Not related',
 'Not related',
 0       2019-10-09 10:49:00
 4       2019-10-09 11:19:00
 8       2019-10-09 11:34:00
 12      2019-10-09 11:49:00
 16      2019-10-09 12:04:00
                 ...        
 86703   2021-03-22 13:29:00
 86898   2021-03-25 14:36:00
 86903   2021-03-25 14:51:00
 86944   2021-03-28 16:52:00
 87325   2021-07-19 10:03:00
 Name: timestamp, Length: 8236, dtype: datetime64[ns]]

The reason for such idea is to ease my calculation between column later on. Any insight or other methods would be much appreciated.

CodePudding user response:

You can use np.where() to give values on a condition. So for example you can use the command as below to say:
1- if df['sensor'] == 'temperature', then get the corresponding value from df['timestamp'].
2- If not, then set the value to 'not related'.
3- Finally, save the result to a new column in the dataframe and call it 'temperature'.
This is done in the following one-liner.

This should work:

df['temperature'] = np.where(df['sensor'] == 'temperature', df['timestamp'], 'not related')

CodePudding user response:

Simply use Series.where:

df['temperature'] = df['timestamp'].where(df['sensor'].eq('temperature'), 'not related')
  • Related