Home > other >  Deleting everything before and after regex match in Pandas
Deleting everything before and after regex match in Pandas

Time:10-10

I have a dataframe with a wide amount of text in one of the columns. It looks something like this:


[{'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:07:58', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097683000'}}, {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:08:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097712000'}}, {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:08:57', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097742000'}}, {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:09:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097772000'}}, {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:09:57', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097803000'}}, {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:10:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097833000'}}, {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:10:58', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097863000'}}, {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:11:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097892000'}}, {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:11:57', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097922000'}}, {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:12:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097952000'}}, {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:12:57', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097983000'}}, {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:13:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620098013000'}}

I want to be able to delete everything that isn't in a line that matches minute 59 and keep the line where it does so the column would only show:


{'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:59:58', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '16200976821350'}}

I tried using regex but I can't seem to make it work.

/({'occupancy_state': '[1],[0-9] ,[0-9] ,[0-9] ', 'rtc_utc_time': '[2][0][0-9][0-9]-[0-1][0-9]-[0-9][0-9] [0-2][0-9]:*59:[0-9][0-9]', 'sdcard_site': '[a-zA-Z] ', 'sdcard_chain': '[a-zA-Z] ', 'sdcard_line': '[0-9] ,[a-zA-Z] ', 'utc': {'.numberLong': '[0-9] '}})/g

Any help is appreciated

CodePudding user response:

Using following Code you can select dictionaries with 59 minutes:

[i for i in myList if i['rtc_utc_time'].partition(":")[2][:2] == '59']

NB : Your provided input not contaning any dictionary with 59 minute.

CodePudding user response:

use a list comprehension and split to find the minutes on the rtc_utc_time

data=[{'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:07:58', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097683000'}}, 
      {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:08:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097712000'}}, 
      {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:08:57', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097742000'}}, 
      {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:09:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097772000'}}, 
      {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:59:57', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097803000'}}, 
      {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:10:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097833000'}}, 
      {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:10:58', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097863000'}}, 
      {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:11:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097892000'}}, 
      {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:59:57', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097922000'}}, 
      {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:12:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097952000'}},  
      {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:12:57', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620097983000'}}, 
      {'occupancy_state': '1,0,0,0', 'rtc_utc_time': '2021-05-04 03:13:27', 'sdcard_site': 'BE', 'sdcard_chain': 'DAG', 'sdcard_line': '1,BE', 'utc': {'$numberLong': '1620098013000'}}]

   
[print (row) for row in data if row["rtc_utc_time"].split(':')[1]!= '59']
  • Related