I am receiving this dictionary of strings as an API response, which will be further used as an input for another task.
{' cAVDNIIB ': ' pattern not matched: "[2022-07-25 06:40:51.147] [Information] LW () <RDP> Bot execution completed - JOBNAME:\\"TEST_NAME_1\\" JOBID:\\"2022072564027\\" TYPE:\\"Desktop\\" startTime:\\"1658731228\\" endTime:\\"1658731251\\" botMachineName:\\"LW\\" botMachineIP:\\"SOME_IP\\" state:\\"completed\\" status:\\"successful\\" Action:\\"run\\"" ',
' cQVDNIIB ': ' pattern not matched: "[2022-07-25 07:30:11.711] [Information] LW () <RDP> Bot consumer listening for messages." ',
' cgVDNIIB ': ' pattern not matched: "[2022-07-25 07:30:11.714] [Information] LW () <RDP> Message received from queue - JOBNAME:\\"TEST_NAME\\" JOBID:\\"2022072573011\\" TYPE:\\"Desktop\\" startTime:\\"1658734211\\" Action:\\"run\\"" ',
' cwVDNIIB ': ' pattern not matched: "[2022-07-25 07:30:11.717] [Information] LW () <RDP> Bot action:\\"run\\"" ',
' dAVDNIIB ': ' pattern not matched: "[2022-07-25 07:30:11.717] [Information] LW () <RDP> [x] \\"{\\\\\\"JOBNAME\\\\\\":\\\\\\"TEST_NAME\\\\\\",\\\\\\"TYPE\\\\\\":\\\\\\"Desktop\\\\\\",\\\\\\"rdpRequired\\\\\\":false,\\\\\\"state\\\\\\":\\\\\\"Running\\\\\\",\\\\\\"status\\\\\\":\\\\\\"Running\\\\\\",\\\\\\"requestorId\\\\\\":null,\\\\\\"JOBID\\\\\\":\\\\\\"2022072573011\\\\\\",\\\\\\"startTime\\\\\\":null,\\\\\\"endTime\\\\\\":null,\\\\\\"botstatus\\\\\\":null,\\\\\\"action\\\\\\":\\\\\\"run\\\\\\"}\\"" ',
' dQVDNIIB ': ' pattern not matched: "[2022-07-25 07:30:11.718] [Information] LW () <RDP> Creating background process \'\\"C:\\\\Tools\\\\Flows\\\\Desktop\\\\TEST_NAME\\\\Simulator.exe\\"\'" ',
' gQVENIIB ': ' pattern not matched: "[2022-07-25 07:30:11.743] [Information] LW () <RDP> Bot execution started - JOBNAME:\\"TEST_NAME\\" JOBID:\\"2022072573011\\" TYPE:\\"Desktop\\" startTime:\\"1658734211\\" botMachineName:\\"LW\\" botMachineIP:\\"SOME_IP\\" state:\\"Running\\" status:\\"Running\\" Action:\\"run\\"" ',
' ggVENIIB ': ' pattern not matched: "[2022-07-25 07:31:23.368] [Information] LW () <RDP> Published message to response queue: ExchangeName:\\"liteportal.exchange\\", Message:\\"{\\\\\\"requestorId\\\\\\":null,\\\\\\"rdpRequired\\\\\\":false,\\\\\\"JOBNAME\\\\\\":\\\\\\"TEST_NAME\\\\\\",\\\\\\"JOBID\\\\\\":\\\\\\"2022072573011\\\\\\",\\\\\\"TYPE\\\\\\":\\\\\\"Desktop\\\\\\",\\\\\\"startTime\\\\\\":\\\\\\"1658734211\\\\\\",\\\\\\"endTime\\\\\\":\\\\\\"1658734283\\\\\\",\\\\\\"status\\\\\\":\\\\\\"successful\\\\\\",\\\\\\"action\\\\\\":\\\\\\"run\\\\\\",\\\\\\"state\\\\\\":\\\\\\"completed\\\\\\",\\\\\\"botMachineName\\\\\\":\\\\\\"LW\\\\\\",\\\\\\"botMachineOS\\\\\\":\\\\\\"WindowsXP 6.2.9200.0\\\\\\",\\\\\\"botMachineIP\\\\\\":\\\\\\"SOME_IP\\\\\\",\\\\\\"Message\\\\\\":null}\\" " ',
' gwVENIIB ': ' pattern not matched: "[2022-07-25 07:31:23.373] [Information] LW () <RDP> Log file path: \\"C:\\\\Tools\\\\Flows\\\\Desktop\\\\TEST_NAME\\\\logs\\\\Logs-2022_07_25.log\\"" ',
' hAVENIIB ': ' pattern not matched: "[2022-07-25 07:31:23.374] [Information] LW () <RDP> Log file exists" ',
' hgVENIIB ': ' pattern not matched: "[2022-07-25 07:31:23.384] [Information] LW () <RDP> Bot log file deleted: \\"C:\\\\Tools\\\\Flows\\\\Desktop\\\\TEST_NAME\\\\logs\\\\Logs-2022_07_25.log\\"" '}
Need to convert this input into a dataframe for only those having all these values:
jobid | jobname | type | startTime | endTime | state | status |
---|---|---|---|---|---|---|
2022072564027 | TEST_NAME_1 | Desktop | 1658731228 | 1658731251 | completed | successful |
2022072573011 | TEST_NAME | Desktop | 1658734211 | 1658734283 | completed | successful |
Tried parsing the key values for removing the special characters and backslashes and also by using regex to extract the desired matches but unable to run through each iteration for the correct matches.
Please share your thoughts and suggest the best possible solution in Python? Thank you for your help!!
CodePudding user response:
Definitely REGEX. You will have to use regex to get close to the item you need and use REGEX to extract the value you need. For example, to extract JOBID
, I import re
, a python module, and use re.search
method to look for the JOBID
indexes from the string value. Then I use that last index of JOBID
with span
attribute from re.search
method to do another REGEX search. I then extract the value using group
attribute and collect those value to the list. Notice that not all data that you collected contains JOBID
, thus, I am returning "NaN" for those to the list.
Code Example:
import re
data = {
' cAVDNIIB ': ' pattern not matched: "[2022-07-25 06:40:51.147] [Information] LW () <RDP> Bot execution completed - JOBNAME:\\"TEST_NAME_1\\" JOBID:\\"2022072564027\\" TYPE:\\"Desktop\\" startTime:\\"1658731228\\" endTime:\\"1658731251\\" botMachineName:\\"LW\\" botMachineIP:\\"SOME_IP\\" state:\\"completed\\" status:\\"successful\\" Action:\\"run\\"" ',
' cQVDNIIB ': ' pattern not matched: "[2022-07-25 07:30:11.711] [Information] LW () <RDP> Bot consumer listening for messages." ',
' cgVDNIIB ': ' pattern not matched: "[2022-07-25 07:30:11.714] [Information] LW () <RDP> Message received from queue - JOBNAME:\\"TEST_NAME\\" JOBID:\\"2022072573011\\" TYPE:\\"Desktop\\" startTime:\\"1658734211\\" Action:\\"run\\"" ',
' cwVDNIIB ': ' pattern not matched: "[2022-07-25 07:30:11.717] [Information] LW () <RDP> Bot action:\\"run\\"" ',
' dAVDNIIB ': ' pattern not matched: "[2022-07-25 07:30:11.717] [Information] LW () <RDP> [x] \\"{\\\\\\"JOBNAME\\\\\\":\\\\\\"TEST_NAME\\\\\\",\\\\\\"TYPE\\\\\\":\\\\\\"Desktop\\\\\\",\\\\\\"rdpRequired\\\\\\":false,\\\\\\"state\\\\\\":\\\\\\"Running\\\\\\",\\\\\\"status\\\\\\":\\\\\\"Running\\\\\\",\\\\\\"requestorId\\\\\\":null,\\\\\\"JOBID\\\\\\":\\\\\\"2022072573011\\\\\\",\\\\\\"startTime\\\\\\":null,\\\\\\"endTime\\\\\\":null,\\\\\\"botstatus\\\\\\":null,\\\\\\"action\\\\\\":\\\\\\"run\\\\\\"}\\"" ',
' dQVDNIIB ': ' pattern not matched: "[2022-07-25 07:30:11.718] [Information] LW () <RDP> Creating background process \'\\"C:\\\\Tools\\\\Flows\\\\Desktop\\\\TEST_NAME\\\\Simulator.exe\\"\'" ',
' gQVENIIB ': ' pattern not matched: "[2022-07-25 07:30:11.743] [Information] LW () <RDP> Bot execution started - JOBNAME:\\"TEST_NAME\\" JOBID:\\"2022072573011\\" TYPE:\\"Desktop\\" startTime:\\"1658734211\\" botMachineName:\\"LW\\" botMachineIP:\\"SOME_IP\\" state:\\"Running\\" status:\\"Running\\" Action:\\"run\\"" ',
' ggVENIIB ': ' pattern not matched: "[2022-07-25 07:31:23.368] [Information] LW () <RDP> Published message to response queue: ExchangeName:\\"liteportal.exchange\\", Message:\\"{\\\\\\"requestorId\\\\\\":null,\\\\\\"rdpRequired\\\\\\":false,\\\\\\"JOBNAME\\\\\\":\\\\\\"TEST_NAME\\\\\\",\\\\\\"JOBID\\\\\\":\\\\\\"2022072573011\\\\\\",\\\\\\"TYPE\\\\\\":\\\\\\"Desktop\\\\\\",\\\\\\"startTime\\\\\\":\\\\\\"1658734211\\\\\\",\\\\\\"endTime\\\\\\":\\\\\\"1658734283\\\\\\",\\\\\\"status\\\\\\":\\\\\\"successful\\\\\\",\\\\\\"action\\\\\\":\\\\\\"run\\\\\\",\\\\\\"state\\\\\\":\\\\\\"completed\\\\\\",\\\\\\"botMachineName\\\\\\":\\\\\\"LW\\\\\\",\\\\\\"botMachineOS\\\\\\":\\\\\\"WindowsXP 6.2.9200.0\\\\\\",\\\\\\"botMachineIP\\\\\\":\\\\\\"SOME_IP\\\\\\",\\\\\\"Message\\\\\\":null}\\" " ',
' gwVENIIB ': ' pattern not matched: "[2022-07-25 07:31:23.373] [Information] LW () <RDP> Log file path: \\"C:\\\\Tools\\\\Flows\\\\Desktop\\\\TEST_NAME\\\\logs\\\\Logs-2022_07_25.log\\"" ',
' hAVENIIB ': ' pattern not matched: "[2022-07-25 07:31:23.374] [Information] LW () <RDP> Log file exists" ',
' hgVENIIB ': ' pattern not matched: "[2022-07-25 07:31:23.384] [Information] LW () <RDP> Bot log file deleted: \\"C:\\\\Tools\\\\Flows\\\\Desktop\\\\TEST_NAME\\\\logs\\\\Logs-2022_07_25.log\\"" '}
## to collect the values you need
jobid_temp = []
for k,v in data.items():
## identify JOBID index values from the string
search0 = re.search('JOBID',data[k])
## JOBID pattern not existing will return None type by default
if re.search('JOBID',v) == None:
jobid_temp.append("NaN") ## append NaN value to the list
## for illustration purpose
print(k, 'Nothing Exist')
else:
## use index to make the string smaller
string1 = v[search0.span()[1]:]
## extract the value
jobid = re.search('\d ', string1).group(0)
jobid_temp.append(jobid)
## for illustration purpose
print(k, jobid)
Output from print
statement:
cAVDNIIB 2022072564027
cQVDNIIB Nothing Exist
cgVDNIIB 2022072573011
cwVDNIIB Nothing Exist
dAVDNIIB 2022072573011
dQVDNIIB Nothing Exist
gQVENIIB 2022072573011
ggVENIIB 2022072573011
gwVENIIB Nothing Exist
hAVENIIB Nothing Exist
hgVENIIB Nothing Exist
CodePudding user response:
I think you can just loop through it line by line and then use something like this: Does Python have a string 'contains' substring method?
Otherwise, you would have to be a little more specific about what the problem is you're running into.