I want to parse the HTML (or Android content?) from a mobile app, and I am doing something like:
pageSource = driver.page_source
print("page = ",pageSource)
and what I got is the following:
page = <?xml version='1.0' encoding='UTF-8' standalone='yes' ?>
<hierarchy index="0" rotation="0" width="1080" height="2274">
<android.widget.FrameLayout index="0" package="testapp" text="" checkable="false" checked="false" clickable="false" enabled="true" focusable="false" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[0,0][1080,2274]" displayed="true">
<android.widget.LinearLayout index="0" package="testapp" text="" checkable="false" checked="false" clickable="false" enabled="true" focusable="false" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[0,0][1080,2274]" displayed="true">
<android.widget.FrameLayout index="0" package="testapp" text="" resource-id="android:id/content" checkable="false" checked="false" clickable="false" enabled="true" focusable="false" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[0,0][1080,2274]" displayed="true">
<android.widget.FrameLayout index="0" package="testapp" text="" checkable="false" checked="false" clickable="false" enabled="true" focusable="true" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[0,0][1080,2274]" displayed="true">
<android.view.View index="0" package="testapp" text="" resource-id="" checkable="false" checked="false" clickable="false" enabled="true" focusable="false" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[0,0][1080,2274]" displayed="true">
<android.view.View index="0" package="testapp" text="" resource-id="" checkable="false" checked="false" clickable="false" enabled="true" focusable="false" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[0,0][1080,2274]" displayed="true">
<android.view.View index="0" package="testapp" text="" resource-id="" checkable="false" checked="false" clickable="false" enabled="true" focusable="false" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[0,0][1080,2274]" displayed="true">
<android.view.View index="0" package="testapp" text="" resource-id="" checkable="false" checked="false" clickable="false" enabled="true" focusable="false" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[0,0][1080,2274]" displayed="true">
<android.view.View index="0" package="testapp" text="" resource-id="" checkable="false" checked="false" clickable="false" enabled="true" focusable="false" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[0,0][1080,220]" displayed="true">
<android.widget.ImageView index="0" package="testapp" text="" resource-id="" checkable="false" checked="false" clickable="true" enabled="true" focusable="true" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[22,66][154,220]" displayed="true" />
<android.widget.ImageView index="1" package="testapp" text="" resource-id="" checkable="false" checked="false" clickable="true" enabled="true" focusable="true" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[198,77][880,209]" displayed="true" />
</android.view.View>
<android.widget.ImageView index="1" package="testapp" text="" resource-id="" checkable="false" checked="false" clickable="true" enabled="true" focusable="true" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[54,264][142,396]" displayed="true" />
<android.view.View index="2" package="testapp" text="" content-desc="1" resource-id="" checkable="false" checked="false" clickable="false" enabled="true" focusable="true" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[430,289][650,371]" displayed="true" />
<android.widget.ScrollView index="3" package="testapp" text="" resource-id="" checkable="false" checked="false" clickable="false" enabled="true" focusable="true" focused="false" long-clickable="false" password="false" scrollable="true" selected="false" bounds="[0,440][1080,2102]" displayed="true">
<android.widget.Button index="0" package="testapp" text="" content-desc="2;Po Lam" resource-id="" checkable="false" checked="false" clickable="true" enabled="true" focusable="true" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[54,440][1026,608]" displayed="true" />
<android.view.View index="1" package="testapp" text="" content-desc="3" resource-id="" checkable="false" checked="false" clickable="false" enabled="true" focusable="true" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[68,666][1011,770]" displayed="true" />
<android.widget.ImageView index="2" package="testapp" text="" resource-id="" checkable="false" checked="false" clickable="false" enabled="true" focusable="true" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[68,770][115,823]" displayed="true" />
<android.view.View index="3" package="testapp" text="" content-desc="4" resource-id="" checkable="false" checked="false" clickable="true" enabled="true" focusable="true" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[115,770][446,823]" displayed="true" />
<android.view.View index="4" package="testapp" text="" content-desc="5" resource-id="" checkable="false" checked="false" clickable="false" enabled="true" focusable="true" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[540,770][1012,823]" displayed="true" />
<android.view.View index="5" package="testapp" text="" content-desc="6" resource-id="" checkable="false" checked="false" clickable="false" enabled="true" focusable="true" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[54,940][540,992]" displayed="true" />
<android.view.View index="6" package="testapp" text="" content-desc="about" resource-id="" checkable="false" checked="false" clickable="false" enabled="true" focusable="true" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[714,940][750,992]" displayed="true" />
</android.widget.ScrollView>
<android.widget.ImageView index="4" package="testapp" text="" resource-id="" checkable="false" checked="false" clickable="false" enabled="true" focusable="true" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[0,2046][1080,2102]" displayed="true" />
<android.view.View index="5" package="testappp" text="" resource-id="" checkable="false" checked="false" clickable="false" enabled="true" focusable="false" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[0,2102][1080,2274]" displayed="true">
<android.widget.ImageView index="0" package="testapp" text="" resource-id="" checkable="false" checked="false" clickable="false" enabled="true" focusable="true" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[0,2102][1080,2274]" displayed="true" />
<android.widget.ImageView index="1" package="testapp" text="" resource-id="" checkable="false" checked="false" clickable="true" enabled="true" focusable="true" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[0,2114][216,2262]" displayed="true" />
<android.widget.ImageView index="2" package="testapp" text="" resource-id="" checkable="false" checked="false" clickable="true" enabled="true" focusable="true" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[432,2114][648,2262]" displayed="true" />
<android.widget.ImageView index="3" package="testapp" text="" resource-id="" checkable="false" checked="false" clickable="true" enabled="true" focusable="true" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[648,2114][864,2262]" displayed="true" />
<android.widget.ImageView index="4" package="testapp" text="" resource-id="" checkable="false" checked="false" clickable="true" enabled="true" focusable="true" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[864,2114][1080,2262]" displayed="true" />
</android.view.View>
</android.view.View>
</android.view.View>
</android.view.View>
</android.view.View>
</android.widget.FrameLayout>
</android.widget.FrameLayout>
</android.widget.LinearLayout>
<android.view.View index="2" package="testapp" text="" resource-id="android:id/navigationBarBackground" checkable="false" checked="false" clickable="false" enabled="true" focusable="false" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[0,2274][1080,2340]" displayed="true" />
</android.widget.FrameLayout>
</hierarchy>
I want to get the value of every "content-desc" attribute.
Update: the dump above is the full page source returned by the webdriver. What I want is the "number" versus the text inside "content-desc".
I have tried
soup = BeautifulSoup(pageSource,"lxml")
but the soup comes back empty.
CodePudding user response:
Since you want all the tags that have a content-desc attribute, you can use a regex to do this, as the XML has nested android.view.View tags and the content-desc attribute is also present on other tags in the XML.
Here is how we can do this:
Creating data
page = '''<?xml version='1.0' encoding='UTF-8' standalone='yes' ?>
<hierarchy index="0" rotation="0" width="1080" height="2274">
<android.widget.FrameLayout index="0" package="testapp" text="" checkable="false" checked="false" clickable="false" enabled="true" focusable="false" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[0,0][1080,2274]" displayed="true">
<android.widget.LinearLayout index="0" package="testapp" text="" checkable="false" checked="false" clickable="false" enabled="true" focusable="false" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[0,0][1080,2274]" displayed="true">
<android.widget.FrameLayout index="0" package="testapp" text="" resource-id="android:id/content" checkable="false" checked="false" clickable="false" enabled="true" focusable="false" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[0,0][1080,2274]" displayed="true">
<android.widget.FrameLayout index="0" package="testapp" text="" checkable="false" checked="false" clickable="false" enabled="true" focusable="true" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[0,0][1080,2274]" displayed="true">
<android.view.View index="0" package="testapp" text="" resource-id="" checkable="false" checked="false" clickable="false" enabled="true" focusable="false" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[0,0][1080,2274]" displayed="true">
<android.view.View index="0" package="testapp" text="" resource-id="" checkable="false" checked="false" clickable="false" enabled="true" focusable="false" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[0,0][1080,2274]" displayed="true">
<android.view.View index="0" package="testapp" text="" resource-id="" checkable="false" checked="false" clickable="false" enabled="true" focusable="false" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[0,0][1080,2274]" displayed="true">
<android.view.View index="0" package="testapp" text="" resource-id="" checkable="false" checked="false" clickable="false" enabled="true" focusable="false" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[0,0][1080,2274]" displayed="true">
<android.view.View index="0" package="testapp" text="" resource-id="" checkable="false" checked="false" clickable="false" enabled="true" focusable="false" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[0,0][1080,220]" displayed="true">
<android.widget.ImageView index="0" package="testapp" text="" resource-id="" checkable="false" checked="false" clickable="true" enabled="true" focusable="true" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[22,66][154,220]" displayed="true" />
<android.widget.ImageView index="1" package="testapp" text="" resource-id="" checkable="false" checked="false" clickable="true" enabled="true" focusable="true" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[198,77][880,209]" displayed="true" />
</android.view.View>
<android.widget.ImageView index="1" package="testapp" text="" resource-id="" checkable="false" checked="false" clickable="true" enabled="true" focusable="true" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[54,264][142,396]" displayed="true" />
<android.view.View index="2" package="testapp" text="" content-desc="1" resource-id="" checkable="false" checked="false" clickable="false" enabled="true" focusable="true" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[430,289][650,371]" displayed="true" />
<android.widget.ScrollView index="3" package="testapp" text="" resource-id="" checkable="false" checked="false" clickable="false" enabled="true" focusable="true" focused="false" long-clickable="false" password="false" scrollable="true" selected="false" bounds="[0,440][1080,2102]" displayed="true">
<android.widget.Button index="0" package="testapp" text="" content-desc="2;Po Lam" resource-id="" checkable="false" checked="false" clickable="true" enabled="true" focusable="true" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[54,440][1026,608]" displayed="true" />
<android.view.View index="1" package="testapp" text="" content-desc="3" resource-id="" checkable="false" checked="false" clickable="false" enabled="true" focusable="true" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[68,666][1011,770]" displayed="true" />
<android.widget.ImageView index="2" package="testapp" text="" resource-id="" checkable="false" checked="false" clickable="false" enabled="true" focusable="true" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[68,770][115,823]" displayed="true" />
<android.view.View index="3" package="testapp" text="" content-desc="4" resource-id="" checkable="false" checked="false" clickable="true" enabled="true" focusable="true" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[115,770][446,823]" displayed="true" />
<android.view.View index="4" package="testapp" text="" content-desc="5" resource-id="" checkable="false" checked="false" clickable="false" enabled="true" focusable="true" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[540,770][1012,823]" displayed="true" />
<android.view.View index="5" package="testapp" text="" content-desc="6" resource-id="" checkable="false" checked="false" clickable="false" enabled="true" focusable="true" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[54,940][540,992]" displayed="true" />
<android.view.View index="6" package="testapp" text="" content-desc="about" resource-id="" checkable="false" checked="false" clickable="false" enabled="true" focusable="true" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[714,940][750,992]" displayed="true" />
</android.widget.ScrollView>
<android.widget.ImageView index="4" package="testapp" text="" resource-id="" checkable="false" checked="false" clickable="false" enabled="true" focusable="true" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[0,2046][1080,2102]" displayed="true" />
<android.view.View index="5" package="testappp" text="" resource-id="" checkable="false" checked="false" clickable="false" enabled="true" focusable="false" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[0,2102][1080,2274]" displayed="true">
<android.widget.ImageView index="0" package="testapp" text="" resource-id="" checkable="false" checked="false" clickable="false" enabled="true" focusable="true" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[0,2102][1080,2274]" displayed="true" />
<android.widget.ImageView index="1" package="testapp" text="" resource-id="" checkable="false" checked="false" clickable="true" enabled="true" focusable="true" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[0,2114][216,2262]" displayed="true" />
<android.widget.ImageView index="2" package="testapp" text="" resource-id="" checkable="false" checked="false" clickable="true" enabled="true" focusable="true" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[432,2114][648,2262]" displayed="true" />
<android.widget.ImageView index="3" package="testapp" text="" resource-id="" checkable="false" checked="false" clickable="true" enabled="true" focusable="true" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[648,2114][864,2262]" displayed="true" />
<android.widget.ImageView index="4" package="testapp" text="" resource-id="" checkable="false" checked="false" clickable="true" enabled="true" focusable="true" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[864,2114][1080,2262]" displayed="true" />
</android.view.View>
</android.view.View>
</android.view.View>
</android.view.View>
</android.view.View>
</android.widget.FrameLayout>
</android.widget.FrameLayout>
</android.widget.LinearLayout>
<android.view.View index="2" package="testapp" text="" resource-id="android:id/navigationBarBackground" checkable="false" checked="false" clickable="false" enabled="true" focusable="false" focused="false" long-clickable="false" password="false" scrollable="false" selected="false" bounds="[0,2274][1080,2340]" displayed="true" />
</android.widget.FrameLayout>
</hierarchy>'''
Extracting attribute
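import re
from bs4 import BeautifulSoup
soup = BeautifulSoup(page, 'lxml')
# Match any tag name, then keep only the tags that carry a content-desc attribute
for content_desc_element in soup.findAll(re.compile(r".*"), {"content-desc" : re.compile(r".*")}):
    print(content_desc_element['content-desc'])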
Output:
This gives us the expected values of the content-desc attribute:
1
2;Po Lam
3
4
5
6
about
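Since the page source here is XML rather than HTML, the same values can probably also be read with Python's standard-library XML parser, without any regex. The following is only a minimal sketch: the page string is a shortened, hypothetical stand-in for driver.page_source, and the name content_descs is introduced for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical stand-in for driver.page_source (not the full dump)
page = """<?xml version='1.0' encoding='UTF-8' standalone='yes' ?>
<hierarchy index="0" rotation="0" width="1080" height="2274">
  <android.widget.FrameLayout index="0" package="testapp" text="">
    <android.view.View index="2" package="testapp" text="" content-desc="1" />
    <android.widget.ScrollView index="3" package="testapp" text="">
      <android.widget.Button index="0" package="testapp" text="" content-desc="2;Po Lam" />
      <android.view.View index="6" package="testapp" text="" content-desc="about" />
    </android.widget.ScrollView>
  </android.widget.FrameLayout>
</hierarchy>"""

# Parse from bytes so the parser reads the encoding from the XML declaration
root = ET.fromstring(page.encode("utf-8"))

# iter() walks every element in document order; keep those that have the attribute
content_descs = [
    element.attrib["content-desc"]
    for element in root.iter()
    if "content-desc" in element.attrib
]

for value in content_descs:
    print(value)
```

Because this treats the input as XML, tag names such as android.view.View keep their original case, which may matter if you later filter by tag name.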
CodePudding user response:
The problem can be solved more easily (without the regex complexity):
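from bs4 import BeautifulSoup
page = '''...'''
soup = BeautifulSoup(page, 'lxml')
# find_all() with no arguments returns every tag in the document
elems = soup.find_all()
for x in elems:
    # Print the attribute only for tags that actually have it
    if x.has_attr('content-desc'):
        print(x['content-desc'])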
This will return:
1
2;Po Lam
3
4
5
6
about
8
store;Po Lam
add
open
value
10
11
32
13
14
15
16
17
18
19
20
33
34
35
20
Also, you should avoid using findAll in newer bs4 versions, and instead use find_all.
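The question also asks for the "number" versus the text inside content-desc. Assuming that refers to values such as "2;Po Lam", where a semicolon separates the two parts (an interpretation, not something the question confirms), a small sketch of the split could look like this. The helper name split_content_desc and the sample list are hypothetical.

```python
def split_content_desc(value):
    """Split 'number;text' into (number, text); text is None when there is no ';'."""
    head, separator, tail = value.partition(";")
    return (head, tail if separator else None)


# Sample values taken from the output shown above
values = ["1", "2;Po Lam", "3", "about"]
pairs = [split_content_desc(value) for value in values]

for number, text in pairs:
    print(number, text)
```

str.partition splits only at the first semicolon, so any further semicolons stay in the text part.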