It works if the XPath uses the contains function:
response.xpath('//table[contains(@class, "wikitable sortable")]')
However, it returns an empty list with the code below:
response.xpath('//table[@]')
Is there any explanation for why it returns an empty list?
For more information, I'm trying to extract the territory rankings table from this site as practice: https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_population
CodePudding user response:
You can extract the territory rankings table easily using only pandas, as follows:
Code:
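import pandas as pd

# Parse the page and keep only tables whose class attribute is "wikitable sortable"
url = 'https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_population'
dfs = pd.read_html(url, attrs={'class': 'wikitable sortable'})
df = dfs[0]  # first matching table; df.to_csv('d.csv') would save it to disk
print(df)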
Output:
Rank State or territory ... % of the total U.S. pop.[d] % of Elec. Coll.
'20 '10 State or territory ... 2010 Ch.2010–2020 % of Elec. Coll.
0 1.0 1.0 California ... 11.91% –0.11% 10.04%
1 2.0 2.0 Texas ... 8.04% 0.66% 7.43%
2 3.0 4.0 Florida ... 6.01% 0.42% 5.58%
3 4.0 3.0 New York ... 6.19% –0.17% 5.20%
4 5.0 6.0 Pennsylvania ... 4.06% –0.18% 3.53%
5 6.0 5.0 Illinois ... 4.10% –0.28% 3.53%
6 7.0 7.0 Ohio ... 3.69% –0.17% 3.16%
7 8.0 9.0 Georgia ... 3.10% 0.10% 2.97%
8 9.0 10.0 North Carolina ... 3.05% 0.07% 2.97%
9 10.0 8.0 Michigan ... 3.16% –0.15% 2.79%
10 11.0 11.0 New Jersey ... 2.81% –0.04% 2.60%
11 12.0 12.0 Virginia ... 2.56% 0.02% 2.42%
12 13.0 13.0 Washington ... 2.15% 0.15% 2.23%
13 14.0 16.0 Arizona ... 2.04% 0.09% 2.04%
14 15.0 14.0 Massachusetts ... 2.09% 0.00% 2.04%
15 16.0 17.0 Tennessee ... 2.03% 0.03% 2.04%
16 17.0 15.0 Indiana ... 2.07% –0.05% 2.04%
17 18.0 19.0 Maryland ... 1.85% –0.00% 1.86%
18 19.0 18.0 Missouri ... 1.91% –0.08% 1.86%
19 20.0 20.0 Wisconsin ... 1.82% –0.06% 1.86%
20 21.0 22.0 Colorado ... 1.61% 0.12% 1.86%
21 22.0 21.0 Minnesota ... 1.70% 0.01% 1.86%
22 23.0 24.0 South Carolina ... 1.48% 0.05% 1.67%
23 24.0 23.0 Alabama ... 1.53% –0.03% 1.67%
24 25.0 25.0 Louisiana ... 1.45% –0.06% 1.49%
25 26.0 26.0 Kentucky ... 1.39% –0.04% 1.49%
26 27.0 27.0 Oregon ... 1.22% 0.04% 1.49%
27 28.0 28.0 Oklahoma ... 1.20% –0.02% 1.30%
28 29.0 30.0 Connecticut ... 1.14% –0.07% 1.30%
29 30.0 29.0 Puerto Rico ... 1.19% –0.21% —
30 31.0 35.0 Utah ... 0.88% 0.09% 1.12%
31 32.0 31.0 Iowa ... 0.97% –0.02% 1.12%
32 33.0 36.0 Nevada ... 0.86% 0.06% 1.12%
33 34.0 33.0 Arkansas ... 0.93% –0.03% 1.12%
34 35.0 32.0 Mississippi ... 0.95% –0.06% 1.12%
35 36.0 34.0 Kansas ... 0.91% –0.04% 1.12%
36 37.0 37.0 New Mexico ... 0.66% –0.03% 0.93%
37 38.0 39.0 Nebraska ... 0.58% 0.00% 0.93%
38 39.0 40.0 Idaho ... 0.50% 0.05% 0.74%
39 40.0 38.0 West Virginia ... 0.59% –0.06% 0.74%
40 41.0 41.0 Hawaii ... 0.43% 0.00% 0.74%
41 42.0 43.0 New Hampshire ... 0.42% –0.01% 0.74%
42 43.0 42.0 Maine ... 0.42% –0.02% 0.74%
43 44.0 44.0 Rhode Island ... 0.34% –0.01% 0.74%
44 45.0 45.0 Montana ... 0.32% 0.01% 0.74%
45 46.0 46.0 Delaware ... 0.29% 0.01% 0.56%
46 47.0 47.0 South Dakota ... 0.26% 0.00% 0.56%
47 48.0 49.0 North Dakota ... 0.21% 0.02% 0.56%
48 49.0 48.0 Alaska ... 0.23% –0.01% 0.56%
49 50.0 51.0 District of Columbia ... 0.19% 0.01% 0.56%
50 51.0 50.0 Vermont ... 0.20% –0.01% 0.56%
51 52.0 52.0 Wyoming ... 0.18% –0.01% 0.56%
52 53.0 53.0 Guam[8] ... 0.05% –0.00% —
53 54.0 54.0 U.S. Virgin Islands[9] ... 0.03% –0.00% —
54 55.0 55.0 American Samoa[10] ... 0.02% –0.00% —
55 56.0 56.0 Northern Mariana Islands[11] ... 0.02% –0.00% —
56 NaN NaN Contiguous United States ... 98.03% 0.23% 98.70%
57 NaN NaN The fifty states ... 98.50% 0.21% 99.44%
58 NaN NaN The fifty states and D.C. ... 98.69% 0.22% 100.00%
59 NaN NaN Total United States ... — — —
[60 rows x 16 columns]
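As background on the two kinds of attribute test that come up in the question: contains(@class, "...") is a substring test, while an equality test such as @class="..." only matches when the whole attribute value is identical. The sketch below illustrates that difference and is not a diagnosis of the page itself. It uses the standard library's ElementTree instead of Scrapy's selector, the markup is made up rather than copied from Wikipedia, and the extra class token "plainrowheaders" is an assumption chosen purely for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up markup (an assumption, not Wikipedia's real HTML): the class
# attribute carries one extra token beyond "wikitable sortable".
DOC = """
<html><body>
  <table class="wikitable sortable plainrowheaders"><tr><td>a</td></tr></table>
  <table class="infobox"><tr><td>b</td></tr></table>
</body></html>
""".strip()

root = ET.fromstring(DOC)

# Equality test: the whole attribute value must be identical.
exact = root.findall(".//table[@class='wikitable sortable']")

# Substring test, the same idea as XPath contains(@class, "...").
substring = [t for t in root.iter("table")
             if "wikitable sortable" in t.get("class", "")]

# Token test: every wanted class must be present, in any order.
wanted = {"wikitable", "sortable"}
tokens = [t for t in root.iter("table")
          if wanted <= set(t.get("class", "").split())]

print(len(exact), len(substring), len(tokens))  # prints: 0 1 1
```

In Scrapy itself, token-style matching is usually easier with a CSS selector, for example response.css('table.wikitable.sortable'), which matches regardless of class order or any extra classes.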