Get text with XPath

I want to collect GitHub users' monthly contributions from 2004 until now and write the monthly data to a CSV file with one column per month (e.g., 2022_10). The XPath of these texts is:

//*[@id="js-contribution-activity"]/div/div/div/div/details/summary/span[1]

This is what my CSV file (df1) looks like:

	LinkedIn Website	GitHub Website	user
0	https://www.linkedin.com/in/chad-roberts-b86699/	https://github.com/crobby	crobby
1	https://www.linkedin.com/in/grahamdumpleton/	https://github.com/GrahamDumpleton	GrahamDumpleton

Here is my best try so far:

for index, row in df1.iterrows():
    try:
        user = row['user']
    except:
        pass
    for y in range(2004, 2023):
        for m in range(1, 13):
            try:
                current_url = f'https://github.com/{user}?tab=overview&from={y}-{m}-01&to={y}-{m}-31'
                print(current_url)
                driver.get(current_url)
                time.sleep(0.1)
                contribution = driver.findElement(webdriver.By.xpath("//*[@id='js-contribution-activity']/div/div/div/div/details/summary/span[1]")).getText();
                df1.loc[index, f'{str(y)}_{str(m)}'] = contribution
            except:
                pass

print(df1)
df1.to_csv('C:/Users/fredr/Desktop/output today.csv')

I cannot figure out why there is no output. Thanks for your help.

CodePudding user response：

Use explicit waits: WebDriverWait together with expected_conditions.
There are multiple contribution fields on the page, so collect all matching elements as a list and iterate over it, extracting the text of each element.
Improve your locators; they should be as short and clear as possible.
You also mixed Java and Python in your code: findElement(...), By.xpath, getText() and the trailing ; come from the Java bindings. In Python, driver.findElement raises an AttributeError, and your bare except silently discards it, which is why nothing is ever written to df1.
Try this:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 10)

driver.get(current_url)
contributions = wait.until(EC.visibility_of_all_elements_located((By.XPATH, "//*[@id='js-contribution-activity']//summary/span[1]")))
for contribution in contributions:
    print(contribution.text)
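
To plug this into the month loop from the question, something along these lines might work. It is only a sketch under a few assumptions: the helper names (month_bounds, overview_url, column_name, collect_month) are made up for illustration, GitHub is assumed to accept ISO dates in the from/to parameters, and joining several entries with "; " is just one possible way to fit them into a single cell. It also uses the real last day of each month, since the question's code asks for day 31 even in months that do not have one.

```python
import calendar
from datetime import date


def month_bounds(year, month):
    """Return the first and last day of a month as ISO date strings."""
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1).isoformat(), date(year, month, last_day).isoformat()


def overview_url(user, year, month):
    """Build the overview URL for one user and one month (hypothetical helper)."""
    start, end = month_bounds(year, month)
    return f"https://github.com/{user}?tab=overview&from={start}&to={end}"


def column_name(year, month):
    """Column label in the same style as the question, e.g. 2022_10."""
    return f"{year}_{month}"


def collect_month(driver, user, year, month, timeout=10):
    """Return the visible contribution summaries for one month as one string."""
    # Selenium is imported here so the helpers above work without it installed.
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver.get(overview_url(user, year, month))
    try:
        elements = WebDriverWait(driver, timeout).until(
            EC.visibility_of_all_elements_located(
                (By.XPATH, "//*[@id='js-contribution-activity']//summary/span[1]")
            )
        )
    except TimeoutException:
        # Nothing matched within the timeout; treat the month as empty.
        return ""
    # Assumption: several entries per month are joined into a single cell.
    return "; ".join(element.text.strip() for element in elements)
```

Inside the original loops you would then assign df1.loc[index, column_name(y, m)] = collect_month(driver, user, y, m). Catching only TimeoutException instead of using a bare except means real mistakes, such as a misspelled method name, still show up as errors.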

CodePudding user response：

I haven't tried it with selenium but with just requests and lxml this xpath expression

//div[@]//details[@]/summary/span[1]

seems to work.