Selenium snippets

Useful pieces of code for using Selenium and dealing with common errors

Importing

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait

Starting up a browser

# If you want to open Chrome
driver = webdriver.Chrome()
# If you want to open Firefox
driver = webdriver.Firefox()

But Python can’t find chromedriver

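from selenium.webdriver.chrome.service import Service
# Selenium 4 takes the driver path through a Service object (older releases used executable_path=)
driver_path = '/Users/yourname/Desktop/foundations/chromedriver'
driver = webdriver.Chrome(service=Service(executable_path=driver_path))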

But Python can’t find Chrome/Firefox

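from selenium.webdriver.chrome.service import Service
# Point Selenium at the browser binary and at the driver explicitly
options = webdriver.ChromeOptions()
options.binary_location = "C:/Program Files (x86)/Google/Chrome/Application/chrome.exe"
service = Service(executable_path="C:/Utility/BrowserDrivers/chromedriver.exe")
driver = webdriver.Chrome(options=options, service=service)

Visiting a page

driver.get('http://www.nytimes.com')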

Typing in a form

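# find_element(By.ID, ...) replaces find_element_by_id, which newer Selenium 4 releases removed
from selenium.webdriver.common.by import By
text_input = driver.find_element(By.ID, 'name_input')
text_input.send_keys('Katherine')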

Fill out a dropdown

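from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
# Wrap the <select> element, then pick an option by its visible label
select = Select(driver.find_element(By.NAME, 'cityname'))
select.select_by_visible_text('Houston')
search_button = driver.find_element(By.ID, 'sch_button')
search_button.click()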

Scrolling (if you get an error saying something’s not in view, like ElementNotVisibleException)

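from selenium.webdriver.common.by import By
button = driver.find_element(By.CLASS_NAME, 'load-more-btn')
# Scroll the button into view with JavaScript before clicking it
driver.execute_script("arguments[0].scrollIntoView(true)", button)
button.click()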

Trying to get something that might not exist

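from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

try:
    search_button = driver.find_element(By.ID, 'sch_button')
    search_button.click()
except NoSuchElementException:
    # Catch only the "element not found" error instead of hiding every error
    print("It didn't work")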
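
Sometimes the thing that "might not exist" just isn't on the page yet. Here's a minimal sketch of polling until it shows up, in plain Python. The name wait_for, its parameters, and the commented usage line are made up for illustration and are not part of Selenium. Selenium's own WebDriverWait (imported at the top) covers the same need through its until method.

```python
import time


def wait_for(find, timeout=10.0, poll=0.5):
    """Call find() repeatedly until it returns something truthy.

    find is any zero-argument function (hypothetical helper, not a Selenium API).
    Exceptions raised by find() count as "not there yet" until the timeout.
    """
    deadline = time.monotonic() + timeout
    last_error = None
    while True:
        try:
            result = find()
            if result:
                return result
        except Exception as error:  # e.g. Selenium's NoSuchElementException
            last_error = error
        if time.monotonic() >= deadline:
            raise TimeoutError(f"gave up after {timeout} seconds") from last_error
        time.sleep(poll)


# Assumed usage with an already-open driver:
# button = wait_for(lambda: driver.find_element(By.ID, 'sch_button'))
```

Passing the lookup in as a function keeps the helper independent of any particular locator, so the same loop works for IDs, class names, or anything else.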

Getting text and attributes

# Get the text of an element (element is anything returned by driver.find_element)
element.text

# Get the href of a link
element.get_attribute('href')

# Get the HTML inside
element.get_attribute('innerHTML')
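
If you have a whole list of elements, the same two calls can be combined to build a little table of links. This is only a sketch: describe_links is a hypothetical helper name, and it assumes each element has the .text and .get_attribute() members shown above.

```python
def describe_links(elements):
    """Return a list of {'text', 'href'} dicts, skipping elements without an href."""
    rows = []
    for element in elements:
        href = element.get_attribute('href')
        if href:
            rows.append({'text': element.text.strip(), 'href': href})
    return rows


# Assumed usage with an open driver (By comes from selenium.webdriver.common.by):
# links = describe_links(driver.find_elements(By.TAG_NAME, 'a'))
```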
 

 

Installing Selenium and ChromeDriver on Windows

Want to use Selenium to scrape with Chrome on Windows? Let’s do it!

We’ll need to install a couple things:

  1. Selenium, which allows you to control browsers from Python
  2. ChromeDriver, which allows software to control Chrome (like Selenium!)

Installing ChromeDriver

STEP ONE: Downloading ChromeDriver

First, download ChromeDriver from its terribly ugly site. It looks like a scam or like it was put together by a 12-year-old, but I promise it’s good and cool and nice.

You’ll want chromedriver_win32.zip. That link should download 2.40, but if you want something more recent, just go to the page and download the version that matches your installed Chrome.

STEP TWO: Unzipping ChromeDriver

Extract chromedriver_win32.zip and it will give you a file called chromedriver.exe. This is the magic software!

STEP THREE: Moving ChromeDriver somewhere sensible

Now we need to move ChromeDriver somewhere that Python and Selenium will be able to find it (a.k.a. in your PATH).

The easiest place to put it is in C:\Windows. So move it there!

If you can’t move chromedriver there, you can always just tell Python where it is when you’re loading it up. See Selenium snippets under “But Python can’t find chromedriver”.
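
Not sure whether the move worked? Here's a quick sketch that asks Python whether chromedriver is on your PATH, using only the standard library. The function name find_on_path is made up for this example.

```python
import shutil


def find_on_path(program="chromedriver"):
    """Return the full path to program if it is on PATH, otherwise None."""
    return shutil.which(program)


location = find_on_path()
if location:
    print("Found chromedriver at", location)
else:
    print("chromedriver is not on PATH; pass its location to Selenium explicitly")
```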

Installing Selenium

If you google Selenium, a lot of the time you’ll see things about “Selenium server” and blah blah blah. You don’t need that: you aren’t running a huge complex of automated browser-testing machines. We just need plain ol’ Selenium.

Let’s use pip3 to install Selenium for Python 3.

pip3 install selenium
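
To double-check that the install landed in the Python you're actually running, something like this should do it. It's a small sketch using the standard library's importlib.metadata (Python 3.8 or newer), and installed_version is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(package="selenium"):
    """Return the installed version string of package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
print("Selenium version:", version if version else "not installed")
```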

Installing Chrome

Oh, you also need to make sure you have Chrome (or Firefox) installed and it lives in one of the normal places applications do.

If Python can’t find Chrome/Firefox, you can always just tell Python where it is when you’re loading it up. See Selenium snippets under “But Python can’t find Chrome/Firefox”.

Test it

Want to make sure it works? Run the following to pull all of the headlines from the New York Times homepage.

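from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://www.nytimes.com")
# The class name comes from the site's markup at the time of writing and may have changed
headlines = driver.find_elements(By.CLASS_NAME, "story-heading")
for headline in headlines:
    print(headline.text.strip())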
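
Depending on the page's markup, the loop above may print blank lines or the same headline more than once. Here's a small sketch for tidying the text before printing it. The name clean_headlines is hypothetical, and the usage lines assume the headlines list from the script above.

```python
def clean_headlines(texts):
    """Strip whitespace, drop empty strings, and remove repeats (order kept)."""
    seen = set()
    cleaned = []
    for text in texts:
        text = text.strip()
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned


# Assumed usage with the headlines list from the test script:
# for title in clean_headlines(h.text for h in headlines):
#     print(title)
```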


