Selenium WebDriver Beginner Guide

BuddhiK
Dec 22, 2021

Guide 01

Introduction

History

Let me start with an interesting story about how this product kit came to be named ‘Selenium’.

If you are curious about this topic, you can search the internet and find the same story told in different versions.

In 2004 Jason Huggins, an engineer working on a web project, realized that he had to repeat the same manual test cases again and again. To make this more efficient, he wrote a JavaScript program to automate the browser actions. He named it “JavaScript test runner”, later open-sourced it, and renamed it “Selenium Core”.

While Selenium was being developed, there was another popular test framework, QTP, made by a competing company, “Mercury Interactive”. QTP was later taken over by HP. Some say Jason Huggins cracked a joke in a daily meeting, and some say he wrote it in a group email: “Selenium will cure the mercury poisoning”, because selenium is an antidote used in the treatment of mercury intoxication. So, as a jab at the competition from Mercury Interactive, the team took the name “Selenium”.

What is Selenium?

Selenium is an automated testing framework. It’s free and open-source. We can use it on different platforms like Windows, Linux, macOS, and Solaris, and across multiple browsers like Chrome, Firefox, Internet Explorer, Opera, etc.

We can also use client libraries (language bindings) in several languages to work with Selenium: Java, Python, C#, and Ruby.

Selenium Product Kit :

The Selenium product kit contains:

  • Selenium IDE: Selenium IDE (Integrated Development Environment) is primarily a record-and-playback tool that a test case developer uses to develop Selenium test cases.
  • Selenium RC: Selenium Remote Control (RC) is no longer in use. It was a testing framework that enabled a QA engineer or a developer to write test cases in any supported programming language in order to automate UI tests for web applications against any HTTP website. Selenium WebDriver was later created as an improvement on Selenium RC.
  • Selenium WebDriver: Selenium WebDriver is an open-source collection of APIs used for testing web applications. It automates web application testing to verify whether the application works as expected.
  • Selenium Grid: Selenium Grid is used together with Selenium RC (and, in later versions, WebDriver) to run multiple test cases in parallel, at the same time, on different browsers and machines.

Download link: https://www.selenium.dev

Selenium WebDriver

As mentioned above, Selenium WebDriver is an open-source collection of APIs. It lets the user automate any of the modern browsers.

Selenium WebDriver Architecture

As shown in the architecture diagram, Java, JavaScript, Python, C#, and Ruby all have Selenium client libraries. For Java, the client libraries are distributed as JAR files. The client libraries contain nothing but the classes and methods of Selenium WebDriver that are used to write automated test scripts.

ex: If you are writing your test script in Java, you need to install the Selenium client library for Java.

Selenium Installation

All the supported language bindings are available on Selenium's official site, where you can download and install them.

Your automation test script uses the Selenium client library to talk to Selenium. Here the client library acts as an API for executing Selenium commands.

Each real browser is supported by its own browser driver. Through these drivers we can interact with real browsers and automate their activities.

ex:

  • GeckoDriver: Firefox browser
  • ChromeDriver: Chrome browser
  • Internet Explorer driver: Internet Explorer browser
  • etc.

If you are creating a Java Maven project, the Selenium bindings can be imported into the project by adding them as dependencies in pom.xml. The dependency entries can be found at:

https://mvnrepository.com/

In addition, you have to add to pom.xml the dependencies for the browsers on which you plan to run the test automation.

ex: Firefox

What is the JSON Wire Protocol?

JSON stands for JavaScript Object Notation. It is used as a data exchange format between clients and servers on the web, and it can be produced and parsed in languages such as Java, C#, Python, Ruby, etc.

Looking further at the JSON Wire Protocol, it uses a REST API over HTTP to transfer data between the client library and the browser driver. (Each browser driver has its own HTTP server.)

The browser driver then handles the rest, just as in a simple client-server architecture. That is, an HTTP connection is established between the client (your test script) and the server (the browser driver), and the two communicate through HTTP requests and responses whose bodies are JSON.
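To make the request/response idea a little more concrete, here is a minimal sketch in Python of the kind of message a client library sends to a browser driver when it wants to find an element. Nothing is actually sent over the network. The address `http://localhost:9515`, the session id `abc123`, and the helper name `build_find_element_request` are assumptions made up for this illustration; a real client library builds and sends these messages for you.

```python
import json


def build_find_element_request(base_url, session_id, strategy, value):
    """Describe the HTTP request a client would send to locate one element.

    Only the method, URL, and JSON body are built here, so the shape of
    the message is visible without needing a running browser driver.
    """
    return {
        "method": "POST",
        "url": f"{base_url}/session/{session_id}/element",
        "body": json.dumps({"using": strategy, "value": value}),
    }


# Hypothetical driver address and session id, for illustration only.
request = build_find_element_request(
    "http://localhost:9515", "abc123", "xpath", "//input[@id='q']"
)
print(request["method"], request["url"])
print(request["body"])
```

The driver would answer with an HTTP response whose body is also JSON, which the client library turns back into an element object for your test script.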

WebDriver APIs available in Selenium

  • FirefoxDriver.
  • InternetExplorerDriver.
  • ChromeDriver.
  • SafariDriver.
  • OperaDriver.
  • AndroidDriver.
  • IPhoneDriver.
  • HtmlUnitDriver.

Languages supported by Selenium WebDriver

Java, Python, Ruby, C#, JavaScript, Perl, and PHP.

Selenium WebDriver Locating Strategies

To automate your test script, you should first know the basics of locating web elements on a web page.

Inspect an element of a web page

For example, just consider the login page of a web application.

Consider a simple test scenario for this page.

example :

  1. The user enters the user name.
  2. The user enters the password.
  3. The user presses the login button.
  4. Then the user sees the welcome page.

So, for a basic automation script for the page above, we should know how to locate the user name text input field, the password field, and the login button on the web page.
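The four steps above can be sketched in Python roughly as follows. This is only an illustration under a few assumptions: the element ids `username`, `password`, and `login` are hypothetical, and `RecordingDriver` is a stand-in written for this example so the flow can be run without a browser. It mimics the `find_element(by, value)`, `send_keys`, and `click` calls that Selenium's Python bindings offer, where the `"id"` strategy is what `By.ID` stands for.

```python
def login(driver, username, password):
    """Perform the four login steps using a Selenium-style driver object."""
    # Steps 1 and 2: type into the user name and password fields.
    driver.find_element("id", "username").send_keys(username)
    driver.find_element("id", "password").send_keys(password)
    # Step 3: press the login button.
    driver.find_element("id", "login").click()
    # Step 4: return the page title so the caller can check for the welcome page.
    return driver.title


class RecordingDriver:
    """Stand-in for a real browser driver; it only records what was asked of it."""

    def __init__(self):
        self.actions = []
        self.title = "Welcome"

    def find_element(self, by, value):
        driver = self

        class Element:
            def send_keys(self, text):
                driver.actions.append((by, value, "send_keys", text))

            def click(self):
                driver.actions.append((by, value, "click", None))

        return Element()


driver = RecordingDriver()
page_title = login(driver, "demo_user", "demo_pass")
print(page_title)
for action in driver.actions:
    print(action)
```

With Selenium installed, the same `login` function could in principle be given a real driver object after opening the login page, as long as the ids match the ones on your page.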

To inspect the element

Shortcut:

Shortcut for Chrome (Windows): Ctrl + Shift + C

Long way:

  1. First, press F12 to open the developer tools, where you can see the internal structure of the page.

ex:

When you press F12 you will see the developer tools.

  2. Right-click on the element you need to inspect and select Inspect.

  3. Decide the best way to locate the element from the available locator options.

  4. Optional: In the console, you can test whether your XPath or CSS selector actually locates the element.

ex: inspecting an input field with id = “id_fullname”

With the console, you can be sure about a locator's accuracy before you add the element-locating code to your program.

This can sometimes be difficult and complex work, so we can use add-ons and tools to pick the element. Using such add-ons and tools is common practice in the industry.

Element Locating Extensions/Addons

Firefox browser

ex: Firebug, FirePath, XPath Finder for Firefox

Firefox browser add-on: Firebug

Chrome browser

01. XPath Finder

02. SelectorsHub

04. RexPath

05. Firebug Lite for Chrome

Supports inspecting HTML elements and computed CSS styles.

06. TruePath Extension

The TruePath extension generates the relative XPath.

Selenium Locators

Selenium locators are used to identify elements such as text fields, buttons, checkboxes, radio buttons, etc. Choosing them takes care and hands-on experience; otherwise, you will end up with a faulty test script.

Sometimes it is easy to locate an element by its id, name, class name, tag name, or text, when that value is unique to the element.

ex :

CSS selector

What is CSS?

CSS stands for Cascading Style Sheets. It is a language used with HTML pages to style web pages. Basically, CSS describes how the web elements should be displayed within the web page.

CSS selector types
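As a rough illustration of the most common selector types, here is a small Python sketch that builds an id selector, a class selector, and an attribute selector as plain strings. The helper names and the `form-control` class are made up for this example; `id_fullname` is the id used in the inspecting example earlier.

```python
def css_by_id(element_id):
    """ID selector: '#' followed by the element's id."""
    return f"#{element_id}"


def css_by_class(class_name):
    """Class selector: '.' followed by one class name."""
    return f".{class_name}"


def css_by_attribute(tag, attribute, value):
    """Attribute selector: tag name followed by [attribute="value"]."""
    return f'{tag}[{attribute}="{value}"]'


selectors = [
    css_by_id("id_fullname"),
    css_by_class("form-control"),
    css_by_attribute("input", "name", "q"),
]
for selector in selectors:
    print(selector)
```

Strings like these are what you would pass to a CSS selector locator, or try out in the browser console first.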

XPath: XML Path Language

XPath is used to navigate through the elements and attributes in the structure of an XML/HTML document. XPath is normally used when an element cannot be located by id, name, class, etc.

XPath works much like the file paths we use on our computers. For example, my lecture notes are located in E:\Zoom\Maths\Lecture_01.pdf

This means that to reach Lecture_01.pdf you go to the E drive, where there is a folder called Zoom. Inside the Zoom folder, there is another folder called Maths. Inside the Maths folder, there is my Lecture_01.pdf.

Likewise, we can navigate through the XML/HTML document/page as mentioned below.

Example: consider a search text box whose name and id are both 'q'. Let's inspect the element and see what its XPath is.

Here my relative XPath is //input[@id='q']

This means: navigate to the input element whose id is 'q'. It is suitable only when exactly one element with id='q' exists. If there are many of them, the expression matches a list of all the elements that fulfill the requirement.

Again, consider that I have many files called Lecture_01.pdf on my PC. If I search for Lecture_01.pdf on my computer, all of them will show up in the results. But if I have only one such file, then only that one will be displayed as the result.
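Here is a small sketch of that "one result or many" behaviour, using Python's built-in `xml.etree.ElementTree` module instead of a real browser. The tiny page in `PAGE` is invented for this example. Note that ElementTree supports only a subset of XPath and expects the expression to start with a dot, so the relative XPath is written as `.//input[@id='q']` here.

```python
import xml.etree.ElementTree as ET

# A made-up, well-formed page in which two inputs share the id 'q'.
PAGE = """
<html>
  <body>
    <div>
      <form>
        <input id="q" name="q" type="text"/>
        <input id="go" type="submit"/>
      </form>
    </div>
    <div>
      <input id="q" name="q" type="text"/>
    </div>
  </body>
</html>
"""

root = ET.fromstring(PAGE)

# Every input whose id is 'q', wherever it sits in the document.
duplicated = root.findall(".//input[@id='q']")
# An id that appears only once gives exactly one match.
unique = root.findall(".//input[@id='go']")

print(len(duplicated), "elements match id='q'")
print(len(unique), "element matches id='go'")
```

Just like the file search, the expression with the repeated id matches more than one element, while the unique id matches exactly one.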

The absolute path is as below.

/html[1]/body[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[2]/div[1]/div[2]/form[1]/div[1]/div[1]/input[1]

This describes the whole structure/path to navigate to the input field called q.

Absolute XPath vs. Relative XPath

Absolute XPath:

Let's take the above example.

/html[1]/body[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[2]/div[1]/div[2]/form[1]/div[1]/div[1]/input[1]

Here the path is described from the very beginning: the html element, then the body inside it, then the divs inside that, and so on. It navigates very precisely to the exact element. An absolute XPath starts with a single slash (/).

Disadvantage: Suppose any div changes for some reason in the design. Then the whole XPath will fail to locate the exact element.

Relative XPath:

For the above example, the relative path is //input[@id='q']. It says: find an input field whose id is equal to q.

Normally a relative path can be written starting from anywhere in the DOM structure. Its syntax starts with a double slash (//). The relative path is usually the more effective of the two, since it does not depend on the full page structure.
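The difference can be tried out with a short sketch, again using Python's built-in `xml.etree.ElementTree` rather than a real browser. Both pages are invented and much smaller than the real one, and the paths are written in the dotted form ElementTree expects (`./body/...` stands in for `/html/body/...`, and `.//input` for `//input`).

```python
import xml.etree.ElementTree as ET

# A made-up page, and the same page after a redesign wraps the form in one more div.
ORIGINAL = "<html><body><div><form><input id='q'/></form></div></body></html>"
REDESIGNED = (
    "<html><body><div><div><form><input id='q'/></form></div></div></body></html>"
)

# Absolute-style path: every step from the root element is spelled out.
ABSOLUTE = "./body/div[1]/form[1]/input[1]"
# Relative-style path: any input with id 'q', wherever it is.
RELATIVE = ".//input[@id='q']"


def locate(page, path):
    """Return the id of the first element matching path, or None if nothing matches."""
    element = ET.fromstring(page).find(path)
    return None if element is None else element.get("id")


for name, page in (("original", ORIGINAL), ("redesigned", REDESIGNED)):
    print(name, "absolute:", locate(page, ABSOLUTE), "relative:", locate(page, RELATIVE))
```

On the redesigned page the absolute-style path no longer finds anything, while the relative one still reaches the same input.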

We will discuss more and dig deeper in the next story!
