TestYourBond scraper with Python

November 23, 2017

Here we are going to make a web scraper for TestYourBond, a website that lets you create a quiz and share its link with other people. Everyone who completes the quiz gets a score out of 15.
Web scraping is a technique for extracting data from websites. We are going to use it to get the correct answers of a quiz before taking it. So let's get started.

Making the quiz

Here is how to create a quiz and get a link at the end. That link is going to be the input to our script.

Now that we have our link, we can set up the environment.

Setting up the environment

We are going to use Python 3 with two modules: requests (used to send HTTP GET and POST requests and retrieve data) and beautifulsoup (used to parse the HTML into a structured form that is easy to search). You can download Python 3 from here. After that, you can get the modules by typing these commands in a terminal:

pip install requests
pip install beautifulsoup4

Now we are ready to code.

Importing modules

import requests
from bs4 import BeautifulSoup

Getting the URL

We will ask the user for the link we generated on the website, then make a GET request to download the HTML document of the page.

page = requests.get(input("Enter the url of the quiz :"))
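The one-liner above works when the link is valid and the site responds. As a hedged sketch of a slightly more defensive version, the helper below adds a timeout and stops on HTTP error codes. The name `fetch_quiz_html`, the injectable `get` parameter, and the 10-second timeout are assumptions made for illustration; they are not part of the original script.

```python
import requests


def fetch_quiz_html(url, get=requests.get, timeout=10):
    """Download the quiz page and return its raw HTML as bytes.

    `get` defaults to requests.get; it is a parameter only so the helper
    can be tried out without network access (hypothetical design choice).
    """
    response = get(url, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx responses instead of
    # silently parsing an error page later on.
    response.raise_for_status()
    return response.content
```

It could be called as `html = fetch_quiz_html(input("Enter the url of the quiz :"))`, and the returned bytes passed to BeautifulSoup in place of `page.content`.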

Parsing the html

Now it is beautifulsoup's turn to parse the HTML. We make a BeautifulSoup object from the HTML document so we can use its powerful methods.

soup=BeautifulSoup(page.content,"html.parser")

Getting all the questions

We need to find a unique tag, class, or id to search for so we can collect all the question text. By inspecting elements in the browser and scrolling down, we can find this block.

<div id="W05" class="question hidden unanswered">
<h3 class="fivepxtop">If TheBiggestNoob meet a genie, what would be TheBiggestNoob&#039;s wish?</h3>
<table class="pure-table pure-table-horizontal">
<tr>
<td value='a' class="answer incorrect">1 Million Dollar</td>
</tr>
<tr>
<td value='b' class="answer incorrect">Beautiful Wife/Handsome Husband</td>
</tr>
<tr>
<td value='c' class="answer correct">To be the PM of Country</td>
</tr>
<tr>
 <td value='d' class="answer incorrect">3 More Wishes</td>
</tr>
</table>
</div>

We can see an element with the class "question hidden unanswered" that contains the question in an h3 tag with the class "fivepxtop". We can't search for the "fivepxtop" class alone because other elements use it too.
So we are going to search for the h3 tag inside each element with the class "question hidden unanswered".

There are two methods in beautifulsoup: find(), which returns the first element matching the attributes we search for, and find_all(), which returns a list of all matching elements instead. We are going to use find_all() to grab all the questions.
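To make the difference concrete, here is a small sketch run on a made-up two-question snippet (the question texts are placeholders, not taken from the real site).

```python
from bs4 import BeautifulSoup

# Hypothetical HTML with two question blocks, for illustration only.
html = """
<div class="question hidden unanswered"><h3 class="fivepxtop">First question?</h3></div>
<div class="question hidden unanswered"><h3 class="fivepxtop">Second question?</h3></div>
"""
soup = BeautifulSoup(html, "html.parser")

first = soup.find("h3")      # a single Tag, or None if nothing matches
every = soup.find_all("h3")  # a list-like ResultSet with every match

first_text = first.get_text()
all_texts = [tag.get_text() for tag in every]
```

Because find() returns None when nothing matches, calling get_text() on its result is only safe when the element is known to exist.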

First we will search for the "question hidden unanswered" elements.

questions=soup.find_all(class_="question hidden unanswered")

Then we search each element in the list for its h3 tag, extract the text, and store it in a list.

question_text = []
for item in questions:
    # the question itself is the text of the h3 tag inside each block
    question_text.append(item.find('h3').get_text())
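Passing the full string "question hidden unanswered" to class_ only matches elements whose class attribute is exactly that string, in that order. As an alternative sketch (not the method used in this post), a CSS selector can pick up every h3 inside any div that has the "question" class, whatever its other classes are. The second block with id "W06" and the stray heading are invented for the example.

```python
from bs4 import BeautifulSoup

# The first block mirrors the page; the rest is hypothetical test data.
html = """
<div id="W05" class="question hidden unanswered">
<h3 class="fivepxtop">If TheBiggestNoob meet a genie, what would be TheBiggestNoob&#039;s wish?</h3>
</div>
<div id="W06" class="unanswered question">
<h3 class="fivepxtop">A second, made-up question?</h3>
</div>
<h3 class="fivepxtop">A heading outside any question block</h3>
"""
soup = BeautifulSoup(html, "html.parser")

# "div.question h3" selects each <h3> nested in a <div> whose class list
# includes "question", regardless of the other classes or their order.
question_text = [h3.get_text(strip=True) for h3 in soup.select("div.question h3")]

# For comparison: the exact-string search only finds the first block here.
exact_matches = soup.find_all(class_="question hidden unanswered")
```

The selector version keeps working if the site changes the state classes (for example from "unanswered" to something else), at the cost of assuming the questions always live in a div.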

Now we have collected all the questions. Let's search for the answers.

Getting all the correct answers

We will do the same as before. This time it's easier, because when we search the HTML document we find a unique class that contains the correct answer.

<tr>
<td value='c' class="answer correct">To be the PM of Country</td>
</tr>

As we can see, the correct answer has the class "answer correct", so we will search for it, extract the text, and store it in a list.

answers = soup.find_all(class_="answer correct")
answer_text = []
for item in answers:
    # each matching td holds the text of one correct answer
    answer_text.append(item.get_text())
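Collecting questions and answers in two separate lists relies on both lists having the same order and length. A possible variation, sketched here under the assumption that every question block contains exactly one cell with the "correct" class, is to look up the answer inside each question block so the pair always stays together. The snippet is a shortened copy of the block shown earlier, and `pairs` is a name chosen for this example.

```python
from bs4 import BeautifulSoup

html = """
<div id="W05" class="question hidden unanswered">
<h3 class="fivepxtop">If TheBiggestNoob meet a genie, what would be TheBiggestNoob&#039;s wish?</h3>
<table class="pure-table pure-table-horizontal">
<tr><td value='a' class="answer incorrect">1 Million Dollar</td></tr>
<tr><td value='c' class="answer correct">To be the PM of Country</td></tr>
</table>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

pairs = []
for block in soup.find_all(class_="question"):
    heading = block.find("h3")
    # class_="correct" matches the whole class name, so "incorrect" is skipped
    correct = block.find("td", class_="correct")
    if heading is None or correct is None:
        continue  # skip blocks that do not look like a complete question
    pairs.append(
        (heading.get_text(strip=True), correct.get("value"), correct.get_text(strip=True))
    )
```

Each entry in `pairs` holds the question text, the option letter from the value attribute, and the answer text.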

Making the output

Finally, we display the output in an easy-to-read way for our user.

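# zip() stops at the shorter list, so a quiz with fewer than 15 questions does not raise IndexError
for number, (question, answer) in enumerate(zip(question_text, answer_text), start=1):
    print(str(number) + "-Q: " + question + " A: " + answer + "\n")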

We have finished our script, and it's ready to test. You can get it from here.


Ryuodan(The Geeky Dreamer)