TestYourBond scraper with Python
Here we are going to make a web scraper for TestYourBond, a website that lets you create a quiz and share its link with people; everyone who completes the quiz gets a score out of 15.
Web scraping is a technique used to extract data from websites. We are going to use it to get the correct answers to the quiz before taking it. So let's get started.
Making the quiz
Here is how to create a quiz and get a link at the end. That link is going to be the input to our script.
[Screenshots: steps 1 to 4 of creating the quiz]
Now we have our link and we are ready to code.
Setting up the environment
We are going to use Python 3 with two modules: requests (used to send HTTP GET and POST requests and fetch data) and beautifulsoup (used to parse the HTML into a structured form that is easy to search). You can download Python 3 from here. After that, install the modules by typing these commands in a terminal:
pip install requests
pip install beautifulsoup4
Now we are ready to code.
Importing modules
import requests
from bs4 import BeautifulSoup
Getting the URL
We will ask the user for input, which is the link generated by the website, and then make a GET request to download the HTML document of the page.
page = requests.get(input("Enter the url of the quiz :"))
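A pasted link may come with stray spaces or without the "https://" part, and requests refuses URLs that have no scheme. The following is a minimal sketch of cleaning the input first, using only the standard library. The helper name normalize_quiz_url is hypothetical and not part of the original script, and defaulting to https is an assumption.

```python
from urllib.parse import urlparse


def normalize_quiz_url(raw: str) -> str:
    """Return a cleaned-up URL, or raise ValueError if it is not usable."""
    url = raw.strip()
    # Assumption: when the user leaves out the scheme, https is what they meant.
    if "://" not in url:
        url = "https://" + url
    parsed = urlparse(url)
    # Only http(s) URLs with a host name make sense for a GET request here.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not a usable http(s) URL: {raw!r}")
    return url


print(normalize_quiz_url("  example.com/quiz/abc \n"))
```

The cleaned value could then be passed to requests.get() in place of the raw input.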
Parsing the html
Now it is beautifulsoup's turn to parse the HTML. We make a BeautifulSoup object from the HTML document so we can use its powerful methods.
soup = BeautifulSoup(page.content, "html.parser")
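To see what the parsing step gives us without touching the network, here is a small sketch that parses a hard-coded byte string. The sample HTML and the variable names are made up for illustration; the bytes stand in for page.content, which requests also returns as bytes.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for page.content (the raw response body as bytes).
sample_bytes = (
    b"<html><head><title>Demo quiz</title></head>"
    b"<body><h3>Question one?</h3></body></html>"
)

soup_demo = BeautifulSoup(sample_bytes, "html.parser")

# Once parsed, tags can be reached by name and their text extracted.
title_text = soup_demo.title.get_text()
first_heading = soup_demo.find("h3").get_text()

print(title_text)
print(first_heading)
```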
Getting all the questions
We need to find a unique tag, class or id to search for so we can collect all the question text from it. By inspecting the elements in the browser and scrolling down, we can find this block.
<div id="W05" class="question hidden unanswered">
  <h3 class="fivepxtop">If TheBiggestNoob meet a genie, what would be TheBiggestNoob's wish?</h3>
  <table class="pure-table pure-table-horizontal">
    <tr> <td value='a' class="answer incorrect">1 Million Dollar</td> </tr>
    <tr> <td value='b' class="answer incorrect">Beautiful Wife/Handsome Husband</td> </tr>
    <tr> <td value='c' class="answer correct">To be the PM of Country</td> </tr>
    <tr> <td value='d' class="answer incorrect">3 More Wishes</td> </tr>
  </table>
</div>
We can see a div with the class "question hidden unanswered" that contains the question in an h3 tag with the class "fivepxtop". We can't search for the "fivepxtop" class directly because other elements use it too.
So we are going to search for the h3 tag inside each element with the class "question hidden unanswered".
There are two search methods in beautifulsoup: find(), which returns the first element matching the attributes we search for, and find_all(), which returns a list of all matches instead. We are going to use find_all() to grab all the questions.
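The difference between the two methods is easiest to see on a tiny document. This is only an illustrative sketch on a trimmed copy of the answer table; the variable names are made up.

```python
from bs4 import BeautifulSoup

html = """
<table>
  <tr><td class="answer incorrect">1 Million Dollar</td></tr>
  <tr><td class="answer correct">To be the PM of Country</td></tr>
  <tr><td class="answer incorrect">3 More Wishes</td></tr>
</table>
"""
soup_demo = BeautifulSoup(html, "html.parser")

first_cell = soup_demo.find("td")       # a single element: the first match
all_cells = soup_demo.find_all("td")    # a list of every match
missing = soup_demo.find("h3")          # None, because there is no h3 here
no_matches = soup_demo.find_all("h3")   # an empty list, never None

print(first_cell.get_text())
print(len(all_cells))
```

Because find() returns None when nothing matches, calling get_text() on its result without a check can fail on pages that do not have the expected layout.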
First we search for the "question hidden unanswered" elements.
questions=soup.find_all(class_="question hidden unanswered")
Then we search each element in the list for its h3 tag, extract the text and store it in a list.
question_text = []
for item in questions:
    question_text.append(item.find('h3').get_text())
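One thing to keep in mind: passing the whole string "question hidden unanswered" as class_ matches only elements whose class attribute is exactly that string, in that order. As a possible alternative (not what the script above uses), a CSS selector matches on the single class "question" regardless of the other classes. The second div and the stray heading below are hypothetical, added only to show the difference.

```python
from bs4 import BeautifulSoup

html = """
<div id="W05" class="question hidden unanswered">
  <h3 class="fivepxtop">If TheBiggestNoob meet a genie, what would be TheBiggestNoob's wish?</h3>
</div>
<div id="W06" class="question unanswered hidden">
  <h3 class="fivepxtop">Hypothetical second question?</h3>
</div>
<h3 class="fivepxtop">Unrelated heading</h3>
"""
soup_demo = BeautifulSoup(html, "html.parser")

# Exact string match on the class attribute: the reordered div is missed.
exact = soup_demo.find_all(class_="question hidden unanswered")

# CSS selector: every h3 inside a div that has the class "question".
by_selector = soup_demo.select("div.question h3")
selected_text = [h3.get_text() for h3 in by_selector]

print(len(exact))
print(selected_text)
```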
Now we have collected all the questions. Let's search for the answers.
Getting all the correct answers
We will do the same as before. This time it's easier, because when we search the HTML document we find a unique class that contains the correct answer.
<tr>
  <td value='c' class="answer correct">To be the PM of Country</td>
</tr>
As we can see, the class is "answer correct". We will search for it, extract the text and store it in a list.
answers = soup.find_all(class_="answer correct")
answer_text = []
for item in answers:
    answer_text.append(item.get_text())
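Collecting questions and answers in two separate lists relies on both lists lining up. A slightly more defensive variant, sketched here under the assumption that every question lives in its own div as in the block shown earlier, reads the question and its correct answer from the same div. The function name and the second, answerless div are hypothetical.

```python
from bs4 import BeautifulSoup

SAMPLE_HTML = """
<div id="W05" class="question hidden unanswered">
  <h3 class="fivepxtop">If TheBiggestNoob meet a genie, what would be TheBiggestNoob's wish?</h3>
  <table class="pure-table pure-table-horizontal">
    <tr> <td value='a' class="answer incorrect">1 Million Dollar</td> </tr>
    <tr> <td value='c' class="answer correct">To be the PM of Country</td> </tr>
  </table>
</div>
<div id="W99" class="question hidden unanswered">
  <h3 class="fivepxtop">Hypothetical question with no marked answer?</h3>
</div>
"""


def pair_questions_with_answers(html):
    """Return a list of (question, correct answer) tuples, one per question div."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for block in soup.select("div.question"):
        heading = block.find("h3")
        # class_="correct" matches the single class token, so "incorrect" is not hit.
        correct = block.find("td", class_="correct")
        if heading is None or correct is None:
            continue  # skip blocks that do not have the expected layout
        pairs.append((heading.get_text(strip=True), correct.get_text(strip=True)))
    return pairs


pairs = pair_questions_with_answers(SAMPLE_HTML)
for number, (question, answer) in enumerate(pairs, start=1):
    print(f"{number}-Q: {question} A: {answer}")
```

With this approach a question that has no marked answer is skipped instead of shifting every later answer by one position.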
Making the output
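Finally, we display the output in an easy-to-read way for the user.
# zip() stops at the shorter list, so a quiz with fewer than 15 questions does not raise an IndexError.
for i, (question, answer) in enumerate(zip(question_text, answer_text), start=1):
    print(str(i) + "-Q: " + question + " A: " + answer + "\n")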
We have finished our script and it's ready to test. You can get it from here.