In today's post I thought I'd talk about one of my projects, and what's cool is that it relates to my blog posts on TheNibbleByte's website! 😃
It's a program that scrapes the view counts of my posts from the website and then uses that data to generate a graph showing a trend line of the views. This will be useful for me as I continue to publish more and more posts on this website!
I should point out that I'm only scraping from this website and not Medium because I haven't built my audience on there yet.
I should also point out that this solution is by no means perfect. I'm sure I could've made it more complex and more efficient, and added more features; however, this is just something I quickly made that does the job (a heuristic solution 😉).
In this post I will be showing the code (the full source code is available on my GitHub).
In my opinion, this post isn't really for Python beginners (although it's still worth reading!!). Nevertheless, enjoy!
Selenium is an open-source web-automation tool, and the WebDriver is what we use to access the page and 'scrape' data.
The time module is something I used during development to pause execution for 10 seconds, just so I could work out how long to wait.
NumPy is a numerical computing library, which I used to cleanse the data and convert it from a list to an array.
Pandas is what I used to convert the array(s) into a DataFrame.
Seaborn is a data visualisation library; I passed the DataFrame in as a parameter to create the line plot.
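To give a rough idea of how the last three libraries fit together, here's a minimal sketch. It isn't the project's actual code: the function names, the column names, the cleaning rule (stripping spaces and commas) and the sample strings are all assumptions made up for illustration.

```python
def clean_view_counts(raw_views):
    """Turn scraped strings such as ' 1,034 ' into plain integers.

    Assumption: view counts arrive as text that may contain spaces
    and thousands separators.
    """
    return [int(text.strip().replace(",", "")) for text in raw_views]


def plot_views(view_counts):
    """List -> NumPy array -> pandas DataFrame -> Seaborn line plot."""
    # Imported here so the cleaning helper above can be tried on its own,
    # without the plotting libraries installed.
    import numpy as np
    import pandas as pd
    import seaborn as sns

    views = np.array(view_counts)
    frame = pd.DataFrame({
        "post": np.arange(1, len(views) + 1),  # 1, 2, 3, ... in scrape order
        "views": views,
    })
    # lineplot takes the DataFrame via `data`, plus column names for x and y.
    return sns.lineplot(data=frame, x="post", y="views")


# Made-up sample strings, not real view counts.
sample = clean_view_counts(["12", " 1,034 ", "57"])
print(sample)  # [12, 1034, 57]
```

Calling plot_views(sample) would then draw the line plot, provided NumPy, Pandas and Seaborn are installed.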
Getting the Data:
The first 2 lines are where the Chrome WebDriver comes in; that's what we use to scrape the web on Chrome, and you can download it here: https://chromedriver.chromium.org/downloads
The 2nd line just creates an instance of the WebDriver in our Python code, which allows us to access the website later.
The next line 'gets' the webpage that contains all my blog posts. Run on its own, this line would just open the webpage and do nothing else.
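As a rough sketch of those steps (not the exact code from the project): in Selenium 4 the path to the downloaded ChromeDriver is wrapped in a Service object, and get() then opens the page. The function names here are hypothetical, and the driver path and URL in the note below the code are placeholders.

```python
def make_chrome_driver(driver_path):
    """Create a Chrome WebDriver from a downloaded ChromeDriver binary."""
    # Imported inside the function so open_page() below can be tried
    # with any driver-like object, even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    # Selenium 4 style: the binary's location goes into a Service object.
    return webdriver.Chrome(service=Service(executable_path=driver_path))


def open_page(driver, url):
    """Point an existing driver at a page.

    On its own this only opens the page in the browser; nothing is
    scraped until we go looking for elements.
    """
    driver.get(url)
    return driver
```

Something like open_page(make_chrome_driver("/path/to/chromedriver"), "https://example.com/blog") would then open the page in Chrome. Remember to call driver.quit() when you're done so the browser window closes.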
Now I find all the span tags with the class name "M1M61". This might sound quite random; however, I found it by using 'inspect element', looking for the number of views and viewing the HTML:
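In code, that lookup could look roughly like the sketch below. It's an illustration rather than the project's own code: the function name is hypothetical, and it assumes each matching span's text is the view count. Class names like "M1M61" are usually auto-generated, so expect the value to change if the site is rebuilt.

```python
def read_view_texts(driver, class_name="M1M61"):
    """Return the text of every <span> carrying the given class name."""
    # "css selector" is the value of Selenium's By.CSS_SELECTOR constant;
    # the plain string is used so the sketch has no import of its own.
    spans = driver.find_elements("css selector", f"span.{class_name}")
    # Each element exposes its visible text through the .text attribute.
    return [span.text for span in spans]
```

The driver argument is a Selenium WebDriver that has already loaded the page with get().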