Paraphrased: “Arduino creates open source hardware and software, specifically focusing on microcontrollers, for educational and prototyping endeavors. Those familiar with me are aware of my passion for exploring Arduinos and Raspberry Pi. While I don’t aspire to be an electrical engineer, I’m primarily drawn to the coding and artistic aspects. However, the tangible experience of constructing something and the gratification of witnessing your code in action are truly invaluable.
Arduino offers a platform called the Arduino Project Hub, where developers can showcase their code and designs. Users can engage through comments and show their appreciation for the projects.”
I’ve always felt a bit hesitant to share my projects on the official Arduino Project Hub, so I prefer leaving the code on GitHub and discussing my favorite projects elsewhere. The projects featured on the Arduino hub appear to be meticulously crafted and receive substantial attention in terms of views and respects (similar to likes). Moreover, they are often authored by experienced creators who have amassed a significant following.
To enhance my web-scraping abilities, I developed a program that extracted data from every project listed on the hub. I then aimed to analyze the collected data in order to identify any patterns that could potentially predict a project’s success. I defined success as the ratio of ‘respects’ received by a project to the total number of views it garnered. Additionally, I retrieved the ‘tags’ associated with each project.
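As a small illustration of that success metric, the sketch below computes the respect-to-view ratio for a few made-up projects. The field names and numbers are hypothetical, and returning `None` for a project with no views is an assumption of this sketch rather than something the original analysis specifies.

```python
def respect_ratio(respects, views):
    """Respects per view; None when the project has no views yet."""
    if views <= 0:
        return None
    return respects / views


# Made-up projects, purely for illustration.
projects = [
    {"title": "Weather Station", "views": 1200, "respects": 36},
    {"title": "LED Cube", "views": 400, "respects": 20},
    {"title": "Unseen Draft", "views": 0, "respects": 0},
]

# Map each project title to its success metric.
ratios = {p["title"]: respect_ratio(p["respects"], p["views"]) for p in projects}

for title, ratio in ratios.items():
    print(title, ratio)
```

Using a ratio rather than raw respects keeps a heavily viewed project from looking successful just because more people saw it.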
I used Scrapy, a user-friendly Python web-scraping framework, to extract data from the entire website. Scrapy is well suited to straightforward websites like this one. First, I identified the HTML elements that held the data I wanted on each project page and noted their XPaths. An XPath is an expression that describes where an element sits in the page’s HTML tree, and Scrapy uses it as a selector. By instructing Scrapy to retrieve elements matching similar XPaths (they are not always identical from page to page, which is where regular expressions help), I could scrape the same information from every project page. The Chrome developer tools made it easy to find the XPath of each element, and because the site numbers its pages sequentially, I could step through all the project pages in order.
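The workflow above can be sketched without Scrapy itself. The example below uses Python's standard-library `xml.etree.ElementTree`, which supports a limited subset of XPath, in place of Scrapy's selectors. The page fragment, class names, and `?page=N` URL pattern are all hypothetical, since the hub's real markup is not shown here. In an actual Scrapy spider the same kind of expression would typically go through `response.xpath(...)`.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a project page (not the hub's real markup).
SAMPLE_PAGE = """
<html><body>
  <div class="project">
    <h1 class="title">Weather Station</h1>
    <span class="stat views">1,204 views</span>
    <span class="stat respects">37 respects</span>
    <a class="tag" href="/tags/weather">weather</a>
    <a class="tag" href="/tags/sensors">sensors</a>
  </div>
</body></html>
"""


def first_int(text):
    """Pull the first integer out of a string such as '1,204 views'."""
    match = re.search(r"\d[\d,]*", text or "")
    return int(match.group().replace(",", "")) if match else None


def parse_project(page_source):
    """Extract title, views, respects, and tags from one project page."""
    root = ET.fromstring(page_source.strip())
    title = root.find(".//h1[@class='title']").text
    stats = {}
    # The class attributes are similar but not identical ('stat views',
    # 'stat respects'), so a regular expression picks out the part we need.
    for span in root.iter("span"):
        kind = re.search(r"\b(views|respects)\b", span.get("class", ""))
        if kind:
            stats[kind.group(1)] = first_int(span.text)
    tags = [a.text for a in root.findall(".//a[@class='tag']")]
    return {
        "title": title,
        "views": stats.get("views"),
        "respects": stats.get("respects"),
        "tags": tags,
    }


def page_urls(base_url, last_page):
    """Build sequential listing URLs; the '?page=N' scheme is an assumption."""
    return [f"{base_url}?page={n}" for n in range(1, last_page + 1)]


print(parse_project(SAMPLE_PAGE))
print(page_urls("https://example.com/projects", 3))
```

Real HTML is often not well-formed XML, which is one reason to use a scraping framework's own selectors rather than a strict XML parser in practice.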
Check out the code repo.
For fun, I built a basic linear regression model to explore whether a creator’s number of followers influences the respect-to-view ratio of their projects. To make the process interactive, I created a Shiny app that walks users through the steps of a multiple linear regression analysis:
- from data exploration,
- to feature selection using variance inflation factor (VIF) analysis (to help reduce multicollinearity),
- and prediction using the model you just built.
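The feature-selection step can be illustrated with a short sketch. It computes each predictor's variance inflation factor as the corresponding diagonal entry of the inverse correlation matrix, which is equivalent to 1 / (1 - R²) from regressing that predictor on the others. This is standard-library Python rather than the Shiny app's own code, which is not shown here. The column names and numbers are made up, and the threshold of 5 mentioned in the comment is only a common rule of thumb.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting for a small square matrix."""
    n = len(matrix)
    aug = [
        list(row) + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """Return {predictor name: VIF} for a dict of equal-length numeric columns."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


# Made-up predictors: followers and project count move together on purpose.
features = {
    "followers": [10, 25, 40, 55, 80, 120],
    "projects": [1, 2, 4, 5, 8, 13],
    "account_age_days": [30, 400, 90, 700, 200, 365],
}

for name, value in vif(features).items():
    # A VIF above roughly 5 is often read as a sign of multicollinearity.
    print(f"{name}: {value:.2f}")
```

A predictor with a high VIF carries mostly the same information as the others, so dropping it usually costs little predictive power and makes the remaining coefficients easier to interpret.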
After analyzing the data, I found only a weak correlation between a project’s respect-to-view ratio and the creator’s status, although veteran creators did attract more views. Feel free to explore the Shiny app yourself. There are other machine learning techniques we could try beyond regression, but given what the data exploration stage showed, I don’t expect them to uncover a significant relationship.
Given these findings, I might reconsider posting some of my projects on the hub after all! The community appears to be quite receptive to newcomers: I couldn’t reject the null hypothesis that there is no correlation between a developer’s experience and the respect ratio of their project. Failing to reject the null doesn’t prove that no relationship exists, but the data give no sign that established creators have an edge.
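One way to make the null-hypothesis statement concrete is a permutation test on the Pearson correlation, sketched below. This is a swapped-in technique chosen because it needs only the standard library. The post does not say which test was actually used, and the sample numbers are made up.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def permutation_p_value(x, y, n_permutations=5000, seed=0):
    """Two-sided p-value for the null hypothesis of no correlation.

    Shuffling y breaks any real link to x, so the shuffled correlations
    show what values arise by chance alone.
    """
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Made-up data: creator follower counts and their projects' respect ratios.
followers = [3, 12, 40, 7, 150, 22, 61, 5, 90, 18]
ratios = [0.031, 0.027, 0.035, 0.040, 0.029, 0.033, 0.026, 0.038, 0.034, 0.030]

p = permutation_p_value(followers, ratios)
print(f"r = {pearson(followers, ratios):.3f}, p = {p:.3f}")
```

A large p-value here means the observed correlation is well within what random pairing produces, which is the situation described above: no evidence of a relationship, rather than evidence of none.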