Exploring Today's Most Popular Programming Languages on GitHub
Chapter 1: Introduction to GitHub Language Data
In this section, we'll delve into how data regarding programming languages used in current GitHub projects can be obtained and what insights it can provide.
This chart showcases the findings! 🥰 Let me share the methodology behind the data collection, what the results indicate, and, more crucially, what they may overlook.
Section 1.1: Data Acquisition Process
So, how did I gather this information? 🤔 GitHub offers a powerful GraphQL API that exposes a wealth of information about both public and private repositories. While some details require additional authorization, a registered user can accomplish a lot with just a personal access token.
My query designed to fetch language data from public repositories appears as follows (certain specifics are omitted for clarity):
query find_repositories($n: Int!, $cursor: String) {
  search(query: "created:${createDate} size:${startSize}..${endSize}", type: REPOSITORY, first: $n, after: $cursor) {
    edges {
      node {
        __typename
        ...on Repository {
          primaryLanguage {}
          languages(first: 100) {}
          repositoryTopics(first: 100) {}
        }
      }
      cursor
    }
  }
}
If you're eager to explore the API's capabilities, visit their API Explorer and start experimenting.
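Outside the Explorer, the same queries can be sent as plain HTTP POST requests to GitHub's GraphQL endpoint. Here is a minimal sketch that uses the built-in fetch of Node 18+ instead of the octokit client used in the rest of this article. The helper names buildGraphqlRequest and runGraphql are hypothetical, and the token is assumed to be a personal access token you created yourself.

```javascript
// GitHub's GraphQL API has a single endpoint for all queries.
const GITHUB_GRAPHQL_URL = "https://api.github.com/graphql";

// Hypothetical helper: build the fetch() options for one GraphQL call
// without sending anything, so the request is easy to inspect.
function buildGraphqlRequest(token, query, variables = {}) {
  return {
    method: "POST",
    headers: {
      Authorization: `bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ query, variables }),
  };
}

// Hypothetical helper: send the request and return the `data` part of the response.
// Assumes a runtime with a global fetch (Node 18+ or a browser).
async function runGraphql(token, query, variables) {
  const response = await fetch(
    GITHUB_GRAPHQL_URL,
    buildGraphqlRequest(token, query, variables)
  );
  if (!response.ok) {
    throw new Error(`GitHub API responded with HTTP ${response.status}`);
  }
  const payload = await response.json();
  // GraphQL reports query problems in an `errors` array, even on HTTP 200.
  if (payload.errors) {
    throw new Error(payload.errors.map((e) => e.message).join("; "));
  }
  return payload.data;
}
```

A call would look like runGraphql(process.env.GITHUB_TOKEN, queryText, { n: 100 }), where GITHUB_TOKEN is an assumed environment variable name.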
Important Considerations ⚠️
When extracting information from GitHub's API, certain limitations must be navigated.
Firstly, there's pagination. A single request cannot return more than 100 repositories or languages. Fortunately, each edge in the response carries a cursor, which can be saved and passed to the next query (after:$cursor) to fetch the following page.
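The cursor handling can be sketched as a simple loop. This is an illustration under assumptions, not my exact code: fetchPage is a hypothetical function that runs the query with the given cursor and resolves to the search object of the response, shaped like { edges: [{ node, cursor }, ...] }.

```javascript
// Collect every edge by following cursors page by page.
// fetchPage(cursor) is a hypothetical callback; pageSize must match
// the `first:` argument used in the query (at most 100).
async function collectAllEdges(fetchPage, pageSize = 100) {
  const allEdges = [];
  let cursor; // undefined asks for the first page
  while (true) {
    const { edges } = await fetchPage(cursor);
    allEdges.push(...edges);
    // A page shorter than pageSize (including an empty one) is the last page.
    if (edges.length < pageSize) break;
    // Otherwise continue after the last edge we received.
    cursor = edges[edges.length - 1].cursor;
  }
  return allEdges;
}
```

GitHub's connections also expose a pageInfo object with hasNextPage and endCursor, which is a more direct way to detect the last page; the loop above sticks to the per-edge cursor because that is what the query in this article requests.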
Secondly, a limit of 1,000 on search queries exists. Regardless of pagination, only 1,000 items can be accessed per search, necessitating the division of searches based on one or more parameters.
I use size:${startSize}..${endSize} to segment my search by repository size, creating multiple queries from a list of size pairs:
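const sizes = [
  [50, 55],
  [55, 60],
  // ...
  [100000, 1000000000]
];
sizes.forEach(([startSize, endSize], index) => {
  // Use endSize - 1 so that adjacent ranges do not share a boundary value
  const query = createSizeQuery(date, startSize, endSize - 1);
  // Stagger the requests: query number `index` starts 10 seconds × index from now
  delay(10000 * index)
    .then(() => searchQuery(octokit, query, undefined, []))
    .then((allEdges) => { /* ... */ });
});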
Staggering the queries with a delay is essential to avoid exceeding the API's rate limits.
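For illustration, here is a rough sketch of how the size pairs and the delay could fit together. It is a variation rather than my actual code: instead of scheduling every query at a fixed offset, it waits for each query to finish and then pauses before starting the next. The names buildSizeRanges and runSpaced are hypothetical, runOne stands in for whatever builds and executes a single search, and the boundary values are examples only.

```javascript
// Resolve after `ms` milliseconds.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Turn sorted boundaries such as [50, 55, 60] into pairs [[50, 55], [55, 60]].
function buildSizeRanges(boundaries) {
  const ranges = [];
  for (let i = 0; i + 1 < boundaries.length; i += 1) {
    ranges.push([boundaries[i], boundaries[i + 1]]);
  }
  return ranges;
}

// Run one search per size range, strictly one after another,
// pausing `pauseMs` between consecutive searches.
// runOne(startSize, endSize) is a hypothetical async callback.
async function runSpaced(ranges, runOne, pauseMs = 10000) {
  const results = [];
  for (const [startSize, endSize] of ranges) {
    if (results.length > 0) await delay(pauseMs);
    // endSize - 1 keeps adjacent ranges from sharing a boundary value.
    results.push(await runOne(startSize, endSize - 1));
  }
  return results;
}

const exampleRanges = buildSizeRanges([50, 55, 60, 70]);
// exampleRanges: [[50, 55], [55, 60], [60, 70]]
```

Running the searches sequentially takes longer overall, but the pause then applies between completed requests, so a slow response cannot cause two searches to overlap.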
Section 1.2: Data Insights
What does the chart reveal? 📊 I'm extracting data from all repositories exceeding 50KB created on February 16, 2024 (yesterday as I write this). This primarily reflects hobby projects from developers engaged in late-night coding sessions and updates to established repositories.
Two factors influenced my choice of this filter:
- Simplicity
- A desire to understand the programming languages selected for new projects, rather than merely those most frequently used.
Subsection 1.2.1: Language Considerations
I've excluded certain "languages" such as Dockerfiles and Shell scripts but opted to retain HTML and CSS. I encourage readers to filter the data however they see fit; personally, I was surprised by how prevalent HTML and CSS turned out to be.
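Counting occurrences with an exclusion list could look roughly like this. The repository shape is an assumption made for the sketch (each repository flattened to a languages array of { name, size } objects), and countLanguageOccurrences is a hypothetical name; the sample data is invented purely to show the mechanics.

```javascript
// Entries left out of the count, using the names GitHub reports for them.
const EXCLUDED_LANGUAGES = new Set(["Dockerfile", "Shell"]);

// Count in how many repositories each language appears,
// skipping excluded names, and return [name, count] pairs sorted by count.
function countLanguageOccurrences(repos, excluded = EXCLUDED_LANGUAGES) {
  const counts = new Map();
  for (const repo of repos) {
    for (const { name } of repo.languages) {
      if (excluded.has(name)) continue;
      counts.set(name, (counts.get(name) || 0) + 1);
    }
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Invented sample data, not real results.
const sampleRepos = [
  { languages: [{ name: "JavaScript", size: 1200 }, { name: "HTML", size: 300 }, { name: "Dockerfile", size: 40 }] },
  { languages: [{ name: "JavaScript", size: 800 }, { name: "Shell", size: 90 }] },
  { languages: [{ name: "HTML", size: 500 }, { name: "CSS", size: 250 }, { name: "JavaScript", size: 100 }] },
];

const ranking = countLanguageOccurrences(sampleRepos);
// ranking: [["JavaScript", 3], ["HTML", 2], ["CSS", 1]]
```

Passing a different Set as the second argument is one way to apply your own filter, for example adding "HTML" and "CSS" to the exclusions.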
What about data volume? 🐸
Beyond counting language occurrences, I produced a chart depicting the total byte counts.
A look at the average byte count per repository for specific languages shows Prolog leading significantly with over 3GB, necessitating its removal for the subsequent chart.
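The byte totals and per-repository averages can be computed in a single pass. Again, this is a sketch under assumptions: repositories are flattened to a languages array of { name, size } objects with size in bytes, summarizeBytes is a hypothetical name, and the numbers are invented sample data rather than the real results.

```javascript
// For each language, sum its bytes across repositories and
// compute the average bytes per repository that contains it.
function summarizeBytes(repos) {
  const stats = new Map(); // name -> { totalBytes, repoCount }
  for (const repo of repos) {
    for (const { name, size } of repo.languages) {
      const entry = stats.get(name) || { totalBytes: 0, repoCount: 0 };
      entry.totalBytes += size;
      entry.repoCount += 1;
      stats.set(name, entry);
    }
  }
  return [...stats].map(([name, { totalBytes, repoCount }]) => ({
    name,
    totalBytes,
    averageBytes: totalBytes / repoCount,
  }));
}

// Invented sample data, not real results.
const byteSample = [
  { languages: [{ name: "JavaScript", size: 1000 }, { name: "Prolog", size: 9000 }] },
  { languages: [{ name: "JavaScript", size: 3000 }] },
];

const summary = summarizeBytes(byteSample);
// summary: JavaScript -> total 4000, average 2000; Prolog -> total 9000, average 9000

// Drop an outlier by name before charting, as was done with Prolog.
const withoutOutlier = summary.filter(({ name }) => name !== "Prolog");
```

Note that the average here divides by the number of repositories that contain the language, not by the total number of repositories, so one huge repository can dominate a rarely used language.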
Chapter 2: Conclusion and Observations
The first video titled "The World's Most Popular Programming Language" explores the current leading languages, providing further context to our findings.
Today, we examined statistics on programming languages across public GitHub repositories larger than 50KB created on February 16, 2024. The results were surprising, with PHP appearing on the list while Rust was absent.
Interestingly, Rust ranks in the top 20 as a primary language but does not appear when sorted by frequency. For those interested, I created a final chart focusing solely on primary languages.
The second video, "Top 4 Programming Languages To Learn for 2024 | Become a Web Developer," offers insights into the most valuable programming languages to learn this year.
Thank you for joining me on this exploration! Have a wonderful day, and see you next time!