rhondamuse.com

Exploring Today's Most Popular Programming Languages on GitHub


Chapter 1: Introduction to GitHub Language Data

In this chapter, we'll look at how to collect data on the programming languages used in new GitHub projects and what insights that data can provide.

This chart showcases the findings! 🥰 Let me walk you through how I collected the data, what the results show, and, more importantly, what they may miss.

Section 1.1: Data Acquisition Process

So, how did I gather this information? 🤔 GitHub offers a powerful GraphQL API that exposes a wealth of information about both public and private repositories. Every request must be authenticated, and some data requires additional permissions, but a registered user can accomplish a lot with just a personal access token.

My query for fetching language data from public repositories looks like this (some details are omitted for clarity):

# ${createDate}, ${startSize} and ${endSize} are filled in by a JavaScript template literal.
query find_repositories($n: Int!, $cursor: String) {
  search(query: "created:${createDate} size:${startSize}..${endSize}", type: REPOSITORY, first: $n, after: $cursor) {
    edges {
      node {
        __typename
        ... on Repository {
          # Selection sets below are omitted for clarity.
          primaryLanguage {}
          languages(first: 100) {}
          repositoryTopics(first: 100) {}
        }
      }
      cursor
    }
  }
}
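To make the query above runnable, the empty selection sets need real fields. The sketch below is one plausible choice, not the author's exact query: it assumes you want language names with byte sizes, topic names, and the total match count. buildSearchString and FIND_REPOSITORIES are hypothetical names, and the search string is passed as a GraphQL variable ($q) instead of being interpolated into the query text.

```javascript
// Hypothetical helper: builds the search string for one size bucket.
// GitHub's size: qualifier is in kilobytes, and a..b ranges are inclusive.
function buildSearchString(createDate, startSize, endSize) {
  return `created:${createDate} size:${startSize}..${endSize}`;
}

// One possible complete version of the query. The selected fields are an
// assumption; adjust them to whatever you want to chart.
const FIND_REPOSITORIES = `
  query find_repositories($q: String!, $n: Int!, $cursor: String) {
    search(query: $q, type: REPOSITORY, first: $n, after: $cursor) {
      repositoryCount
      edges {
        cursor
        node {
          __typename
          ... on Repository {
            primaryLanguage { name }
            languages(first: 100) {
              totalSize
              edges { size node { name } }
            }
            repositoryTopics(first: 100) {
              nodes { topic { name } }
            }
          }
        }
      }
    }
  }
`;
```

With an authenticated Octokit client, this could be sent as octokit.graphql(FIND_REPOSITORIES, { q: buildSearchString("2024-02-16", 50, 54), n: 100, cursor: null }). Keeping the query text constant and varying only the variables also avoids escaping problems if a search string ever contains quotes.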

If you're eager to explore the API's capabilities, open GitHub's GraphQL API Explorer and start experimenting.

Important Considerations ⚠️

When pulling data from GitHub's API, there are a few limitations to work around.

Firstly, there's pagination. You can't retrieve more than 100 repositories or languages in a single request. Fortunately, each result edge includes a cursor; save it and pass it to the next query (after: $cursor) to fetch the next batch of results.
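The cursor-based pagination described above can be sketched as a loop that keeps requesting pages, passing the last edge's cursor as after, until a page comes back short. fetchPage is a hypothetical callback standing in for one GraphQL request (for example, an octokit.graphql call that returns search.edges); the author's own searchQuery helper isn't shown, so this is an assumption about its shape, not a copy of it.

```javascript
// Collects every edge from a paginated search.
// `fetchPage(cursor, pageSize)` is a hypothetical async callback that runs
// one request and resolves to that page's array of edges ({ cursor, node }).
async function fetchAllEdges(fetchPage, pageSize = 100) {
  const allEdges = [];
  let cursor = null; // null requests the first page
  for (;;) {
    const edges = await fetchPage(cursor, pageSize);
    allEdges.push(...edges);
    // A short (or empty) page means there is nothing left to fetch.
    if (edges.length < pageSize) break;
    // Resume right after the last edge we received.
    cursor = edges[edges.length - 1].cursor;
  }
  return allEdges;
}
```

Stopping on a short page matches the "save the cursor from each edge" approach; selecting pageInfo { hasNextPage endCursor } on the search connection and looping on hasNextPage would work just as well.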

Secondly, search results are capped at 1,000. No matter how you paginate, a single search returns at most 1,000 items, so larger result sets must be split into several searches along one or more parameters.
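The author splits searches with a fixed list of size ranges. An alternative, shown here only as a sketch, is to bisect a size range until each piece matches at most 1,000 repositories. countRepos is a hypothetical callback that would run the search for an inclusive size range and return search.repositoryCount, which reports the total number of matches even when only 1,000 of them can be paged through.

```javascript
// Recursively splits the inclusive size range [start, end] (in KB) until
// each piece matches at most `limit` repositories.
// `countRepos(start, end)` is a hypothetical async callback returning the
// number of matching repositories (e.g. search.repositoryCount).
async function splitRange(countRepos, start, end, limit = 1000) {
  const count = await countRepos(start, end);
  // Small enough, or a single size value that can't be split further.
  if (count <= limit || start === end) return [[start, end]];
  const mid = Math.floor((start + end) / 2);
  const left = await splitRange(countRepos, start, mid, limit);
  const right = await splitRange(countRepos, mid + 1, end, limit);
  return [...left, ...right];
}
```

Each call to countRepos costs one API request, so the rate limits discussed below still apply; a fixed, hand-tuned list like the author's avoids those extra requests.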

I use the size:${startSize}..${endSize} qualifier to split my search by repository size (GitHub measures this in kilobytes), generating one query per pair in a list of size ranges:

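const sizes = [
  [50, 55],
  [55, 60],
  // ...
  [100000, 1000000000],
];

sizes.forEach(([startSize, endSize], index) => {
  // size: ranges are inclusive, so subtract 1 to keep adjacent buckets from overlapping.
  const query = createSizeQuery(date, startSize, endSize - 1);
  // Stagger the requests 10 seconds apart to stay under the rate limits.
  delay(10000 * index)
    .then(() => searchQuery(octokit, query, undefined, []))
    .then((allEdges) => { /* ... */ });
});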

Staggering the requests with a delay (10 seconds apart here) is essential to avoid exceeding the API's rate limits.
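The delay helper used above isn't shown in the article. A common Promise-based version is sketched below, together with an alternative that runs the size queries one after another instead of scheduling them all up front. runQuery is a hypothetical stand-in for building and running one size query.

```javascript
// Promise-based sleep; one common way to write a `delay` helper
// (the article's own implementation isn't shown, so this is an assumption).
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs one query per size pair, waiting `gapMs` between requests.
// `runQuery(startSize, endSize)` is a hypothetical async callback.
async function runSequentially(sizes, runQuery, gapMs = 10000) {
  const results = [];
  for (const [index, [startSize, endSize]] of sizes.entries()) {
    if (index > 0) await delay(gapMs);
    // Subtract 1 so adjacent inclusive ranges such as 50..54 and 55..59 don't overlap.
    results.push(await runQuery(startSize, endSize - 1));
  }
  return results;
}
```

Running sequentially means a slow request pushes back the next one instead of overlapping with it, which is gentler on the API than fixed 10-second offsets.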

Section 1.2: Data Insights

What does the chart reveal? 📊 I'm pulling data from every repository of 50 KB or more created on February 16, 2024 (yesterday, as I write this). This mostly reflects hobby projects from developers coding late into the night, along with updates to established repositories.

Two factors influenced my choice of this filter:

  1. Simplicity
  2. A desire to understand the programming languages selected for new projects, rather than merely those most frequently used.
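The created: qualifier takes an ISO date such as 2024-02-16. If you want to rerun the query for "yesterday" automatically, a small helper could format the date like this. It computes the date in UTC; whether GitHub evaluates created: dates in UTC is an assumption worth checking for repositories created near midnight.

```javascript
// Hypothetical helper: returns the UTC calendar date one day before `now`,
// formatted as YYYY-MM-DD for GitHub's created: search qualifier.
function yesterdayUTC(now = new Date()) {
  const oneDayMs = 24 * 60 * 60 * 1000;
  // toISOString() always uses UTC, so its first 10 characters are the UTC date.
  return new Date(now.getTime() - oneDayMs).toISOString().slice(0, 10);
}
```

For example, yesterdayUTC(new Date("2024-02-17T12:00:00Z")) returns "2024-02-16".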

Subsection 1.2.1: Language Considerations

I excluded certain "languages", such as Dockerfiles and Shell scripts, but kept HTML and CSS. I encourage you to filter the data yourself; I was surprised by how prevalent HTML and CSS turned out to be.
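Counting how many repositories use each language, with some "languages" filtered out, might look like the sketch below. It assumes each repository node has the shape returned by a languages(first: 100) { edges { size node { name } } } selection; the exclusion list is hypothetical apart from the two examples named in the text.

```javascript
// Hypothetical exclusion list, using GitHub's names for the two examples in the text.
const EXCLUDED = new Set(["Dockerfile", "Shell"]);

// Counts in how many repositories each language appears, skipping excluded ones.
// Each repo is assumed to look like { languages: { edges: [{ size, node: { name } }] } }.
function countLanguageOccurrences(repos, excluded = EXCLUDED) {
  const counts = new Map();
  for (const repo of repos) {
    for (const edge of repo.languages?.edges ?? []) {
      const name = edge.node.name;
      if (excluded.has(name)) continue;
      counts.set(name, (counts.get(name) ?? 0) + 1);
    }
  }
  // Most frequent first, ready for charting.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}
```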

What about data volume? 🐸

Beyond counting how often each language appears, I also charted the total number of bytes per language.

Looking at the average number of bytes per repository for each language, Prolog leads by a wide margin with over 3 GB, so I removed it from the next chart.
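Total and average byte counts per language can be computed from the same data. This sketch assumes the size field on each languages edge (the number of bytes GitHub attributes to that language) and averages over the repositories where the language appears; an outlier such as Prolog can then be dropped through the excluded set before charting.

```javascript
// Sums bytes per language and averages them over the repositories that use it.
// Each repo is assumed to look like { languages: { edges: [{ size, node: { name } }] } }.
function bytesPerLanguage(repos, excluded = new Set()) {
  const stats = new Map(); // name -> { totalBytes, repoCount }
  for (const repo of repos) {
    for (const { size, node } of repo.languages?.edges ?? []) {
      if (excluded.has(node.name)) continue;
      const s = stats.get(node.name) ?? { totalBytes: 0, repoCount: 0 };
      s.totalBytes += size;
      s.repoCount += 1;
      stats.set(node.name, s);
    }
  }
  return [...stats.entries()].map(([name, { totalBytes, repoCount }]) => ({
    name,
    totalBytes,
    repoCount,
    avgBytes: totalBytes / repoCount,
  }));
}
```

For the second chart, bytesPerLanguage(repos, new Set(["Prolog"])) leaves Prolog out entirely.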

Chapter 2: Conclusion and Observations

The first video, "The World's Most Popular Programming Language", explores today's leading languages and provides further context for these findings.

Today, we examined statistics on the programming languages used across all public GitHub repositories of 50 KB or more created on February 16, 2024. The results were surprising: PHP made the list while Rust was absent.

Interestingly, Rust ranks in the top 20 as a primary language but does not appear when languages are ranked by how often they occur. For those interested, I created a final chart that focuses solely on primary languages.
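Ranking by primary language only needs the primaryLanguage field. The sketch below counts it once per repository and skips repositories where GitHub detected no language (primaryLanguage is null); the repository shape is an assumption based on the query's primaryLanguage { name } selection.

```javascript
// Ranks languages by how many repositories list them as their primary language.
// Repositories with primaryLanguage: null are skipped.
function rankPrimaryLanguages(repos) {
  const counts = new Map();
  for (const repo of repos) {
    const name = repo.primaryLanguage?.name;
    if (!name) continue;
    counts.set(name, (counts.get(name) ?? 0) + 1);
  }
  // Most common primary language first.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}
```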

The second video, "Top 4 Programming Languages To Learn for 2024 | Become a Web Developer," offers insights into the most valuable programming languages to learn this year.

Thank you for joining me on this exploration! Have a wonderful day, and see you next time!
