rhondamuse.com

Exploring Today's Most Popular Programming Languages on GitHub

Chapter 1: Introduction to GitHub Language Data

In this section, we'll look at how to obtain data on the programming languages used in current GitHub projects and what that data can tell us.

This chart showcases the findings! 🥰 Let me share the methodology behind the data collection, what the results indicate, and, more crucially, what they may overlook.

Section 1.1: Data Acquisition Process

So, how did I gather this information? 🤔 GitHub offers a powerful GraphQL API that gives access to a wealth of information about both public and private repositories. While some details require additional authorization, a registered user can accomplish a lot with just a personal access token.
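To make the authentication step concrete, here is a minimal sketch of how an authenticated GraphQL request could be assembled. The helper name `buildGraphQLRequest` and the `GITHUB_TOKEN` environment variable are illustrative assumptions; the script in this article goes through octokit instead.

```javascript
// Hypothetical helper: builds the pieces of a GitHub GraphQL request.
// GitHub's GraphQL API exposes a single endpoint that accepts POST requests.
const GITHUB_GRAPHQL_URL = "https://api.github.com/graphql";

function buildGraphQLRequest(token, query, variables = {}) {
  return {
    url: GITHUB_GRAPHQL_URL,
    options: {
      method: "POST",
      headers: {
        // The token is sent as a bearer token.
        Authorization: `bearer ${token}`,
        "Content-Type": "application/json",
      },
      // GraphQL requests carry the query text and its variables as JSON.
      body: JSON.stringify({ query, variables }),
    },
  };
}

// Usage (needs network access and a real token, so it is left commented out):
// const { url, options } = buildGraphQLRequest(process.env.GITHUB_TOKEN, "query { viewer { login } }");
// const response = await fetch(url, options);
// console.log(await response.json());
```

Keeping the request construction separate from the network call makes it easy to inspect what would be sent before spending any of the rate limit.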

The query I use to fetch language data from public repositories looks like this (some details are omitted for clarity):

query find_repositories($n: Int!, $cursor: String) {
  search(query: "created:${createDate} size:${startSize}..${endSize}", type: REPOSITORY, first: $n, after: $cursor) {
    edges {
      node {
        __typename
        ... on Repository {
          primaryLanguage {}
          languages(first: 100) {}
          repositoryTopics(first: 100) {}
        }
      }
      cursor
    }
  }
}
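The `${createDate}`, `${startSize}` and `${endSize}` placeholders in the search string are filled in by JavaScript before the query is sent. A minimal sketch of that step follows; the function name `createSearchString` is hypothetical, and the note about inclusive ranges reflects my understanding of GitHub's search syntax rather than anything stated in the original script.

```javascript
// Hypothetical helper that builds the search string used inside the query.
// `created:` and `size:` are GitHub search qualifiers. Size is measured in
// kilobytes, and an `a..b` range is assumed to include both ends.
function createSearchString(createDate, startSize, endSize) {
  return `created:${createDate} size:${startSize}..${endSize}`;
}

console.log(createSearchString("2024-02-16", 50, 54));
// prints: created:2024-02-16 size:50..54
```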

If you're eager to explore the API's capabilities, visit their API Explorer and start experimenting.

Important Considerations ⚠️

When extracting information from GitHub's API, certain limitations must be navigated.

Firstly, there's pagination. It's not possible to retrieve more than 100 entries at once for repositories or languages. Fortunately, each object includes a cursor that must be saved and used in subsequent queries (after:$cursor) to access the next set of objects.
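The cursor loop can be sketched as follows. This is an illustration under stated assumptions, not the original `searchQuery` implementation: `fetchPage` is a hypothetical function that runs the query with `after: cursor` and resolves to that page's edges, each shaped like `{ node, cursor }`.

```javascript
// Sketch of cursor-based pagination with an injectable page fetcher.
async function fetchAllEdges(fetchPage, pageSize = 100) {
  const allEdges = [];
  let cursor; // undefined requests the first page
  while (true) {
    const edges = await fetchPage(cursor);
    allEdges.push(...edges);
    // A short (or empty) page means there is nothing left to fetch.
    if (edges.length < pageSize) break;
    // Resume after the last edge of the page just received.
    cursor = edges[edges.length - 1].cursor;
  }
  return allEdges;
}

// Usage with a fake in-memory data source standing in for the API.
const fakeEdges = Array.from({ length: 250 }, (_, i) => ({ node: { id: i }, cursor: `c${i}` }));
const fakeFetchPage = async (cursor) => {
  const start = cursor === undefined ? 0 : fakeEdges.findIndex((e) => e.cursor === cursor) + 1;
  return fakeEdges.slice(start, start + 100);
};

fetchAllEdges(fakeFetchPage).then((edges) => console.log(edges.length)); // prints 250
```

GraphQL connections on GitHub also expose `pageInfo { hasNextPage endCursor }`, which would be a more direct stop condition than checking the page length.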

Secondly, search results are capped at 1,000 items. No matter how you paginate, a single search exposes at most 1,000 results, so a large search has to be divided into smaller ones along one or more parameters.
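One generic way to divide a search is to bisect a numeric range until every piece fits under the cap. The sketch below shows that idea only; it is a swapped-in technique, not the approach taken in this article. `countMatches` is a hypothetical callback that returns how many repositories fall inside a range. In practice it would be an asynchronous API call (the search connection's `repositoryCount` field is one candidate), which is kept synchronous here for brevity.

```javascript
const SEARCH_RESULT_CAP = 1000;

// Recursively halves an inclusive range [start, end] until every piece
// holds at most SEARCH_RESULT_CAP matches, or cannot be split further.
function splitRange(start, end, countMatches) {
  if (start === end || countMatches(start, end) <= SEARCH_RESULT_CAP) {
    return [[start, end]];
  }
  const mid = Math.floor((start + end) / 2);
  return [
    ...splitRange(start, mid, countMatches),
    ...splitRange(mid + 1, end, countMatches),
  ];
}

// Usage with made-up data: pretend every size value holds 300 repositories.
const fakeCount = (start, end) => (end - start + 1) * 300;
console.log(JSON.stringify(splitRange(50, 59, fakeCount)));
// prints [[50,52],[53,54],[55,57],[58,59]]
```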

I use size:${startSize}..${endSize} to segment my search by repository size, creating multiple queries from a list of size pairs:

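const sizes = [
  [50, 55],
  [55, 60],
  ...
  [100000, 1000000000]
];

sizes.forEach(([startSize, endSize], index) => {
  // endSize - 1 keeps neighboring ranges from overlapping at the shared boundary.
  const query = createSizeQuery(date, startSize, endSize - 1);
  // Stagger the searches: each one starts 10 seconds after the previous one.
  delay(10000 * index)
    .then(() => searchQuery(octokit, query, undefined, []))
    .then((allEdges) => {...});
});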

Incorporating a delay is essential to prevent surpassing API limits.
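The `delay` helper itself is not shown above. A common way to write one, together with a small staggering function, might look like the sketch below. Both the shape of `delay` and the name `runStaggered` are assumptions made for illustration.

```javascript
// A promise that resolves after `ms` milliseconds.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Staggers tasks so that task i starts roughly i * spacingMs after the first.
// `tasks` is an array of functions that each return a promise.
function runStaggered(tasks, spacingMs) {
  return Promise.all(
    tasks.map((task, index) => delay(spacingMs * index).then(() => task()))
  );
}

// Usage with a tiny spacing so the demo finishes quickly.
const started = [];
runStaggered(
  [0, 1, 2].map((i) => async () => {
    started.push(i);
    return i * 2;
  }),
  20
).then((results) => console.log(started, results));
```

Note that this only spaces out the start times. If a single task takes longer than the spacing, tasks will overlap, so a strict one-at-a-time queue would need a different design.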

Section 1.2: Data Insights

What does the chart reveal? 📊 I pulled data from all repositories larger than 50 KB that were created on February 16, 2024 (yesterday as I write this). This primarily reflects hobby projects from developers engaged in late-night coding sessions and updates to established repositories.

Two factors influenced my choice of this filter:

  1. Simplicity
  2. A desire to understand the programming languages selected for new projects, rather than merely those most frequently used.

Subsection 1.2.1: Language Considerations

I've excluded certain "languages" such as Dockerfiles and Shell scripts but opted to retain HTML and CSS. I encourage viewers to filter the information independently, and I was surprised by the prevalence of HTML and CSS.
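Filtering and counting of this kind can be sketched in a few lines. The data shape `{ languages: [{ name, size }] }` is a simplified assumption (the real API response nests these fields differently), and the exclusion list contains only the two examples named above.

```javascript
// Languages left out of the chart. Only the two named examples are listed;
// the full exclusion list is not given in the article.
const EXCLUDED = new Set(["Dockerfile", "Shell"]);

// Counts in how many repositories each language appears.
// Each repo is assumed to look like { languages: [{ name, size }] }.
function countLanguageOccurrences(repos) {
  const counts = new Map();
  for (const repo of repos) {
    for (const { name } of repo.languages) {
      if (EXCLUDED.has(name)) continue;
      counts.set(name, (counts.get(name) ?? 0) + 1);
    }
  }
  // Most frequent first.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Made-up sample data, for illustration only.
const sampleRepos = [
  { languages: [{ name: "HTML", size: 1200 }, { name: "CSS", size: 800 }] },
  { languages: [{ name: "HTML", size: 300 }, { name: "Shell", size: 50 }] },
];
console.log(JSON.stringify(countLanguageOccurrences(sampleRepos)));
// prints [["HTML",2],["CSS",1]]
```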

What about data volume? 🐸

Beyond counting language occurrences, I produced a chart depicting the total byte counts.

Looking at the average byte count per repository for each language, Prolog leads by a wide margin at over 3 GB, so I removed it from the next chart.
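Here is a rough sketch of how per-language averages and the removal of an extreme value could be computed. The repo shape, the sample numbers, and the cutoff are all made up for illustration, and each language is assumed to appear at most once per repository.

```javascript
// Sums bytes per language and divides by the number of repositories that
// contain that language. Repo shape assumed: { languages: [{ name, size }] }.
function averageBytesPerRepo(repos) {
  const totals = new Map(); // name -> { bytes, repoCount }
  for (const repo of repos) {
    for (const { name, size } of repo.languages) {
      const entry = totals.get(name) ?? { bytes: 0, repoCount: 0 };
      entry.bytes += size;
      entry.repoCount += 1;
      totals.set(name, entry);
    }
  }
  const averages = {};
  for (const [name, { bytes, repoCount }] of totals) {
    averages[name] = bytes / repoCount;
  }
  return averages;
}

// Drops languages whose average exceeds a chosen cutoff so that one extreme
// value does not dominate the chart. The cutoff is an arbitrary example.
function dropOutliers(averages, maxAverageBytes) {
  return Object.fromEntries(
    Object.entries(averages).filter(([, avg]) => avg <= maxAverageBytes)
  );
}

const demoRepos = [
  { languages: [{ name: "Python", size: 4000 }] },
  { languages: [{ name: "Python", size: 2000 }, { name: "Prolog", size: 9000000 }] },
];
const averages = averageBytesPerRepo(demoRepos);
console.log(dropOutliers(averages, 1000000)); // prints { Python: 3000 }
```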

Chapter 2: Conclusion and Observations

The first video titled "The World's Most Popular Programming Language" explores the current leading languages, providing further context to our findings.

Today, we examined statistics on programming languages across all public GitHub repositories created on February 16, 2024. The results were surprising, with PHP appearing on the list while Rust was absent.

Interestingly, Rust ranks in the top 20 as a primary language but does not appear when sorted by frequency. For those interested, I created a final chart focusing solely on primary languages.

The second video, "Top 4 Programming Languages To Learn for 2024 | Become a Web Developer," offers insights into the most valuable programming languages to learn this year.

Thank you for joining me on this exploration! Have a wonderful day, and see you next time!
