According to a StackOverflow Survey, Here’s How Beginners Learn New Coding Languages

How do people learn to program? What languages do they use? What databases do they use alongside their apps? One of the largest developer surveys, the StackOverflow survey, publishes its (anonymized) data, so we can dig deeper than the published results.

Because the survey has already been capably analyzed by StackOverflow, we won’t be reviewing most of the top-level results here. You can read their analysis here. Here are some of the key results:

  • Representation improved for some underrepresented groups, but not all. There were very slightly more female respondents, but not more non-binary, genderqueer, or non-conforming respondents.
  • Python fell to third on the list of most-loved languages. TypeScript took second place, and Rust remained at number one.
  • 90% of users visit StackOverflow to solve a problem, and only 0.3% had never visited StackOverflow. This indicates both the popularity of StackOverflow and the fact that the survey mostly reached StackOverflow users.

In this article we’ll focus on learning methods, the languages chosen by new developers, and their demographics.

But first, a few caveats

Although StackOverflow is widely used by developers, its users differ from software developers as a whole, potentially skewing the results of a survey that draws on that audience. In this survey specifically, we know the respondents are more likely to be white and more likely to be male than developers as a whole.

The survey has a few strengths, including that it was answered by 65,000 people from all over the world and covered a wide range of subjects.

Finally, the 2020 survey was conducted in February 2020, in other words, “before COVID-19 was declared a pandemic by the World Health Organization and before the virus impacted every country in the world.” (The 2021 survey hasn’t been released yet.)

Programming languages

For beginners and non-beginners alike, the top languages are HTML/CSS, JavaScript, Python, SQL, and Java.

While the top five is the same for both, a noticeable shift occurs in shell scripting languages, which move up 2.5 places from beginners to non-beginners. This is probably because people pick them up as a practical necessity, rather than setting out to use them when they begin a project or decide to learn to program.

C# and Go probably owe their greater popularity among non-beginners to the fact that they target enterprises, so people pick them up in order to get a job. That being said, the survey doesn’t actually ask people for their motivation, so this is just an informed guess.

Does this mean you should learn these languages to get an edge over other beginners?

Not necessarily. C# and Go, while having some unique ideas, have enough in common with other languages that this is probably not necessary. That being said, if you happen to know that the industry or company you want to work in uses one or both of these languages, they’re perfectly fine first (or second) programming languages.

Obviously, popularity doesn’t mean a language is good or that you should learn it, but it does give you some advantages—more popular languages have more learning materials and more libraries. And if a language is popular among companies, that means more jobs in that language (although also more competition for those same jobs).

For more on choosing a programming language, you can read How to Choose a Programming Language, which is also written from a beginner’s perspective.

Technologies

Similar to how scripting languages are more common among more experienced users, deployment tools like Ansible and Terraform are also more common among non-beginners.

It’s not obvious why Unreal Engine would be markedly less popular among non-beginners. Because large studios use it, many students may pick it up hoping to work at one, then end up at smaller studios that use Unity or more niche engines, or in other areas of programming.

There’s some support for this explanation: people who use Unreal Engine are more likely to be students and hobbyists.

Databases

Database rankings are more stable between beginners and non-beginners.

One significant exception is Redis, which is more common among non-beginners. Redis’ higher ranking among non-beginners may be because it is often used to improve performance on high-traffic servers.

As with C# and Go, the relative lack of Redis use among beginners might make you wonder if learning it gives you an edge. You can pick it up on the job, but it’s also easier to learn than C# or Go, so learning it early may be a better tradeoff.

Learning methods

Almost everyone has used multiple methods to learn, including many beginners. The median number of methods beginners used was two, and the median number non-beginners used was four. Unfortunately, learning methods weren’t a part of the 2020 survey, so I used 2019 data for this part.
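As a rough sketch of how a per-respondent count like this can be computed, the snippet below splits a semicolon-delimited answer column and takes the median count per group. The data frame, the column names (`Methods`, `Beginner`), and the answers are all made up for illustration; they are assumptions, not the survey’s actual columns or the code behind this article.

```r
# Toy data: each row is one respondent; Methods is a semicolon-delimited
# multi-select answer. All values here are invented for illustration.
responses <- data.frame(
  Respondent = 1:6,
  Beginner   = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE),
  Methods    = c("Online course",
                 "Online course;Self-taught",
                 "Bootcamp;Online course;Self-taught",
                 "Self-taught;Degree;On the job",
                 "Self-taught;Degree;On the job;Open source",
                 "Self-taught;Online course;Degree;On the job;Open source"),
  stringsAsFactors = FALSE
)

# Split each answer on ";" and count the pieces.
# strsplit() returns NA for a missing answer, so count those as zero.
method_lists <- strsplit(responses$Methods, ";", fixed = TRUE)
responses$MethodCount <- ifelse(is.na(responses$Methods), 0L,
                                lengths(method_lists))

# Median number of methods for non-beginners ("FALSE") and beginners ("TRUE").
median_by_group <- tapply(responses$MethodCount, responses$Beginner, median)
print(median_by_group)
```

Because each respondent’s answer is reduced to a single number first, the median is taken over people rather than over individual selections.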

The most common methods are self-taught (81 percent), online (56), Computer Science degree programs (53), open source (39), and on the job (37). Despite the hype around bootcamps, only 14 percent of respondents learned from one.

The rankings for beginners are a little different: online (50 percent), self-taught (43), and Computer Science degree programs (19). Self-taught is less common among beginners, which makes sense: if you’re early on as a developer, it’s harder to teach yourself. That being said, if you haven’t really done much self-learning, expect that you will at some point down the road.

The number of learning methods is less surprising when you consider that many developers learn new technologies on a regular basis: 74.1 percent learn a new technology at least annually. (The frequency of learning new technology was asked about in 2020 but not 2019.)

Demographic Limits of the Survey

In 2020, StackOverflow made a special effort to reach underrepresented groups. Unfortunately, this doesn’t seem to have produced a large increase in their representation. Black, Latinx, and Indigenous respondents remain underrepresented in the survey even when compared to the Computer and Mathematics sector as a whole, which skews more white and male than the nation as a whole.

Looking at U.S. professional respondents, only about 8.8 percent were women, compared to the 17 percent the Census found in software development jobs. (And of course, roughly half of the entire population is female.)

Similarly, Black, Latinx, American Indian, and Native Hawaiian respondents were underrepresented. Detailed information for software developers isn’t available from the Census, but we can look at the parent category, computer and mathematical occupations.

Several groups are underrepresented:

  • Black respondents represented 1.6 percent of professional respondents, compared to 7.4 percent of computer and math professions as a whole.
  • Asian respondents represented 9.8 percent of professional respondents, compared to 14.6 percent of computer and math professions as a whole.
  • Indigenous respondents represented 0.1 percent of professional respondents, compared to 0.3 percent of workers in computer and math professions.

| Race | Workforce % | Comp & Math %* | Professionals in Survey % | Notes |
|---|---|---|---|---|
| Asian | 6.2 | 14.6 | 9.8 | Combined South Asian, Southeast Asian, and East Asian categories in the study |
| Black | 11.9 | 7.4 | 1.6 | |
| Indigenous | 0.7 | 0.3 | 0.1 | Combined Native Hawaiian and Native American categories from Census |
| Latinx | 17.8 | 8.4 | 3.6 | |
| Multiracial | 2.7 | 2.9 | 4.8 | Combined biracial and multiracial categories from study, plus anyone who marked multiple racial categories |
| Other | 5.0 | 1.8 | n/a | Survey had no other category. |
| White | 61.6 | 60.2 | 80.8 | White Alone, Not Hispanic Or Latino |

* Numbers don’t add up to 100 percent, likely due to rounding or the imprecision of the survey

Looking at all U.S. respondents rather than just professionals, some groups make up a slightly larger share of the survey, but the benchmark they are compared against, their share of the general population, is larger too.

For example, when comparing Black professional respondents to the Black computer & math workforce, there’s a gap of 5.8 percentage points. Black respondents made up 2.0 percent of all U.S. respondents, but that’s compared to 13.4 percent of the population, a gap of 11.4 percentage points.

| Race | Population % | Survey % | Notes |
|---|---|---|---|
| Asian | 5.9 | 9.0 | Combined South Asian, Southeast Asian, and East Asian categories in the study |
| Black | 13.4 | 2.0 | |
| Indigenous | 1.5 | 0.1 | Combined Native Hawaiian and Native American categories from Census |
| Latinx | 18.5 | 3.4 | |
| Multiracial | 2.8 | 6.7 | Combined biracial and multiracial categories from study, plus anyone who marked multiple racial categories |
| White | 60.1 | 78.3 | |
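
The gaps quoted above are simple differences between two percentages, which is why they are expressed in percentage points. As a small sketch, the snippet below recomputes them for every group, using the figures from the two tables in this section. The data frame and column names are illustrative choices of mine, and the “Other” row is left out because the survey had no such category.

```r
# Shares (in percent) copied from the two tables in this section.
groups <- c("Asian", "Black", "Indigenous", "Latinx", "Multiracial", "White")

professionals <- data.frame(
  Race      = groups,
  Benchmark = c(14.6, 7.4, 0.3, 8.4, 2.9, 60.2),  # computer & math workforce
  Survey    = c(9.8, 1.6, 0.1, 3.6, 4.8, 80.8)    # professional respondents
)
everyone <- data.frame(
  Race      = groups,
  Benchmark = c(5.9, 13.4, 1.5, 18.5, 2.8, 60.1), # U.S. population
  Survey    = c(9.0, 2.0, 0.1, 3.4, 6.7, 78.3)    # all U.S. respondents
)

# Gap in percentage points: benchmark share minus survey share.
# Positive means the group is underrepresented in the survey.
professionals$Gap <- round(professionals$Benchmark - professionals$Survey, 1)
everyone$Gap      <- round(everyone$Benchmark - everyone$Survey, 1)

print(professionals)
print(everyone)
```

A negative gap, as for white and multiracial respondents, means the group makes up a larger share of the survey than of the benchmark.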

It’s worth noting that “gender non-conforming” is often thought of as a matter of gender expression, rather than a gender label itself, although of course the two are closely related.

To its credit, the survey does allow multiple responses in both the gender and race questions, allowing people to more accurately record their gender and race.

How this article was made

The code for this post (some 800 or so lines of it, including comments) was written in R. To reformat and filter data, I used reshape2, dplyr, and tidyr. ggplot2 was used to make the charts and ggrepel was used to avoid overlapping labels in the programming language by experience chart.

One strength of this survey is that in many cases it allowed respondents to choose all that apply, meaning they don’t have to try to determine which of several applicable options is the “right” answer. However, this richness is also trickier to analyze.

If respondents can pick multiple choices from 20 programming languages, you really have not one question with 20 responses, but 20 yes-no questions. (And 21 if you want to include the number of choices selected as a variable.)

We can think of this translation from one column to many as a series of steps:

  1. Separate the combined answers so rather than being a single unit of text (like “HTML/CSS;JavaScript”), each language is a separate item in a list.
  2. Make those list elements separate columns. Replace any missing elements (which R identifies as NA) with FALSE.
  3. Combine the separated columns with the original dataset.

Here’s what that preprocessing step looks like in R:

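library(dplyr)
library(tidyr)

# Expand a semicolon-delimited multi-select column into one TRUE/FALSE
# column per option, plus a Count_<column> column with the number selected.
# `column` is the column name as a string; `df` must have a Respondent column.
add_spread <- function(df, column) {
  spread_columns <- df %>%
    select(all_of(c("Respondent", column))) %>%
    separate_rows(all_of(column), sep = ";") %>%   # one row per selected option
    mutate(val = TRUE) %>%
    pivot_wider(names_from = all_of(column), values_from = val,
                values_fill = list(val = FALSE),   # not selected -> FALSE
                names_prefix = paste0(column, "_"))

  # Attach the new yes/no columns to the original data
  merged <- merge(df, spread_columns, by = "Respondent")

  # Count how many options each respondent selected
  option_columns <- startsWith(colnames(merged), paste0(column, "_"))
  merged[[paste0("Count_", column)]] <- rowSums(merged[option_columns])

  merged
}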
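
To see what the three steps produce, here is a small base-R sketch on a made-up data frame. It is not the function used for this article, just the same idea written without the tidyverse; the `Languages` column and its values are invented for illustration.

```r
# Toy data (invented): one semicolon-delimited multi-select column.
survey <- data.frame(
  Respondent = 1:3,
  Languages  = c("HTML/CSS;JavaScript", "Python", "JavaScript;Python;SQL"),
  stringsAsFactors = FALSE
)

# Step 1: split each combined answer into a list of separate items.
choices <- strsplit(survey$Languages, ";", fixed = TRUE)

# Step 2: build one TRUE/FALSE column per option.
# Options a respondent did not pick come out as FALSE.
option_names <- sort(unique(unlist(choices)))
flags <- sapply(option_names, function(opt) {
  vapply(choices, function(picked) opt %in% picked, logical(1))
})
colnames(flags) <- paste0("Languages_", option_names)

# Step 3: combine the new columns with the original data,
# then add the number of options each respondent selected.
wide <- cbind(survey, flags)
wide$Count_Languages <- rowSums(flags)

print(wide)
```

Each option becomes its own yes/no column, and `Count_Languages` is the extra variable holding the number of choices selected.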

If you want to learn how to do this kind of analysis, consider hiring an R tutor to help.

Takeaways

Besides giving you a sense of what languages, databases, and technologies other new programmers are using, let’s recap some of the other results:

  • Programmers use a variety of learning methods; one is usually insufficient by itself. Experienced programmers typically use several separate methods, with self-teaching being the most common.
  • Programmers need to keep learning as they go along.
  • Deployment technologies and shell scripting languages tend to be used by more experienced developers.
  • While learning a language popular with enterprises may give you a leg up, it’s probably not necessary.
  • The survey probably represents a whiter and more male group than developers as a whole.
