Semi-Supervised Learning Explained (With Examples)

From movie recommendations to chatbots as customer service reps, it seems like machine learning (ML) is absolutely everywhere. But one thing you may not realize is just how much data is required to train these advanced systems, and how much time and energy goes into formatting that data appropriately.

Machine learning engineers have developed many ways of trying to cut down on this bottleneck, and one of the techniques that have emerged from these efforts is semi-supervised learning.

Today, we’re going to discuss semi-supervised learning, how it works, and where it’s being applied.

What is Semi-Supervised Learning?

Semi-supervised learning (SSL) is an approach to machine learning (ML) that is appropriate for tasks where you have a large amount of data that you want to learn from, only a fraction of which is labeled.

Semi-supervised learning sits somewhere between supervised and unsupervised learning, and we’ll start by understanding these techniques because that will make it easier to grasp how semi-supervised learning works.

Supervised learning refers to any ML setup in which a model learns from labeled data. It’s called “supervised” because the model is effectively being trained by showing it many examples of the right answer.

Suppose you’re trying to build a neural network that can take a picture of different plant species and classify them. If you give it a picture of a rose it’ll output the “rose” label, if you give it a fern it’ll output the “fern” label, and so on.

The way to start training such a network is to assemble many labeled images of each kind of plant you’re interested in. You’ll need dozens or hundreds of such images, and they’ll each need to be labeled by a human.

Then, you’ll assemble these into a dataset and train your model on it. The neural network will learn a function that maps features in the image (the concentrations of different colors, say, or the shape of the stems and leaves) to a label (“rose”, “fern”).

One drawback to this approach is that it can be slow and extremely expensive, both in money and in time. You could probably put together a labeled dataset of a few hundred plant images in a weekend, but what if you’re training something more complex, where the stakes are higher? A model trained to spot breast cancer from a scan will need thousands of images, perhaps tens of thousands. And not just anyone can identify a cancerous lump; you’ll need a skilled human to look at each scan and label it “cancerous” or “non-cancerous.”

Unsupervised learning, by contrast, requires no such labeled data. Instead, an unsupervised machine learning algorithm is able to ingest data, analyze its underlying structure, and categorize data points according to this learned structure.

Okay, so what does this mean? A fairly common unsupervised learning task is clustering a corpus of documents thematically, and let’s say you want to do this with a bunch of different national anthems (hey, we’re not going to judge you for how you like to spend your afternoons!).

A good, basic algorithm for a task like this is the k-means algorithm, so-called because it will sort documents into k categories. K-means begins by randomly initializing k “centroids” (which you can think of as essentially being the center value for a given category), then moving these centroids around in an attempt to reduce the distance between the centroids and the values in the clusters.

This process will often involve a lot of fiddling. Since you don’t actually know the optimal number of clusters (remember that this is an unsupervised task), you might have to try several different values of k before you get results that are sensible.

To sort our national anthems into clusters you’ll have to first pre-process the text in various ways, then you’ll run it through the k-means clustering algorithm. Once that is done, you can start examining the clusters for themes. You might find that one cluster features words like “beauty”, “heart” and “mother”, another features words like “free” and “fight”, another features words like “guard” and “honor”, etc.
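To make the centroid-shuffling concrete, here is a minimal sketch of k-means in plain Python. It works on small 2D points rather than real document vectors, and the function name `kmeans`, the sample `points`, and the fixed `seed` are all illustrative assumptions, not part of any particular library.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Cluster points (tuples of floats) into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initialization from the data
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels

# Two obvious groups of toy points standing in for document vectors.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

With real text you would first turn each anthem into a numeric vector (word counts, for example) and would probably try several values of k, as described above.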

As with supervised learning, unsupervised learning has drawbacks. With a clustering task like the one just described, it might take a lot of work and multiple false starts to find a value of k that gives good results. And it’s not always obvious what the clusters actually mean. Sometimes there will be clear features that distinguish one cluster from another, but other times they won’t correspond to anything that’s easily interpretable from a human perspective.

Semi-supervised learning, by contrast, combines elements of both of these approaches. You start by training a model on the subset of your data that is labeled, then apply it to the larger unlabeled part of your data. In theory, this should simultaneously give you a powerful predictive model that is able to generalize to data it hasn’t seen before while saving you from the toil of creating thousands of your own labels.

How Does Semi-Supervised Learning Work?

We’ve covered a lot of ground, so let’s review. Two of the most common forms of machine learning are supervised learning and unsupervised learning. The former tends to require a lot of labeled data to produce a useful model, while the latter can soak up a lot of hours in tinkering and yield clusters that are hard to understand. By training a model on a labeled subset of data and then applying it to the unlabeled data, you can save yourself tremendous amounts of effort.

But what’s actually happening under the hood?

Three main variants of semi-supervised learning are self-training, co-training, and graph-based label propagation, and we’ll discuss each of these in turn.

Self-training

Self-training is the simplest kind of semi-supervised learning, and it works like this.

A small subset of your data will have labels while the rest won’t have any, so you’ll begin by using supervised learning to train a model on the labeled data. With this model, you’ll go over the unlabeled data to generate pseudo-labels, so-called because they are machine-generated and not human-generated.

Now, you have a new dataset; a fraction of it has human-generated labels while the rest contains machine-generated pseudo-labels, but all the data points now have some kind of label and a model can be trained on them.
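The three steps above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the "model" is a simple nearest-centroid classifier, and the plant features (petal count, leaf length) and helper names `train_centroids` and `predict` are made up for the example.

```python
def train_centroids(examples):
    """Fit a nearest-centroid classifier: one mean feature vector per label."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    return min(
        centroids,
        key=lambda label: sum((f - c) ** 2 for f, c in zip(features, centroids[label])),
    )

# Hypothetical features: (petal count, leaf length in cm).
labeled = [((30.0, 5.0), "rose"), ((28.0, 6.0), "rose"),
           ((0.0, 40.0), "fern"), ((0.0, 45.0), "fern")]
unlabeled = [(32.0, 4.0), (1.0, 38.0), (27.0, 7.0), (0.0, 50.0)]

# Step 1: supervised training on the small labeled subset.
model = train_centroids(labeled)
# Step 2: the model generates pseudo-labels for the unlabeled data.
pseudo_labeled = [(x, predict(model, x)) for x in unlabeled]
# Step 3: retrain on the human labels plus the pseudo-labels.
final_model = train_centroids(labeled + pseudo_labeled)
print(pseudo_labeled)
```

In practice, people often keep only the pseudo-labels the model is confident about and repeat the loop several times, but the basic shape is the same.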

Co-training

Co-training has the same basic flavor as self-training, but it has more moving parts. With co-training you’re going to train two models on the labeled data, each on a different set of features (in the literature these are called “views”).

If we’re still working on that plant classifier from before, one model might be trained on the number of leaves or petals, while another might be trained on their color.

At any rate, now you have a pair of models trained on different views of the labeled data. These models will then generate pseudo-labels for all the unlabeled data points. When one of the models is very confident in its pseudo-label (i.e., when the probability it assigns to its prediction is very high), that pseudo-label will be used to update the prediction of the other model, and vice versa.

Let’s say both models come to an image of a rose. The first model thinks it’s a rose with 95% probability, while the other thinks it’s a tulip with 68% probability. Since the first model is much more sure of itself, its label is used to correct the second model.

Think of it like studying a complex subject with a friend. Sometimes a given topic will make more sense to you, and you’ll have to explain it to your friend. Other times they’ll have a better handle on it, and you’ll have to learn from them.

In the end, you’ll both have made each other stronger, and you’ll get more done together than you would’ve done alone. Co-training attempts to utilize the same basic dynamic with ML models.
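Here is a rough sketch of that exchange in Python. Everything in it is an assumption made for illustration: each view is a single number (petal count and a made-up "color score"), the models are nearest-centroid classifiers, confidence is a simple inverse-distance weight, and the 0.9 cutoff for "very confident" is arbitrary.

```python
def fit(examples):
    """Nearest-centroid model on a single numeric feature: label -> mean value."""
    grouped = {}
    for value, label in examples:
        grouped.setdefault(label, []).append(value)
    return {label: sum(vals) / len(vals) for label, vals in grouped.items()}

def predict_with_confidence(model, value):
    """Return (label, confidence); confidence is an inverse-distance weight in [0, 1]."""
    weights = {label: 1.0 / (abs(value - center) + 1e-9) for label, center in model.items()}
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())

# Each plant has two hypothetical views: (petal count, color score).
labeled = [((30.0, 0.9), "rose"), ((6.0, 0.2), "tulip")]
unlabeled = [(29.0, 0.5), (17.0, 0.15)]
THRESHOLD = 0.9  # assumed cutoff for "very confident"

train_a = [(views[0], label) for views, label in labeled]  # view A: petal count
train_b = [(views[1], label) for views, label in labeled]  # view B: color score
model_a, model_b = fit(train_a), fit(train_b)

for petals, color in unlabeled:
    label_a, conf_a = predict_with_confidence(model_a, petals)
    label_b, conf_b = predict_with_confidence(model_b, color)
    # A confident model hands its pseudo-label to the other model's training set.
    if conf_a >= THRESHOLD:
        train_b.append((color, label_a))
    if conf_b >= THRESHOLD:
        train_a.append((petals, label_b))

# Retrain both models on the labels they exchanged.
model_a, model_b = fit(train_a), fit(train_b)
print(train_a, train_b)
```

On the first unlabeled plant, the petal-count model is confident and teaches the color model; on the second, the roles flip. That back-and-forth is the study-buddy dynamic described above.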

Graph-based semi-supervised learning

Another way to apply labels to unlabeled data is by utilizing a graph data structure. A graph is a set of nodes (in graph theory we call them “vertices”) which are linked together through “edges.” The cities on a map would be vertices, and the highways linking them would be edges.

If you put your labeled and unlabeled data on a graph, you can propagate the labels throughout by counting the number of pathways from a given unlabeled node to the labeled nodes.

Imagine that we’ve got our fern and rose images in a graph, together with a bunch of other unlabeled plant images. We can choose one of those unlabeled nodes and count up how many ways we can reach all the “rose” nodes and all the “fern” nodes. If there are more paths to a rose node than a fern node, we classify the unlabeled node as a “rose”, and vice versa. This gives us a powerful alternative means by which to algorithmically generate labels for unlabeled data.
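The path-counting idea can be sketched as follows. This is a toy version that takes the description above literally; the graph, the node names, and the helpers `count_simple_paths` and `propagate_label` are hypothetical, and production label-propagation methods usually spread labels through weighted neighbor updates rather than by enumerating paths.

```python
def count_simple_paths(graph, start, goal, visited=None):
    """Count paths from start to goal that never revisit a node."""
    if start == goal:
        return 1
    visited = (visited or set()) | {start}
    return sum(
        count_simple_paths(graph, neighbor, goal, visited)
        for neighbor in graph[start]
        if neighbor not in visited
    )

def propagate_label(graph, labels, node):
    """Label a node by whichever class its paths reach most often."""
    totals = {}
    for labeled_node, label in labels.items():
        totals[label] = totals.get(label, 0) + count_simple_paths(graph, node, labeled_node)
    return max(totals, key=totals.get), totals

# Hypothetical similarity graph: an edge links two images that look alike.
graph = {
    "rose_1": ["img_a", "img_b"],
    "rose_2": ["img_a"],
    "fern_1": ["img_b"],
    "img_a": ["rose_1", "rose_2", "img_b"],
    "img_b": ["rose_1", "fern_1", "img_a"],
}
labels = {"rose_1": "rose", "rose_2": "rose", "fern_1": "fern"}

label, totals = propagate_label(graph, labels, "img_a")
print(label, totals)  # rose {'rose': 3, 'fern': 2}
```

Enumerating every path gets expensive quickly on large graphs, which is one reason real systems use cheaper approximations of the same intuition.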

Semi-Supervised Learning Examples

The amount of data in the world is increasing at a staggering rate, while the number of human-hours available for labeling it all is increasing at a much less impressive clip. This presents a problem because there’s no end to the places where we want to apply machine learning.

Semi-supervised learning presents a possible solution to this dilemma. Here are a few examples of where it is being used in real life.

  • Identifying cases of fraud: In finance, semi-supervised learning can be used to train systems for identifying cases of fraud or extortion. Rather than hand-labeling thousands of individual instances, engineers can start with a few labeled examples and proceed with one of the semi-supervised learning approaches described above.
  • Classifying content on the web: The internet is a big place, and new websites are put up all the time. In order to serve useful search results it’s necessary to classify huge amounts of this web content, which can be done with semi-supervised learning.
  • Analyzing audio and images: This is perhaps the most popular use of semi-supervised learning. When audio files or image files are generated they’re often not labeled, which makes it difficult to use them for machine learning. Beginning with a small subset of human-labeled data, however, this problem can be overcome.

How Is Semi-Supervised Learning Different From…?

With all the different approaches to machine learning, it can be easy to confuse them. To make sure you fully understand semi-supervised learning, let’s take a moment to distinguish it from similar techniques.

Semi-Supervised Learning vs Self-Supervised Learning

With semi-supervised learning you’re training a model on a subset of labeled data and then using this model to process the unlabeled data. Self-supervised learning is different in that it’s showing an algorithm some fraction of the data (say the first 80 words in a paragraph) and then having it predict the remainder (the other 20 words in a paragraph).

Self-supervised learning is how LLMs like GPT-4 are trained.

Semi-Supervised Learning vs Reinforcement Learning

One interesting subcategory of ML we haven’t discussed yet is reinforcement learning (RL). RL involves leveraging the mathematics of sequential decision theory (usually a Markov Decision Process) to train an agent to interact with its environment in a dynamic, open-ended way.

It bears little resemblance to semi-supervised learning, and the two should not be confused.

Semi-Supervised Learning vs Active Learning

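Active learning is closely related to semi-supervised learning, and the two are often used together. The big difference is that, with active learning, the algorithm sends the examples it is least confident about to a human for labeling or correction, rather than relying only on its own pseudo-labels.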
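As a small illustration of that human-in-the-loop step, the sketch below picks the lowest-confidence predictions and replaces them with a person's answers. The prediction records, the `select_for_review` helper, and the `human_answers` lookup (a stand-in for a real annotator) are all hypothetical.

```python
def select_for_review(predictions, budget):
    """Return the `budget` items whose predicted labels have the lowest confidence."""
    return sorted(predictions, key=lambda item: item["confidence"])[:budget]

# Hypothetical model outputs on unlabeled images.
predictions = [
    {"id": "img_1", "label": "rose", "confidence": 0.95},
    {"id": "img_2", "label": "tulip", "confidence": 0.52},
    {"id": "img_3", "label": "fern", "confidence": 0.88},
    {"id": "img_4", "label": "rose", "confidence": 0.61},
]

# Stand-in for a human annotator: a lookup of corrected labels.
human_answers = {"img_2": "rose", "img_4": "rose"}

for item in select_for_review(predictions, budget=2):
    item["label"] = human_answers[item["id"]]
    item["confidence"] = 1.0  # treat human-provided labels as trusted

print([p["label"] for p in predictions])  # ['rose', 'rose', 'fern', 'rose']
```

The `budget` parameter captures the practical constraint: human attention is scarce, so you spend it only where the model is least sure.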

When Should You Use Semi-Supervised Learning?

Semi-supervised learning is a way of training ML models when you only have a small amount of labeled data. By training the model on just the labeled subset of data and using it in a clever way to label the rest, you can avoid the difficulty of having a human being label everything.

There are many situations in which semi-supervised learning can help you make use of more of your data. That’s why it has found widespread use in domains as diverse as document classification, fraud detection, and image identification.

If you’re considering ways of using advanced AI systems to take your business to the next level, check out our generative AI resource hub to go even deeper. This technology is changing everything, and if you don’t want to be left behind, set up a time to talk with us.

Quiq Is Honored To Be A 2023 Bronze Stevie® Winner

Quiq is proud to announce that it has been honored as a 2023 Bronze Stevie® winner in the Best Technical Support Solution – Computer Services category!

The Stevie Awards for Technical Innovation and Technology Industry recognize organizations that demonstrate excellence in technology innovation, product development, and technical support services.

At Quiq, our team is committed to bringing solutions to market that empower our clients to improve their customer care, service, and experience operations. As a team, we pride ourselves on rapid innovation that delivers business-changing results for the world’s best brands.

Winning the 2023 Bronze Stevie is a humbling recognition of the work our team loves doing every day. Thank you to the American Business Awards® for the honor!

About the Stevies

The American Business Awards stand out as the most prestigious business awards program in the United States.

Known as the Stevies, a nod to the Greek word for “crowned,” these awards acknowledge outstanding achievement in business for organizations and individuals across over 60 countries.

If you’re interested in learning more about The American Business Awards or all the 2023 Stevie winners, check out their website.

How Large Language Models Have Evolved

It seems as though large language models (LLMs) exploded into public awareness almost overnight. Relatively few people had heard of GPT-2, but I would venture to guess relatively few people haven’t heard of ChatGPT.

But like most things, language models have a history. And, in addition to being outrageously interesting, that history can help us reason about the progress in LLMs, as well as their likely future impacts.

Let’s get started!

A Brief History of Artificial Intelligence Development

The human fascination with building artificial beings capable of thought and action goes back a long way. Writing in roughly the 8th century BCE, Homer recounts tales of the god Hephaestus outsourcing repetitive manual tasks to automated bellows and working alongside robot-like “attendants” that were “…golden, and in appearance like living young women.”

No mere adornments, these handmaidens were described as having “intelligence in their hearts” and stirring “nimbly in support of their master” because “from the immortal gods they have learned how to do things.”

Some 500 years later, mathematicians in Alexandria would produce treatises on creating mechanical servants and various kinds of automata. Heron wrote a technical manual for producing a mechanical shrine and an automated theatre whose figurines could be activated to stage a full tragic play through an intricate system of cords and axles.

Nor is it only ancient Greece that tells similar tales. Jewish legends speak of the Golem, a being made of clay and imbued with life and agency through the use of language. The word “abracadabra”, in fact, comes from the Aramaic phrase avra k’davra, which translates to “I create as I speak.”

Through the ages, these old ideas have found new expression in stories such as “The Sorcerer’s Apprentice”, Mary Shelley’s “Frankenstein”, and Karel Čapek’s “R.U.R.”, a science fiction play that features the first recorded use of the word “robot”.

From Science Fiction to Science Fact

But they remained purely fiction until the early 20th Century, when advances in the theory of computation, as well as the development of primitive computers, began to offer a path toward actually building intelligent systems.

Arguably, the field of artificial intelligence really began in earnest with the 1950 publication of Alan Turing’s “Computing Machinery and Intelligence” – in which he proposed the famous “Turing test” – and with the 1956 Dartmouth conference on AI, organized by luminaries John McCarthy and Marvin Minsky.

People began taking AI seriously. Over the next ~50 years, there were numerous periods of hype and exuberance in which major advances were made, as well as long stretches, known as “AI winters”, in which funding dried up and little was accomplished.

Neural networks and the deep learning revolution are two advances that are particularly important for understanding how large language models have evolved over time, so it’s to these that we now turn.

Neural Networks And The Deep Learning Revolution

The groundwork for future LLM systems was laid by Walter Pitts and Warren McCulloch in the early 1940s. Inspired by the burgeoning study of the human brain, they wondered if it would be possible to build an artificial neuron that had the same basic properties as a biological one, i.e. it would activate and fire once a certain critical threshold had been crossed.
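To see how little machinery such a threshold unit needs, here is a minimal sketch in Python. The function names, weights, and thresholds are illustrative choices, not a reproduction of the original formulation.

```python
def threshold_neuron(inputs, weights, threshold):
    """Fire (return 1) when the weighted input reaches the threshold, else return 0."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

def logical_and(a, b):
    # Both inputs must be active to reach a threshold of 2.
    return threshold_neuron((a, b), (1, 1), threshold=2)

def logical_or(a, b):
    # A single active input is enough to reach a threshold of 1.
    return threshold_neuron((a, b), (1, 1), threshold=1)

# Each row is (a, b, a AND b, a OR b).
truth_table = [(a, b, logical_and(a, b), logical_or(a, b)) for a in (0, 1) for b in (0, 1)]
print(truth_table)
```

Changing only the threshold turns the same unit from an AND gate into an OR gate, which hints at why networks of these simple units can compute interesting things.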

They were successful, though several other breakthroughs would be required before artificial neurons could be arranged into systems that were capable of doing useful work. One such breakthrough was backpropagation, the basic algorithm that is still used to train deep learning systems. Backpropagation was developed in 1960, and it uses the errors in a model’s outputs to iteratively adjust its internal parameters.

It wasn’t until 1985, however, that David Rumelhart, Ronald Williams, and Geoff Hinton used backpropagation in neural networks, and in 1989, this allowed Yann LeCun to train a convolutional neural network to recognize handwritten digits.

This was not the only architectural improvement that came out of this period. Especially noteworthy were the long short-term memory (LSTM) networks that were introduced in 1997 by Sepp Hochreiter and Jürgen Schmidhuber, which made it possible to learn more complex functions.

With these advances, it was clear that neural networks could be trained to do useful work, and that they were poised to do so. All that was left was to gather the missing piece: data.

The Big Data Era

Neural networks and deep-learning applications tend to be extremely data-hungry, and access to quality training data has always been a major bottleneck. In 2009 Stanford’s Fei-Fei Li sought to change this by releasing ImageNet, a database of over 14 million labeled images that could be used for free by researchers. The increase in available data, together with substantial improvements in computer hardware like graphics processing units (GPUs), meant that at long last the promise of deep learning could begin to be fulfilled.

And it was. In 2011, IBM’s Watson system beat several Jeopardy! all-stars in a real game, and Apple launched Siri. In 2012, a convolutional neural network called “AlexNet” won the ImageNet image recognition competition by a wide margin. Amazon’s Alexa followed in 2014, and from 2015 to 2017 DeepMind’s AlphaGo shocked the world by utterly dominating the best human Go players.

Substantial strides were made in language models. In 2018 Google introduced its Bidirectional Encoder Representations from Transformers (BERT), a pre-trained model capable of a wide array of tasks, like text summarization, translation, and sentiment analysis.

One Model To Rule Them All

It would be easy to miss the significance of AlexNet’s performance on the ImageNet competition or BERT’s usefulness across multiple tasks. For a long time, it was anyone’s guess as to whether it would be possible to train a single large model on a dataset and use it for a range of purposes, or whether it would be necessary to train a multitude of models for each application.

Since the early 2010s, it has become clear that large, general-purpose models are often the best way to go. This point has only been reinforced by the success of GPT-4 in everything from brainstorming scientific hypotheses to handling customer service tasks.


How Has Large Language Model Performance Improved?

Now that we’ve discussed this history, we’re well-placed to understand why LLMs and generative AI have ignited so much controversy. People have been mulling over the promise (and peril) of thinking machines for literally thousands of years. After all that time it looks like they might be here, at long last.

But what, exactly, has people so excited? What is it that advanced AI tools are doing that has captured the popular imagination? In the following sections, we’ll talk about the astonishing (and astonishingly rapid) improvements that have been seen in language models in just a few short years.

Getting To Human-Level

One of the more surprising things about LLMs such as ChatGPT is just how good they are at so many different things. LLMs are trained with a technique known as “self-supervised learning”. They take random samples of the text data they’re given, and they try to predict what words come next given the words that came before.

Suppose the model sees the famous opening line of Leo Tolstoy’s Anna Karenina: “Happy families are all alike; every unhappy family is unhappy in its own way.” What the model is trying to do is learn a function that will allow it to predict “in its own way” from “Happy families are all alike; every unhappy family is unhappy ___”.
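The key trick is that the training examples come straight from the raw text, with no human labeling. The sketch below shows this with a tiny word-counting model; the sample sentence and the helpers `make_pairs`, `train`, and `predict_next` are assumptions for illustration, and real LLMs replace the counting step with a very large neural network.

```python
from collections import Counter, defaultdict

def make_pairs(tokens, context_size):
    """Turn raw tokens into (context, next word) training pairs; no human labels needed."""
    return [
        (tuple(tokens[i - context_size:i]), tokens[i])
        for i in range(context_size, len(tokens))
    ]

def train(pairs):
    """Count how often each word follows each context."""
    counts = defaultdict(Counter)
    for context, target in pairs:
        counts[context][target] += 1
    return counts

def predict_next(counts, context):
    """Return the most frequently seen next word, or None for an unseen context."""
    seen = counts.get(tuple(context))
    return seen.most_common(1)[0][0] if seen else None

text = "the cat sat on the mat and the cat slept on the sofa"
tokens = text.split()
pairs = make_pairs(tokens, context_size=2)
model = train(pairs)

print(pairs[0])                             # (('the', 'cat'), 'sat')
print(predict_next(model, ["sat", "on"]))   # the
```

Every position in the text yields one training example, which is why self-supervised learning can soak up enormous amounts of unlabeled data.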

The modern crop of LLMs can do this incredibly well, but what is remarkable is just how far this gets you. People are using generative AI to help them write poems, business plans, and code, create recipes based on the ingredients in their fridges, and answer customer questions.

Emergence in Language Models

Perhaps even more interesting, however, is the phenomenon of “emergence” in language models. When researchers tested LLMs on a wide variety of tasks meant to be especially challenging to these models – things like identifying a movie given a string of emojis or finding legal chess moves – they found that in about 5% of tasks, there is a sudden, sharp increase in ability on a given task once a model reaches a certain size.

At present, it’s not really clear how we should think about emergence. One hypothesis for emergence is that a big enough model is able to learn some general piece of knowledge not attainable by a smaller cousin, while another, more prosaic one is that it’s a relatively straightforward consequence of the model’s internal statistical machinery.

What’s more, it’s difficult to pin down the conditions required for emergence in language models. Though it generally appears to be a function of model size, there are cases in which the same abilities can be achieved with smaller models, or with models trained on very high-quality data, and emergence shows up at different scales for different models and tasks.

Whatever ends up being the case, it’s clear that this is a promising direction for future research. Much more work needs to be done to understand how precisely LLMs accomplish what they accomplish. This will not only bear on the question of emergence, it will also inform the ongoing efforts to make language models safer and less biased.

The GPT Series

The big recent news in AI has, of course, been ChatGPT. ChatGPT has proven useful in an astonishingly wide variety of use cases and is among the first powerful systems to have been made widely available to the public.

ChatGPT is part of a broader series of GPT models built by OpenAI. “GPT” stands for “generative pre-trained transformer”, and the first of its kind was developed back in 2018. New models and major updates have been released at a rapid clip ever since, culminating with GPT-4 coming out in March of 2023.

At present, OpenAI’s CEO Sam Altman has said that there are no current plans to train a successor GPT-5 model, but there are other companies, like DeepMind, that could plausibly build a competitor.

What’s Next For Large Language Models?

Given their flexibility and power, LLMs are finding use across a wide variety of industries, from software engineering to medicine to customer service.

If your interest has been piqued and you’d like to talk to an expert at Quiq about incorporating LLMs into your business, reach out to us to schedule a demo!


Before You Develop a Mobile App For Your Business—Read This

Remember when every business was coming out with an app? Your favorite clothing brand, that big retail chain, your neighborhood grocery store, and even your babysitter jumped on the bandwagon and claimed real estate on their customers’ mobile devices.

It probably made you think: Do we need an app for our business?

Despite the many benefits of an app, diving headfirst into development can drain your team’s time and resources without the guarantee of a return. Done poorly, it can even hinder your customer experience. Before you do any mobile app development, you need a plan.

This article will take you through some of the lessons learned from working with brands that deliver world-class experiences within apps and beyond.

Why do companies build apps?

Apps are powerful marketing tools for all kinds of businesses—and none more than e-commerce. Here are some of the top reasons why businesses build an app.

A place for loyal customers.

Almost by default, a mobile app is an exclusive space for your loyal customers. Think about the last time you downloaded an app. It probably wasn’t for a business you buy from once a year. It’s almost always a brand you follow closely or a service you use frequently.

Providing an app is basically like creating a direct line of communication with your best customers. You can create exclusive content, provide a better shopping experience, and unlock early access to products and services. Apps are great ways to turn good customers into great ones.

Mobile device real estate.

On average, Americans check their phones 344 times per day—or once every 4 minutes. And 88% of the time we spend on our phones is spent in apps, according to Business Insider. Having your brand logo as an icon on your customers’ home screens is invaluable real estate.

Push notifications.

When customers have push notifications turned on, you have another way to speak to them directly. Push notifications are great engagement tools: you can connect with customers using timely, personalized communications and ultimately drive in-app sales.

Beating out or keeping up with competitors.

Standing out from the competition is another reason many businesses build apps. If your competitors are using apps to stand out from the crowd, you may feel compelled to do the same.

What are the drawbacks of building an app?

While mobile apps are still extremely popular, they have some major drawbacks for brands not ready to invest in them.

Phones are overcrowded.

Whereas building an app five years ago meant you stood out from the crowd, now you’re just one of many. People have an average of 80 apps on their phones, but they’re only using around nine a day.

Basically, that means mobile users are downloading apps and not using them on a regular basis. In fact, 25% of apps are used once and then never opened again, according to Statista.

Having an app doesn’t guarantee your customers’ attention or engagement—that’s still up to your marketing team.

There’s a big upfront investment.

Whether you enlist the help of your development team or outsource app creation, it’s a big lift. Getting a mobile app up and running takes significant resources, and while there may be a return on investment, it isn’t guaranteed.

When you’re already overwhelmed with your current development efforts, adding another microsite to manage could just make it worse.

You’ll double your marketing efforts.

More push notifications, more campaigns, more content. An app just means you have to do more to see an increase in revenue. While it could be a valuable asset, there are other, smaller steps you can take that will help you see the same revenue boost without the exponential effort.

Can you deliver rich customer experiences without an app?

Yes! But don’t think we’re anti-app. In fact, a lot of our clients create great apps that are sticky because they provide ongoing value to their customers. These clients are able to reach a whole set of people in their moment of need and build trust as they continue to look to the app for help.

However, many of the marketing and customer service goals that drive businesses to create an app can be achieved through rich business messaging. Here are a few examples.

Want to speak directly to your customers? Try outbound SMS.

Push notifications are extremely effective at connecting with customers, but it only takes a few taps to turn them off.

A similar communication method is outbound SMS messaging. You can personalize messages and deliver real-time communications via text messaging. Plus, with rich messaging capabilities, you can send interactive media like images, cards, emojis, and videos to enhance every conversation.

Want to engage with your customers? Use Google Business Messages.

Get customers from Google directly in communication with your customer service agents using Google Business Messages.

Customers can tap a message button right from Google search to connect with your team. (And since 92% of searches start with Google, there’s a good chance your customers will take advantage of this feature.)

Want to enhance your customer experience? Use Apple Messages for Business.

If you’re after a branded experience and want to meet user expectations, Apple Messages for Business delivers. Apple device users can simply tap the message icon from Maps, Siri, Safari, Spotlight, or your company’s website and instantly connect with your team.

You’ll deliver a rich messaging experience, with your branding front and center. Your company name, logo, and colors will be featured in the messaging app, delivering a fully branded experience for your customers.

Want to be more social? Connect Quiq with social platforms.

Clients using Quiq are uniquely equipped with a conversational engagement platform that provides rich experiences to users across chat and business messaging channels.

This means that companies can provide content-rich, personalized experiences across SMS/text business messaging, web chat, Facebook, Twitter, Instagram, and WhatsApp.

Your brand can be on social platforms without working across them. Quiq gives your team access to all these messaging channels within one easy-to-use message center. So, unlike an app, adding more channels doesn’t necessarily increase the workload. It just gives your customers more ways to connect with you.

Should you consider business messaging over an app?

There’s no either/or choice here. Both can be part of a thriving marketing and customer service strategy. But if you’re looking for a way to engage your customers and haven’t tried business messaging—start there.

If you’re on the fence, consider this:

  1. You don’t have to build an app—you only have to implement business messaging.
  2. Customers don’t have to download and learn anything to connect with you. Business messaging is right there in communication channels they already know and love, like texting and social media.

Engage customers with or without an app.

The main goal of most apps is to help build long-term relationships with customers. Whether you choose to build an app or not, business messaging supports this goal by providing information, support, and help at the customer’s exact moment of need.

Quiq powers conversations between customers and companies across the most convenient and preferred engagement channels. With Quiq, you’ll have meaningful, timely, and personalized conversations with your customers that can be easily managed in a simplified UI.

Ready to see how business messaging can help you engage your customers with or without an app? Request a demo or try it for yourself today.