Lifely logo
Clustering and segmentation.

Let's put things in boxes without feeling bad.

Artificial Intelligence can help you figure out where to draw the boundaries between groups, so you can make decisions that treat everyone within a group equally. Clustering algorithms are a type of unsupervised machine learning that finds meaningful structure and groupings in data, which you can then use to create customer segments for marketing purposes, for example.

What does it do?

In the traditional sense of the word, machine learning works by example: a model learns from historical data, then assesses the data of a new “case” and predicts its expected outcome. For machine learning to work this way, however, you need a rather large dataset of historical records before the model can make accurate predictions. What do you do when you don’t have this dataset? There are two options: you either check whether a pre-trained model might work for your case, or you use unsupervised learning techniques.


The technical nitty-gritty.

Unsupervised models are machine learning algorithms that are trained on unlabeled, unclassified information. During training, the algorithm tries to group the unsorted information according to similarities, patterns and differences, without guidance. Clustering is the task of dividing data points into a number of groups such that data points in the same group are more similar to each other than to data points in other groups.

One of the simplest clustering models is the k-means clustering algorithm. It works as follows: the person performing the clustering defines a target number of clusters, k, into which all the data should fall. The algorithm then randomly assigns the data to groups and calculates the “centroid” of each group, which is the position you get by taking the mean of all positions in the group. To arrive at the final clusters, each data point is evaluated to find its nearest centroid and moved to that centroid’s cluster. Then the centroids are recalculated. This is repeated until the centroids have stabilised, meaning that their values no longer change.
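To make these steps concrete, here is a minimal sketch of k-means in plain Python. It is an illustration under a few assumptions rather than a production implementation: the function names (`centroid`, `k_means`), the `seed` and `max_iter` parameters, and the sample customer data are all made up for this example, and similarity is measured with straight-line (Euclidean) distance.

```python
import random
from math import dist


def centroid(points):
    """Mean position of a non-empty list of equal-length points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, seed=0, max_iter=100):
    """Group `points` into `k` clusters; returns (labels, centroids)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)

    # Step 1: put the data into k random groups. Dealing out a shuffled
    # order round-robin guarantees that no group starts out empty.
    order = list(range(len(points)))
    rng.shuffle(order)
    labels = [0] * len(points)
    for rank, index in enumerate(order):
        labels[index] = rank % k

    centroids = []
    for _ in range(max_iter):
        # Step 2: calculate the centroid (mean position) of every group.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            # A group that lost all its points keeps its previous centroid.
            new_centroids.append(centroid(members) if members else centroids[c])

        # Step 4: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids

        # Step 3: move every point to the cluster with the nearest centroid.
        labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
    return labels, centroids


# Hypothetical customers: (visits per month, average order value).
customers = [(1.0, 20.0), (2.0, 25.0), (1.5, 22.0),
             (9.0, 80.0), (10.0, 85.0), (9.5, 78.0)]
labels, centres = k_means(customers, k=2)
print(labels)
print(centres)
```

In practice you would usually scale the features first, so that a column with large numbers (such as order value) does not dominate the distance, and you would try several random starts, because different starting groups can lead to different final clusters.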

What is our opinion?

Unsupervised methods don’t always give you exactly what you’re looking for, and the results might be surprising. That’s why we always check and double-check the results of unsupervised models, to make sure that the outcomes are explainable to a certain degree and to see whether the number of clusters is appropriate or needs to be tuned. This is all manual labour though, so collect your data as cleanly as possible before starting with unsupervised methods; that saves on the cost of analysis.
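One common way to sanity-check the number of clusters is to measure how tightly the points sit around their cluster centres and compare that figure across candidate groupings. The sketch below is only an illustration: the function name `within_cluster_spread` and the toy points and labels are assumptions made for this example, not part of any particular library.

```python
def within_cluster_spread(points, labels):
    """Sum of squared distances from each point to its cluster's mean."""
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")

    # Collect the points that belong to each cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for members in clusters.values():
        # Cluster centre: the mean of its members, coordinate by coordinate.
        centre = tuple(sum(c) / len(members) for c in zip(*members))
        for point in members:
            total += sum((a - b) ** 2 for a, b in zip(point, centre))
    return total


# Toy data: two obvious groups, far apart on the first coordinate.
points = [(1.0, 1.0), (1.0, 3.0), (9.0, 1.0), (9.0, 3.0)]

# Candidate groupings with 1, 2 and 4 clusters (labels chosen by hand).
candidates = {
    1: [0, 0, 0, 0],
    2: [0, 0, 1, 1],
    4: [0, 1, 2, 3],
}
for k, labels in candidates.items():
    print(k, within_cluster_spread(points, labels))
```

The spread keeps shrinking as clusters are added and reaches zero when every point is its own cluster, so the lowest value is not the goal. What you look for is the point where adding another cluster stops giving a clear improvement, and then you check by hand whether the resulting groups make sense.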

That being said, if you’re still establishing your customer segments based on age group or gender, we highly recommend questioning your beliefs by checking out the difference between your human-defined segments and AI-defined segments. In a world where demographics alone don’t tell you the whole story, reconsidering what your target audience is and where you can spend your money most effectively is always a good idea.
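A simple way to compare the two kinds of segments is to count how often each human-defined segment lands in each AI-defined cluster. The sketch below assumes you already have one segment and one cluster label per customer; the function name `cross_tab` and the sample labels are hypothetical.

```python
from collections import Counter


def cross_tab(human_segments, ai_clusters):
    """Count customers for every (human segment, AI cluster) pair."""
    if len(human_segments) != len(ai_clusters):
        raise ValueError("both lists need one label per customer")
    return Counter(zip(human_segments, ai_clusters))


# Hypothetical labels for six customers.
age_groups = ["18-34", "18-34", "35-54", "35-54", "55+", "55+"]
clusters = [0, 1, 0, 1, 1, 1]

table = cross_tab(age_groups, clusters)
for (segment, cluster), count in sorted(table.items()):
    print(f"age {segment:>5} in cluster {cluster}: {count}")
```

If every age group maps neatly onto one cluster, the demographic segments already capture the structure in your data. If age groups are spread across several clusters, as in this made-up sample, something other than age is driving the differences between customers.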

How can you apply it?

Find as much data on your customers as possible, and centralise it. As said before, making sense of the outcome of unsupervised models is sometimes difficult, but not impossible if you have clean data to work with from the get-go. Structuring and organising your data collection works wonders here, and allows you to start immediately – no manual labelling needed.

Talk to an expert

Call us: 020 846 19 05 · Mail
