Automated machine learning.

Automate your processes and streamline the work of a data scientist.

Automated machine learning is an umbrella name for tools, libraries, and toolkits that automate parts of the (previously human) process of creating machine learning pipelines and applying the technology to real-world use cases. This includes tools to automatically preprocess and clean data, select features, and optimise model parameters. We count any utility or service that makes a data scientist more productive in this way as a form of automated machine learning, and we welcome it.

What does it do?

At Lifely, we follow an adjusted CRISP-DM cycle for data science projects. This means that a data scientist starts by understanding the business logic they are operating in, so that they can then understand the data they are looking at. Next comes the data preparation step. This is an inherently human process: it involves manually inspecting data, spotting irregularities, concatenating observations, throwing out outliers, and a lot more, especially when dealing with highly unstructured sources like text or images. Normally, this takes a lot of trial and error. The same goes for the next step: creating AI models that fulfil the wishes of customers. This step is often a combination of intuition and brute force: a data scientist starts from models that were successful on similar problems in other projects, and then tries out many variations to find a model that ticks the boxes. In essence, it is a problem with a finite number of options and a clear outcome for every option: a prime example of where automation can come in handy.

 

The technical nitty-gritty.

The biggest advancements lately have been made in this modelling domain. A toolkit that we love, auto-sklearn, works roughly as follows: it pre-defines a set of candidate classifiers and their hyperparameters, and efficiently navigates the space of possible models and configurations to quickly discover what works well for a specific modelling task. The package specifically uses models that are available in scikit-learn, one of the most popular data science tools.
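
To make the idea of a pre-defined search space concrete, here is a minimal sketch in plain Python. The model and hyperparameter names follow scikit-learn conventions, but SEARCH_SPACE, its values, and enumerate_configurations are our own illustrative assumptions, not auto-sklearn's actual configuration space, which is considerably larger.

```python
from itertools import product

# Hypothetical search space: the names mirror scikit-learn models and
# hyperparameters, but the candidate values are picked for illustration only.
SEARCH_SPACE = {
    "k_nearest_neighbors": {
        "n_neighbors": [1, 3, 5, 9],
        "weights": ["uniform", "distance"],
    },
    "random_forest": {
        "n_estimators": [50, 100, 200],
        "max_depth": [4, 8, None],
    },
    "logistic_regression": {
        "C": [0.01, 0.1, 1.0, 10.0],
    },
}


def enumerate_configurations(space):
    """Yield every (model_name, hyperparameters) pair in the search space."""
    for model_name, grid in space.items():
        names = list(grid)
        # Cartesian product of all candidate values for this model.
        for values in product(*(grid[name] for name in names)):
            yield model_name, dict(zip(names, values))


configurations = list(enumerate_configurations(SEARCH_SPACE))
print(len(configurations), "candidate configurations")
print(configurations[0])
```

Even this toy space already contains a couple of dozen configurations, each of which would need a full training run to score, which is why the order in which they are tried matters.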

Because the models and hyperparameters define a bounded set of configurations, the tool can search that set with a technique called Bayesian optimization. Instead of trying out every single possibility, this algorithm uses the results of previous training rounds to choose which hyperparameters to evaluate next. This means it skips options that are statistically expected to perform badly, saving you the time of evaluating them.
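
The loop described above can be sketched in plain Python. This is a simplified illustration, not auto-sklearn's implementation: auto-sklearn builds on the SMAC optimizer, while this sketch swaps in a tiny Gaussian-process surrogate with an expected-improvement rule. The validation_error function is a hypothetical stand-in for training a model and measuring its error, and all function names and default values are our own assumptions.

```python
import math


def rbf(a, b, length_scale=0.2):
    """Similarity kernel: nearby hyperparameter values get similar scores."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def posterior(xs, ys, x_new, noise=1e-4):
    """Surrogate prediction (mean, standard deviation) at an untried point."""
    mean_y = sum(ys) / len(ys)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / len(ys)) or 1.0
    targets = [(y - mean_y) / std_y for y in ys]
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_new = [rbf(a, x_new) for a in xs]
    alpha = solve(K, targets)
    v = solve(K, k_new)
    mu = mean_y + std_y * sum(k * a for k, a in zip(k_new, alpha))
    var = max(1.0 - sum(k * w for k, w in zip(k_new, v)), 1e-12)
    return mu, std_y * math.sqrt(var)


def expected_improvement(mu, sigma, best):
    """How much we expect to beat the best error so far (lower is better)."""
    improvement = best - mu
    z = improvement / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return improvement * cdf + sigma * pdf


def bayesian_search(objective, candidates, n_initial=3, budget=8):
    """Evaluate only `budget` candidates, each chosen from previous results."""
    step = (len(candidates) - 1) / (n_initial - 1)
    tried = [candidates[round(i * step)] for i in range(n_initial)]
    scores = [objective(x) for x in tried]
    while len(tried) < budget:
        best = min(scores)
        remaining = [x for x in candidates if x not in tried]

        def acquisition(x):
            mu, sigma = posterior(tried, scores, x)
            return expected_improvement(mu, sigma, best)

        chosen = max(remaining, key=acquisition)
        tried.append(chosen)
        scores.append(objective(chosen))
    i = scores.index(min(scores))
    return tried[i], scores[i], tried


def validation_error(x):
    """Hypothetical stand-in for 'train a model with setting x and score it'."""
    return (x - 0.3) ** 2 + 0.1


candidates = [i / 20 for i in range(21)]  # 21 possible hyperparameter values
best_x, best_error, tried = bayesian_search(validation_error, candidates)
print("evaluated", len(tried), "of", len(candidates), "options")
print("best setting found:", best_x, "with error", best_error)
```

The point of the design is the budget: only a fraction of the candidates is ever trained, and each new choice is informed by every earlier result instead of following a fixed order.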

What is our opinion?

While there is a myriad of initiatives for automating steps beyond just modelling, we have the highest trust in automating the data modelling step. Primarily, we think that working with large amounts of data, especially when it concerns personal data, comes with a sense of responsibility. The models we create are meant to be used outside of lab environments, in the big, beautiful world. That means that they should not discriminate, enlarge differences, or make wrong judgments that impact real-life situations: they should not contain any bias whatsoever. Automating beyond just modelling would mean removing human evaluation from the AI pipeline: the eyes that spot subtle hints of bias and correct them in a way that balances performance with doing good. That’s why we use machines where they are suitable, but we will always continue to have second opinions, especially in the data preprocessing and model evaluation stages.

How can you apply it?

From a human-centric point of view, we would always recommend starting off with a “regular” data science workflow. The manual stuff, boring and all, allows you to understand the process of picking models the right way, forming opinions on why models do or do not work, and aligning the structure of your data with proper models. The path of least resistance won’t help you understand your automatically picked models once they take a wrong turn down the road. But we do always welcome some nice ways to make our lives better. Once you have the basics down, try experimenting with some fun tools that improve your efficiency. But keep an eye out for bias. Try to make the models that are automatically picked (somewhat) explainable, so that you spot the errors before you get an awkward phone call once they are deployed.
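
One model-agnostic way to make a picked model somewhat explainable is permutation importance: shuffle one input column at a time and see how much the accuracy drops. The sketch below is a minimal plain-Python illustration. The picked_model function, the toy rows, and the labels are hypothetical and only serve to show the mechanics.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model predicts the right label."""
    hits = sum(1 for row, label in zip(rows, labels) if model(row) == label)
    return hits / len(rows)


def permutation_importance(model, rows, labels, n_repeats=5, seed=0):
    """Average accuracy drop per feature when that feature is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for feature in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle one column, breaking its link with the labels.
            column = [row[feature] for row in rows]
            rng.shuffle(column)
            shuffled = [row[:feature] + [value] + row[feature + 1:]
                        for row, value in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


def picked_model(row):
    """Hypothetical automatically picked model: it only looks at feature 0."""
    return 1 if row[0] > 0.5 else 0


# Toy data: feature 0 determines the label, feature 1 is unrelated noise.
rows = [[0.1, 7.0], [0.2, 3.0], [0.3, 9.0], [0.4, 1.0],
        [0.6, 4.0], [0.7, 8.0], [0.8, 2.0], [0.9, 6.0]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

importances = permutation_importance(picked_model, rows, labels)
print("importance per feature:", importances)
```

If a feature that should be irrelevant, or one that encodes a sensitive attribute, turns out to carry a lot of importance, that is a signal for a human to take a closer look before deployment.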

Talk to an expert

Call us: 020 846 19 05. Mail us: info@lifely.nl
