Submitted by zslotyi on Mon, 07/06/2020 - 04:33

Depending on where you are coming from, picking your first Machine Learning project can be anything from a no-brainer to a challenge that is pretty tough in itself. The considerations also vary widely, so it is difficult, if not impossible, to come up with a one-size-fits-all answer.

But before you dive into your first Machine Learning adventure, it is well worth going through a checklist to steer clear of common beginner mistakes - at least the ones that are relatively easy to foresee.

One of the best ways to do that is to ask yourself a few basic questions - and then be brutally honest with the answers you give yourself.

What is the pain point you are trying to solve?

This question is by no means exclusive to machine learning. Identifying the problem, and doing so in an appropriately nuanced manner, is a huge and often overlooked aspect of any project.

If you came to machine learning in search of an answer to an existing (and potentially real-life) problem, then you will need a very different approach from those who are searching for a project - any project, really - to try out the skills they have just acquired in one of the many excellent courses on Coursera or similar platforms.

Our advice here? - Aim low!

Starting from the basics, the complexity of machine learning applications is likely to explode as they approach real-life, not to mention interesting, problems. It’s probably easy to see the vast difference between cat recognition software (the “Hello World!” of machine learning projects) and applications that can provide a medical diagnosis or drive an autonomous vehicle, but it can be more difficult to see where a given in-between application stands on the scale between the two extremes.

If you are unsure whether your project is too complex to be your first - it probably is. If you came to machine learning in search of a solution to a problem nobody has solved before you - chances are that the solution is a rather complex one, too.

If you want your first project to be successful (and finite), try simplifying your original problem, even if it means that the real-life usefulness of the application will suffer.

You are just starting out, so be gentle with yourself. Later projects will reap the benefits of starting out slow - or at least at a realistic pace.

What skills and resources do you have on your side?

Do you have much time on your hands? Is it supposed to be a side project, or a full-fledged career change? Have you completed a 20-hour crash course, or have you just earned a university degree? Do you have someone to turn to for advice, or are you browsing through Stack Overflow thread after Stack Overflow thread for answers to your basic questions?

These are all questions that speak volumes about what your first machine learning project should look like. If you are jump-starting a new career (or a career, full stop), or if you are a student with not much to do between classes other than scrolling through your endless social media feeds, then sure, pick a problem that is complex enough to change the world for the better - and that is likely to eat up years before you come up with any meaningful solution.

If your approach to machine learning is more pragmatic, and you just want to try a few things before you move on to other, interesting (and maybe more complex) problems to solve, then again - the simpler the problem, the better it is to start out with.

Our advice here? - Don’t shy away from problems others have successfully solved before!

This is your first project. You are not looking for a world-shattering, revolutionary solution at this stage of your machine learning career. Picking a problem with existing solutions (preferably ones you can easily access) is a great way to keep the likelihood of getting stuck relatively low. It’s not that you want to copy-paste other people’s code - God forbid! But looking at solutions by people with more experience, especially after you have worked out and committed your own solution, can be hugely educational. Not to mention the safety net it can provide when you start out in a field completely alien to you.

What data do you have?

No matter how brilliant a model you come up with to solve your initial problem, you will eventually need appropriate data to train it before it can work. And if you only start to think about the data when you are at that stage already, then you are… well, in trouble.

The deeper you dive into the world of machine learning, the more obvious it gets that data - its amount, quality and nature - is very often, if not always, the bottleneck to successful projects.

And acquiring data, even if it gets easier as you get to know your way around the world of machine learning, is never going to be easy.

Our advice here? - Start with the data you have, and think of problems that that data alone could solve!

It might seem a bit backward at first, but look: this is not your magnum opus. We’re still looking at your first machine learning project. If you have data, any data, then you’re already a step ahead of folks who don’t. And what if you happen to be one of those folks? Fear not - we are living in an age where you can acquire data from online repositories that publish it for exactly this purpose. But then again: data first, and problem next. Check what data you can acquire, then think about the problem that data could solve, then move on to the model you are about to build to solve that specific problem.

And then, once again, make sure it’s a simple enough one.
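
To make the "data first, problem next" order a bit more concrete, here is a minimal sketch of how you might profile a small table before deciding what to predict. Everything in it is an assumption made for illustration: the sample data is invented, the helper names `profile_columns` and `suggest_task` are hypothetical, and the rule of thumb (few distinct text labels hint at classification, numeric columns hint at regression) is a rough heuristic, not a recipe.

```python
import csv
import io

# Hypothetical sample data, invented for illustration only.
SAMPLE_CSV = """sqft,bedrooms,neighborhood,sold_within_30_days,price
850,2,north,yes,210000
1200,3,south,no,305000
,2,north,yes,198000
1500,4,east,no,410000
950,2,south,yes,225000
1100,3,east,yes,289000
"""


def is_number(text):
    """Return True if the text can be parsed as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def profile_columns(rows):
    """Summarize each column: missing count, distinct count, numeric or not."""
    profile = {}
    for name in rows[0]:
        values = [row[name] for row in rows if row[name] != ""]
        profile[name] = {
            "missing": len(rows) - len(values),
            "distinct": len(set(values)),
            "numeric": bool(values) and all(is_number(v) for v in values),
        }
    return profile


def suggest_task(stats, max_classes=5):
    """Rough heuristic (an assumption): what kind of target could this be?"""
    if stats["distinct"] < 2:
        return "not a useful target"
    if stats["numeric"]:
        return "regression"
    if stats["distinct"] <= max_classes:
        return "classification"
    return "probably an identifier or free text"


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
profile = profile_columns(rows)

# Print one line per column so you can see which problems the data supports.
for name, stats in profile.items():
    print(name, stats, "->", suggest_task(stats))
```

With your own data you would read a real file instead of the inline sample. The point is only to look at the columns you already have, and the gaps in them, before committing to a problem.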