28 January 2022
We often hear from potential clients: "Take this dataset and do a PoC in two weeks."
Is this the right PoC approach?
No!
The other names for this approach are:
- uninformative experiment,
- recipe for disaster,
- burning money,
- expensive and useless mathematical exercise.
Why?
The purpose of a data science / machine learning / AI PoC is to check
whether the business goal can be achieved with suitable mathematical
methods.
Why is it not enough to take a dataset and build a model? What really needs to be done?
- understanding the business goal,
- bearing in mind constraints and difficulties,
- stating the business goal very precisely,
- translating the goal into the language of mathematics (without losing
too much in the translation),
- preparing the data, understanding it, and checking its correctness,
- building the solution,
- integrating the solution into the production system,
- measuring the effectiveness, preferably in monetary units.
A lightweight approach that skips too many items from this list will
not solve the right problem, so the PoC will not answer the right
question.
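To make the last item on the list above more concrete, here is a minimal sketch of measuring effectiveness in monetary units for a binary classifier. The function name `monetary_value`, the per-outcome gains and costs, and all the counts are hypothetical and made up purely for illustration; in a real PoC they would come from the business side.

```python
def monetary_value(tp, fp, fn, tn, gain_tp, cost_fp, cost_fn, gain_tn=0.0):
    """Net monetary effect of a set of binary decisions.

    tp, fp, fn, tn are counts from a confusion matrix; the gain_* and
    cost_* arguments are assumed amounts of money per single case.
    """
    return tp * gain_tp - fp * cost_fp - fn * cost_fn + tn * gain_tn


# Hypothetical numbers: 100 positive cases out of 1000.
# Baseline: no model, so every positive case is missed.
baseline = monetary_value(tp=0, fp=0, fn=100, tn=900,
                          gain_tp=50.0, cost_fp=5.0, cost_fn=200.0)

# With a model: 60 positives caught, 80 false alarms, 40 still missed.
model = monetary_value(tp=60, fp=80, fn=40, tn=820,
                       gain_tp=50.0, cost_fp=5.0, cost_fn=200.0)

# The difference is what the model is worth under these assumptions.
uplift = model - baseline
print(f"baseline: {baseline}, model: {model}, uplift: {uplift}")
```

The point of comparing against a baseline is that accuracy alone says nothing about money: the same confusion matrix can be profitable or not depending on what a missed case and a false alarm actually cost.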
What can be simplified in a PoC?
- choosing a part of the population to which we’d like to apply ML / AI,
- using a limited dataset with a lower cost of preparation, e.g., fewer
features,
- reducing the time of preliminary data analysis,
- applying basic mathematical methods,
- simple solution integration or no integration,
- reduced operational performance of the solution, e.g., no real-time
processing, long response time, no load balancing,
- simplified measuring of effectiveness.