Data is the new Coal (not Oil)

And it shall deliver the power for the new wave of re-electrification that the AI revolution is bringing. Consider how the ML field achieved its rapid growth over the last decade. You can say that the pivotal year was 2012, when CNN models first won the ImageNet Large Scale Visual Recognition Challenge, the competition run by the ImageNet project. It is not that the algorithms suddenly became 10% better that year, which is roughly the margin by which the results improved. Rather, the real uptick came from an increase in computation power, GPUs in particular, coupled with a large amount of data on the input side: the ImageNet dataset, so to speak.

And 10 years on, the same story holds. If anything, its impact has accelerated. The algorithms and the math and statistics behind neural networks have had a huge impact, and they have advanced so far that diminishing returns have been visible there for some time. Please do not get me wrong: there is still a lot of excitement around models and algorithms. Take the new generation of GNNs, or Graph Neural Networks, which continue the good work of the earlier CNNs, RNNs, etc. GNNs have revolutionized the cases where the input data is best described as a graph rather than as a structured dataset like a table. This has opened new avenues for exploration, such as drug discovery in pharmaceuticals and medicine. The discovery of the drug Halicin is attributable to this approach.

But some of the leading practitioners in the field of AI have pointed out that it sometimes pays more to focus on the other side of the conundrum: the data. And the results have been astounding. Wherever folks have treated the problem as one of data as much as of mathematical and statistical models, and of the layers and weights within the neural networks, the results have followed. And this is just the beginning. Data work is a very detail-oriented and iterative process, unlike modeling, which can move relatively quickly because it can be done in some isolation and at your own pace. Modeling is also very creative, since you have to try many permutations and combinations on the model front. When data gets involved, the pace is dictated by external and logistical factors, let alone the implementation part.

Hence it became very clear to me that as we move ahead in this journey, we shall uncover many new frontiers, and at a pace matched only by the effort we put into the data side, which we can also call the left stream of the ecosystem: the setup that comes before the modeling. It will also take creativity in applying the dataset to the various models available, to solve problems empirically and disrupt industries altogether. Google used to be about PageRank and inverted-index document search. Then BERT came along, and now MUM. Google disrupted the search landscape through its science and algorithms. We believe that a multitude of industries are on the cusp of this disruption, through a combination of a data-centric approach and mixing and matching NN models. Also, the dataset does not need to be massive at all. A small amount of well-labeled, consistent data can work wonders for your model artifacts. The limiting factor is often the whole AI and ML setup of engineers and technology that needs to be put in place. Those issues are being solved as we move along, too. With the advent of the cloud paradigm, many companies can solve the infrastructure problem for you. There is a revolution going on in the whole space, remember.
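To make the point about label consistency concrete, here is a minimal sketch in Python. It assumes a tiny, made-up set of labeled invoice texts. The function name find_label_conflicts and the sample data are hypothetical, not taken from any real project. It simply flags inputs that read the same but carry different labels, which is one of the cheapest consistency checks you can run on a small dataset.

```python
from collections import defaultdict

def find_label_conflicts(examples):
    """Return normalized texts that appear with more than one distinct label."""
    labels_by_text = defaultdict(set)
    for text, label in examples:
        # Normalize case and whitespace so near-identical inputs are grouped together.
        key = " ".join(text.lower().split())
        labels_by_text[key].add(label)
    return {
        text: sorted(labels)
        for text, labels in labels_by_text.items()
        if len(labels) > 1
    }

# Hypothetical toy dataset: (text, label) pairs.
examples = [
    ("Invoice overdue by 30 days", "late"),
    ("invoice overdue by 30 days ", "on_time"),  # same text, conflicting label
    ("Paid in full", "on_time"),
    ("Paid in  full", "on_time"),                # same text, same label: no conflict
]

conflicts = find_label_conflicts(examples)
print(conflicts)
```

On this toy data, only the overdue invoice is flagged, because it appears once as "late" and once as "on_time". A real project would need a looser notion of "same input" than exact text matching, but the idea carries over: fix the conflicting labels before reaching for a bigger model.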

So you can see the data as the fuel that goes into generating the outcomes: the intelligent ML systems, brought to life by the grid that ties them together. The more fuel you pour in, the more intelligence, akin to electricity, is generated. From there on, it is just a matter of adding your own new appliance. The appliance here is your own unique business case. Any new system or project delivered on the business side is like that electrical appliance, and there shall be no dearth of those once we have the fuel to generate the electricity for that mini project in the larger universe. We do not need to wait for the self-driving car as the ultimate crescendo of the AI revolution. We can usher in the revolution by making the demand forecast better for our small business as well. Or by automating some part of our customer invoicing cycle and making it more efficient. Or even by solving a small manufacturing supply chain problem. Hence, the dataset and its integration into the grid shall be pivotal to the pace of this re-electrification. For the generators and the grid have already been laid out. All we need to do is define the new appliance, aka the project, and plug the requisite fuel into the system. You guessed it, the dataset for the project! And as my friend said, that shall give rise to the new General re-Electric company. Any Edisons out there for it?