Is it Google AI OptFormer vs Amazon SageMaker..?


No, not exactly, I reckon. This is not just another deterministic cloud war (AWS vs GCP) we are talking about. Sure, it connects to cloud revenue as a business model down the line, but that is not what the product leaders have in mind when they introduce these services. It is rather about a very long road and a shared journey ahead for a niche-focused (task-based) AI revolution we call a re-electrification. And even when the focus is task-based for each player, some aspects shall need shared learning. Because it is not a war but an act of cooperation in the making, among companies and within the teams inside those companies.

But the proverbial AI transformer grids to be laid in the future - that is, the ML Ops plumbing (data preprocessing and the like) and deployment systems - shall be bespoke, niche and customized. As you will read throughout this blog, our emphasis is on the native and niche nature of the future of AI on the application side of things. It is also evident in the Data-Centric AI discourse now coming from some of the leading practitioners in the field. The only game in town is no longer building ever-deeper neural networks on some superset of millions of images or text documents; it is now about transfer learning, retraining the larger models on smaller labeled datasets for specific tasks. That is how mass adoption is going to happen. Faraday may have generated the spark on the generator, but electricity became a force that reshaped the world only when it reached the last mile through the transformer grid and into the appliances. Those appliances are the transfer-learning problems of this analogy.

For the uninitiated, these services are provided by Google and Amazon for users in the ML domain who train and deploy their models. The latest, OptFormer, was unveiled by Google: it uses a transformer-based model that learns from past tuning experiments to help ML engineers auto-tune hyperparameters through transfer learning when training on their huge datasets. You can likewise use the high-level Amazon SageMaker Python SDK to configure, run, monitor, and analyze hyperparameter tuning jobs; more specifically, here: https://sagemaker.readthedocs.io/en/stable/overview.html#sagemaker-automatic-model-tuning. Hyperparameters are changed repeatedly and the model results are vetted for a change in accuracy on a particular dataset. This, along with concepts like activation functions, weights and biases, backpropagation, and gradient checking (during debugging, not generally during training) for gradient descent on the error/cost function, constitutes the bulk of the action in training different models. Later, a checkpoint and tokenizer combination with the model is invoked in the production environment to give you the result you need. Hopefully, because, as it is, live data is always changing in the real world. The classic question here is how much of this knowledge is shareable and applicable across different tasks and teams. While the deployment pipelines may be specific to companies and teams based on the frameworks and services they use, the concepts mentioned above are central to all model training, and the knowledge there can be shared and reused to make the ecosystem move faster together as far as business applications and results are concerned. The multitude of use cases that emerge from different downstream tasks require as many experiment runs, which is a time- and money-intensive job if done in isolation. This calls for efficient usage of resources and money as an ecosystem, and sharing and collaboration on the above concepts and tasks is one way to go.
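To make the SageMaker side of this concrete, here is a minimal sketch of what an automatic model tuning job can look like with the SageMaker Python SDK. The container image, IAM role, S3 paths, metric regex and hyperparameter ranges below are placeholders and assumptions for illustration, not a working account setup.

```python
# A minimal sketch of launching a hyperparameter tuning job with the
# SageMaker Python SDK. Image URI, role ARN, bucket paths and ranges
# are placeholders, not a real account configuration.
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter, IntegerParameter

session = sagemaker.Session()

estimator = Estimator(
    image_uri="<your-training-image-uri>",                 # placeholder
    role="arn:aws:iam::123456789012:role/SageMakerRole",   # placeholder
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://<your-bucket>/output",               # placeholder
    sagemaker_session=session,
)

# Hyperparameters to search over; the ranges are illustrative only.
hyperparameter_ranges = {
    "learning_rate": ContinuousParameter(1e-5, 1e-2),
    "batch_size": IntegerParameter(16, 128),
}

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:accuracy",
    objective_type="Maximize",
    hyperparameter_ranges=hyperparameter_ranges,
    metric_definitions=[{"Name": "validation:accuracy",
                         "Regex": "val_accuracy=([0-9\\.]+)"}],
    max_jobs=20,
    max_parallel_jobs=2,
)

# Each tuning trial is a separate training job with a different
# hyperparameter combination; results can be compared afterwards.
tuner.fit({"train": "s3://<your-bucket>/train"})
```

Each trial here is exactly the "change hyperparameters, vet the accuracy" loop described above, just automated and run in parallel by the service.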

You have to understand that the larger base models are trained on particular datasets, say an image dataset containing tens of millions of data points. A model, which is essentially a multilayered neural network, has an input layer where this preprocessed training dataset is ingested with some embedding, and it then shells out an output from the output layer which can be applied directly for a use case or fed into a downstream task, such as image classification for a basic image-captioning model with a particular checkpoint. But these larger base models then need to be adopted and adapted for specific tasks on specific datasets to solve the downstream tasks which form the business use case. And it is the shelling out of these downstream, task-based AI applications that gets mass adopted - say an IoT downstream task in an RPA setting.
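As a rough illustration of this adopt-and-adapt step, here is a minimal transfer-learning sketch using a pretrained torchvision model. The five-class downstream task, the dummy batch and the training details are assumptions made up for the example, not a recipe from either platform.

```python
# A minimal sketch of adapting a large pretrained base model to a small,
# task-specific dataset (transfer learning). The number of classes and the
# dummy data are assumptions for illustration.
import torch
import torch.nn as nn
from torchvision import models

# Start from a base model pretrained on a large generic image corpus.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)

# Freeze the pretrained layers so only the new task head is trained.
for param in model.parameters():
    param.requires_grad = False

# Replace the output layer for a hypothetical 5-class downstream task.
num_classes = 5
model.fc = nn.Linear(model.fc.in_features, num_classes)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch; a real run would loop
# over the small labeled dataset for the specific business task.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, num_classes, (8,))
logits = model(images)
loss = loss_fn(logits, labels)
loss.backward()
optimizer.step()
print(f"loss on dummy batch: {loss.item():.4f}")
```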

Now, after the singular event of the publication of the "Attention Is All You Need" paper by Google's team in 2017, the transformer architecture has taken the world by storm. I was actually surprised to learn that the paper was published just a few years back. It has resulted in massive adoption of encoder-decoder transformer systems in areas like NLP; the whole field has now practically been hijacked by transformer-based neural nets. These work on the principle of assigning something called an attention weight, in addition to a positional encoding, to the input vector that embeds each word in the encoding layer. This weight captures the relation that the word has to all the other words in the sequence. So it creates a matrix which lets the model reference far back into the sequence, and that was the real turning point, as the earlier neural networks could not reference very far into the past word vectors. Bye bye RNNs and the other comparatively primitive architectures, so to say. Even the famous BERT system used by Google for its search engine is based on the same architecture - yes, you guessed it right, the T stands for Transformer there. I had been reading names like ALBERT, RoBERTa et al. over the years and knew there was something there; now I know what.

So, now Google has gone fully niche on transformers with its unveiling of OptFormer. And it makes sense, as Google is after all the world's search engine and has to be at the frontier of NLP. Hence, they have zeroed in on it to be the best AI house in that space. Maybe Pinterest or Snapchat would go for computer vision or image-based AI leadership instead. Hence, the ML pipeline around OptFormer would be focused on this niche, and it cannot be the best hyperparameter tuning platform for all use cases as the future evolves. More niche-based services shall emerge catering to all aspects of the wider ML pipeline. But at the same time, many teams outside Google shall be very happy to have access to the visualizations and experiment results generated within these teams using OptFormer. In turn, Google might get help from some computer-vision breakthrough at Pinterest. So while they develop their own ML pipelines, they might need to share model tracking and experiment results. That shall make the work in vision- and language-based neural networks faster, I reckon.
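For readers who want to see the attention idea described above in its smallest possible form, here is a toy sketch of scaled dot-product self-attention in NumPy. It leaves out multi-head projections, positional encodings and masking, so it is only an illustration of the word-to-word relation matrix, not a faithful reimplementation of the paper.

```python
# A toy sketch of scaled dot-product self-attention: every token's vector
# is scored against every other token in the sequence, so long-range
# context is always reachable through the resulting attention matrix.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: (seq_len, d_k) matrices derived from the token embeddings."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)              # (seq_len, seq_len) relation matrix
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V, weights                  # attended vectors + attention weights

# Toy example: a "sentence" of 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out, attn = scaled_dot_product_attention(x, x, x)
print(attn.round(2))   # each row sums to 1: how much each token attends to the others
```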

Thus, the transformer architecture, which is so useful for a company like Google, shall still be used in other companies. Of course, the "Attention Is All You Need" paper itself came from a Google team working on these problems. Amazon has SageMaker, which is not so focused on any one domain, actually. I am quite confident that, in the years to come, all the major tech players in FAANG/MAGA shall have their own ML Ops pipelines which they unveil for the world to co-opt. But their model-related progress shall need sharing and collaboration for the wider business case for AI. And that brings me to the point I want to discuss.

As we have been saying, the future of AI shall be a long road where the asphalt is yet to be laid, and we have started to see the companies say as much. They understand that it is focused, niche-based research and implementation that shall yield results within their setup, especially in the paradigm of Data-Centric AI and task-based outcomes. Models are already applied in tandem with their datasets: one model trained on a dataset with a certain bias shall probably not perform as well, on accuracy and other metrics, on some other data with a different bias. And live data does not come with that caveat. We have concepts like concept drift and data drift which point to exactly these situations. But in the larger scheme of things, the same work should not be repeated across experiment runs if we are to make faster progress in applying AI.
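As a small illustration of how a team might watch for such drift, here is a sketch that compares a training-time feature distribution with a live one using a two-sample Kolmogorov-Smirnov test. The synthetic data, the single feature and the significance threshold are all assumptions chosen for the example.

```python
# A minimal sketch of checking for data drift between training data and
# live data with a two-sample Kolmogorov-Smirnov test. The synthetic
# "live" shift and the threshold are assumptions for illustration.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
train_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)  # distribution the model was trained on
live_feature = rng.normal(loc=0.4, scale=1.0, size=5_000)   # live data with a shifted mean

statistic, p_value = ks_2samp(train_feature, live_feature)
if p_value < 0.01:   # the significance threshold is a judgment call per team
    print(f"Possible data drift detected (KS={statistic:.3f}, p={p_value:.2e})")
else:
    print("No significant drift detected on this feature")
```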

Hence, the plumbing lines and grid points in the proverbial re-electrification grid shall have to be bespoke and work on localized problems and systems. But at the same time, corporations and startups shall have to work more as cooperating entities and co-opt systems and processes rather than just work in silos. That is the only way to harness the progress made by other teams for your own systems; one cannot hope to work in a closed system. So this makes for an interesting situation. While some specific pipelines and systems shall emerge to cater to company-specific problems and deployment issues, other work shall progress only in a shared manner, such as collaboration on models and on past training data and runs. No one should reinvent the wheel as far as model accuracy is concerned, and only when this data is shared readily and easily among teams shall the process be fast. Fast application in businesses shall happen only when these systems and mechanisms work in tandem. Hence, some systems and their outputs, like tracking of model runs and experimentation data, shall need to be shared and accessed easily and intelligibly even when the ML Ops deployment and preprocessing systems work separately. That shall bring out the spirit of the AI revolution, as teams collaborate more even while doing some tasks in isolation on their niche systems. And this makes the future of AI as rich and diverse as it is exciting.
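One concrete, and entirely hypothetical, way such run tracking can be shared across teams is through a common tracking store. The sketch below uses MLflow's tracking API with made-up experiment names, parameters and metrics, purely as an illustration of the idea rather than the tooling any of these companies actually use.

```python
# A minimal sketch of logging experiment runs so they can be shared and
# compared across teams, using MLflow's tracking API. The experiment name,
# parameters and metrics are placeholders.
import mlflow

mlflow.set_experiment("image-captioning-finetune")   # hypothetical experiment name

with mlflow.start_run(run_name="resnet50-lr-1e-3"):
    # Hyperparameters for this run
    mlflow.log_param("learning_rate", 1e-3)
    mlflow.log_param("batch_size", 32)

    # Metrics another team could inspect without rerunning the experiment
    mlflow.log_metric("val_accuracy", 0.87)
    mlflow.log_metric("val_loss", 0.41)
```

Pointing several teams' runs at one shared tracking server is exactly the kind of "share the experiment results, keep the deployment pipelines separate" arrangement argued for above.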