14 Apr 2021
A typical AI startup will use considerable amounts of compute during its journey - and so it’s important to think carefully about what you’re going to need en route and prepare properly before setting off on your high performance computing (HPC) and AI road trip.
Generally, there are three legs in an AI startup’s compute journey to consider:
Many startups opt to use a cloud environment such as Amazon Web Services (AWS), Microsoft Azure or Google Cloud Platform for their initial compute requirements, and this is generally an excellent way to begin. After all, in the prototyping stage, your compute usage tends to be stop-start or, as it is often called, ‘flashy’. You use the cloud’s compute resources on demand, paying only for what you use - and that’s a huge benefit in the early stages of prototyping when you’d rather be spending your money on R&D, skilled personnel and expertise instead of expensive physical IT assets like hardware, servers and storage.
However, if you use the cloud provider’s own array of ‘built-in’ applications or start by deploying your applications within that cloud platform, your compute can quickly become ‘landlocked’ within that very cloud (unless you’re using open source technology). The very notion of clouds is that they are accessible and flexible, but it’s actually much harder than you think to switch your compute between one cloud and another.
Cloud providers bombard AI startups with free instances and opportunities to use their platforms so they can effectively get them hooked on the 'Kool-Aid' and into their infrastructure and services. However, you have to be aware of the background costs that can spiral. You may not get charged for the data coming in, but you are charged for the data that’s coming out (egress). And, while that’s not a problem on the first leg of your journey when you’re only trying out sample datasets (and, therefore, not using a lot of compute), it becomes an expensive problem later on.
By the time you’re into the second leg of your compute journey (The Production Stage), depending on what you are doing, you could be using massive datasets, so the cost of your compute can suddenly rise steeply. As soon as you go into production and start properly using your virtual servers, the 5-10% of compute you were using every now and then during the Prototype Phase jumps to a solid 40-60% of your compute all of the time. And typically, at this point in the journey, because you're racing against the clock to get your idea produced and out to market as soon as possible, you don't have time to come up for air and consider your compute roadmap. It ends up being a case of just drinking the Kool-Aid as fast as you can open it...
However, once you have reached more than 50% utilisation of your servers, and it’s a reliable base load, that’s the inflection point at which you should consider either investing in your own hardware and infrastructure in your own on-premises data centre, so you’re not paying for the increased volumes of data now going out, or utilising a colocation environment like Kao Data – because, when you do, the cost savings are dramatic.
Sure, you’ll have to pay a marginal connectivity charge just as you would to send data across any network – but it’s nothing like the charges you’ll be paying the cloud for the same transfer of data. And there are many other benefits as well – performance and speed improvements if your high performance computing is clustered in one location rather than on virtualised machines in the cloud, the ability to fine-tune your hardware to fit your bespoke applications, improved security and your own dedicated hardware. You won’t find yourself twiddling your thumbs waiting for your jobs to upload and nodes to become operational, or playing a giant game of cloud Tetris while other ‘noisy neighbours’ in the cloud finish their jobs before yours can run. And when things do go south, you're not relying on the remote-ticketing system within a gigantic cloud customer service framework - you can go old school and speak directly to a human (and HPC expert), or even visit your own hardware - how novel!
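To make the inflection point above concrete, here is a minimal sketch of the arithmetic in Python. It compares a pay-per-hour cloud bill (including charges for data going out) with the fixed monthly cost of owned hardware in colocation, and finds the utilisation at which the two are equal. Every name and every figure in it is a hypothetical placeholder rather than a real price from any provider, so the crossover it prints will not necessarily land at 50%; plug in your own quotes to find yours.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month


def cloud_monthly_cost(utilisation, hourly_rate, egress_tb, egress_rate_per_tb):
    """On-demand cloud: pay for the hours used plus data leaving the cloud."""
    compute = utilisation * HOURS_PER_MONTH * hourly_rate
    egress = egress_tb * egress_rate_per_tb
    return compute + egress


def colo_monthly_cost(hardware_cost, amortisation_months, colo_fee, connectivity_fee):
    """Owned hardware in colocation: a fixed cost whatever the utilisation."""
    return hardware_cost / amortisation_months + colo_fee + connectivity_fee


def break_even_utilisation(hourly_rate, egress_tb, egress_rate_per_tb, fixed_monthly_cost):
    """Utilisation (as a fraction) at which cloud spend equals the fixed cost."""
    egress = egress_tb * egress_rate_per_tb
    return max(0.0, (fixed_monthly_cost - egress) / (HOURS_PER_MONTH * hourly_rate))


if __name__ == "__main__":
    # All figures below are illustrative assumptions, not real prices.
    fixed = colo_monthly_cost(hardware_cost=36_000, amortisation_months=36,
                              colo_fee=400, connectivity_fee=100)
    threshold = break_even_utilisation(hourly_rate=2.0, egress_tb=10,
                                       egress_rate_per_tb=50,
                                       fixed_monthly_cost=fixed)
    print(f"Fixed colocation cost per month: {fixed:.2f}")
    print(f"Cloud is cheaper below {threshold:.0%} utilisation")
```

The model is deliberately simple: it treats the colocation cost as flat and the cloud cost as linear in utilisation, which is why the more data you send out, the lower the utilisation at which owning hardware starts to pay off.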
Don’t get me wrong, the hyperscale cloud is a brilliant resource for the 99% of compute such as email servers, ERP and project management software, accounting platforms, gaming, etc. And it can be a useful resource when you’re starting out due to its incredible flexibility. But one size doesn't fit all, and intensive HPC and AI is the 1%. In addition, clouds are easy in and hard out - which is why I say you need to think carefully about your compute journey at the outset. It can be tricky trying to disentangle yourself, although there are now companies out there, like ProtoCloud, designed to help you do just that.
When you are at the appropriate compute scale, one of the great benefits of coming to a specialist data centre like Kao Data is that, as well as the provision and assurance of resilient, reliable power and technical expertise, we also have a Megaport connection. This provides a seamless link between the cloud and the data centre, making it the perfect computing environment for hybrid cloud/colo computing.
The ideal situation would be to have your steady ‘base load’ of compute, say 60-70% of what you’re doing, in a data centre like Kao Data, benefiting from the economies of scale of being part of a large data centre - and the remaining 30-40% of your compute (the peaky, flashy load) in the cloud on a flexible on-demand tariff, so you only pay when you really need that extra burst of compute. This would ensure your servers are never maxed out, you can always do what you need to do, you retain a degree of flexibility, and your computing base load is a lot cheaper and performs faster.
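One rough way to size that split is to take a record of how many servers you needed hour by hour and see how much of the work a given amount of owned capacity would absorb. The sketch below does this in Python; the demand figures and the function names are made up for illustration, and a real sizing exercise would use your own usage history.

```python
def split_load(hourly_demand, base_capacity):
    """Return (base, burst) in server-hours.

    Demand up to base_capacity is served by owned hardware in the data
    centre; anything above it overflows to on-demand cloud instances.
    """
    base = sum(min(d, base_capacity) for d in hourly_demand)
    burst = sum(max(d - base_capacity, 0) for d in hourly_demand)
    return base, burst


def base_share(hourly_demand, base_capacity):
    """Fraction of all server-hours handled by the owned base capacity."""
    base, burst = split_load(hourly_demand, base_capacity)
    total = base + burst
    return base / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical demand in servers needed per hour, with a peak mid-series.
    demand = [3, 4, 6, 12, 14, 6, 4, 3]
    for capacity in (4, 5, 6):
        base, burst = split_load(demand, capacity)
        share = base_share(demand, capacity)
        print(f"{capacity} owned servers: base={base}, burst={burst}, "
              f"share in data centre={share:.0%}")
```

Trying a few capacities like this shows the trade-off: more owned servers push a larger share of the work onto the cheaper base load, but leave more hardware idle outside the peaks.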
My advice is always: Don’t be a busy fool and don't get lost in your own computing maze. Consider your computing platform carefully and look at the lifecycle of your compute as a whole journey rather than day-to-day.