How to Freeze Your On-Premises Hadoop Version & Migrate When YOU Are Ready
While some have moved their Hadoop clusters to the cloud or migrated these workloads to cloud data warehouses like Snowflake or Databricks, many companies are still running Hadoop on-premises.
Most of them — and since you are reading this, that probably includes you — pay for commercially supported Hadoop, either Hortonworks Data Platform (HDP) or Cloudera's Distribution including Apache Hadoop (CDH).
For companies like yours, the clock is not just ticking, the alarm is blaring. That’s because Cloudera, which bought Hortonworks in 2018, is halting all HDP and CDH support this year. Originally set to end in March, Cloudera quietly extended support for CDH 6.2 and 6.3 until Sept. 30th, while HDP support ends earlier, on June 30th.
This is a mere stay of execution. The fact is that in just two to four months, current Cloudera customers must either start and complete a full migration of their analytics stack off Hadoop, or hire a data engineering team to self-manage this notoriously complicated, time-consuming, and cranky platform.
While LinkedIn can make managing a 10,000 node, 500 PB Hadoop cluster in-house look like a breeze, they have massive engineering teams that have invested 15+ years and endless dollars into Hadoop.
For everyone else, self-supporting Hadoop will create more downtime, more data quality problems, and higher costs than they ever thought possible.
However, the right platform — a data observability platform — has helped many companies successfully turn off Cloudera support while keeping their Hadoop on-premises stack.
Not just that, these companies are now actually running Hadoop faster, with less management effort, and at lower cost.
If you are a data engineer or similar technical expert, dive directly into our product documentation to grok all of our powerful Hadoop Optimization features.
Or schedule a free 30-minute assessment of your HDP or CDH infrastructure with a Hadoop expert from Acceldata today.
And read our Part II blog about how Acceldata not only helps you optimize the cost and performance of Hadoop, but when you are ready, can help you smooth a future migration to a cloud data warehouse such as Snowflake and manage it in-house with similar efficiency and reduced costs.
Three Sub-Optimal Options
So it’s no surprise that the remaining Cloudera customers are panicking. Most are feeling forced towards a rushed migration of Hadoop to one of the three options below:
- The Cloudera Data Platform (CDP). This is the proprietary route Cloudera has been pushing hard for the past three years, ever since it announced the imminent demise of CDH and HDP support. Coming in both public and private cloud flavors, CDP was built from the ground up to overcome Hadoop’s limitations. Apart from a few features and the Cloudera name, CDP shares little with its two predecessors, CDH and HDP.
What that means is that moving to CDP is not a simple “upgrade” as Cloudera calls it, but looks a lot more like a complex, multi-step migration, with all of the associated labor, expense and risk.
So before you consider migrating to CDP, you need to ask two questions:
- Do you have the engineers to rewrite your applications and recode your data pipelines?
- Do you really want to voluntarily lock your analytics stack into a single vendor?
For most existing HDP and CDH customers, the answer appears to be “No!”, judging by Cloudera’s silence on customer migration figures, the heavy executive churn at its Santa Clara headquarters, and scattered negative insider reports.
- Hadoop-as-a-Service. Transferring on-premises Hadoop clusters to a third-party public cloud or hosting provider has been a popular route over the past several years. There is no shortage of Hadoop-as-a-service providers; the best known include Amazon EMR, Azure HDInsight, and Google Dataproc. They all claim to run your Hadoop applications faster, at lower cost, and with less management overhead than on-premises Hadoop. And they also argue that lifting and shifting your data and applications to their clouds is cheaper and less traumatic than rebuilding your stack on Cloudera’s non-Hadoop option, CDP.
There is some truth to those claims. However, risk is relative. Even with a lift-and-shift move, there is major inherent risk when huge amounts of data, fragile, older applications, and mission-critical processes are involved.
To avoid lost or erroneous data, malfunctioning data pipelines, ballooning costs, and other catastrophes, migrations should be carefully planned and tested, not rushed. There’s simply no shortcut. Two to four months is not enough time for companies with terabytes or petabytes of data and mission-critical analytics to plan and execute a move to the cloud, no matter how much money or how many engineers you throw at it.
Moreover, lifting and shifting your on-premises Hadoop infrastructure to the cloud means your organization misses out on the cost and performance benefits of refactoring your data infrastructure for a modern, cloud-native data platform. Hadoop-as-a-service may be just an expensive interim step, which is why some Hadoop users have opted to jump straight to a…
- Managed database in the cloud. Low-ops, serverless databases such as Snowflake, Databricks, Google BigQuery, Amazon Redshift, and others are all the rage. It’s no surprise. These platforms promise real-time data ingestion, fast and rich analytics, easy application development, and even easier scalability and management. For any Hadoop user tired of taming server problems just to pull out the answers they need, this may seem like a godsend.
While the potential benefits are certainly bigger than simply uploading your Hadoop assets into the cloud, so again are the risks. This is a full-blooded migration. The risks to your data and analytical workloads are high — as high or higher than “upgrading” to Cloudera CDP — especially if the migration is rushed and done all at once rather than in well-planned and tested phases.
It can also be extremely costly if you don’t take the time to identify critical data jobs, their dependencies and data repositories, document their performance characteristics, and delete obsolete jobs and code, so that only known, working assets are migrated.
Also, every modern managed database has different strengths, weaknesses and pricing schemes. Choose hastily, and your business might end up with a database that fails to serve your business and technical needs, or can do so only by running up processing and storage costs.
While there is less technical lock-in with cloud services than with proprietary enterprise software, there are still formidable switching costs involved anytime you need to move mission-critical data repositories and pipelines while trying to prevent data corruption and broken applications.
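A practical way to start the inventory step described above — identifying critical jobs and documenting their performance characteristics before any migration — is to pull finished-application statistics from the YARN ResourceManager’s REST API (the `/ws/v1/cluster/apps` endpoint). Here is a minimal sketch in Python; the field names (`name`, `finalStatus`, `elapsedTime`, `memorySeconds`) follow that API, but the aggregation logic and the idea of ranking by memory-seconds are our own illustration, not an Acceldata feature:

```python
from collections import defaultdict

def summarize_yarn_apps(apps):
    """Aggregate finished YARN applications by job name.

    `apps` is the list found under apps.app in the JSON returned by a
    ResourceManager's /ws/v1/cluster/apps endpoint. Per that API,
    elapsedTime is in milliseconds and memorySeconds is the MB-seconds
    of memory the application held.
    """
    summary = defaultdict(lambda: {"runs": 0, "failures": 0,
                                   "total_hours": 0.0, "mb_seconds": 0})
    for app in apps:
        s = summary[app["name"]]
        s["runs"] += 1
        if app.get("finalStatus") != "SUCCEEDED":
            s["failures"] += 1
        s["total_hours"] += app.get("elapsedTime", 0) / 3_600_000
        s["mb_seconds"] += app.get("memorySeconds", 0)
    # Heaviest jobs first, so the ones worth documenting (or retiring)
    # surface at the top of the report.
    return sorted(summary.items(),
                  key=lambda kv: kv[1]["mb_seconds"], reverse=True)
```

In practice you would feed it something like `requests.get(f"http://{rm_host}:8088/ws/v1/cluster/apps?states=FINISHED,FAILED").json()["apps"]["app"]` (assuming the default ResourceManager port); jobs with zero recent runs or chronic failures are candidates for deletion rather than migration.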
Coming Full Circle
All three of the forced migrations above are risky and rushed for all the reasons laid out.
There is a fourth option: stay on your current version of Hadoop indefinitely by bringing management of your on-premises Hadoop clusters in-house.
But didn’t we dismiss this route as being beyond the skill, budget and risk tolerance of most companies? Yes, but that was for companies trying to manage Hadoop naked (and afraid).
Armed with a data observability platform like Acceldata and backed up by our Hadoop experts, companies can manage massive Hadoop infrastructures on their own with efficiency and confidence. You will avoid:
- Expensive, time-consuming re-architecting of your applications and data pipelines to migrate to CDP;
- Rushed lifting-and-shifting of data to hosted platforms that prove to be an expensive halfway step;
- Forced, risky migrations to popular but dramatically different cloud-native platforms.
Instead, your company will gain better Hadoop support through Acceldata than you had with Cloudera, including best-in-the-industry Hadoop SLAs!
In the last five years, Acceldata has provided hundreds of person-years of support to Fortune 500 companies and their data. Our platform provides data observability across hundreds of petabytes of data. Premier Silicon Valley firms, including Insight Partners, Lightspeed, Sorenson, and Emergent Ventures, have invested $46 million in Acceldata.
Aided by our experts and the automation of Acceldata Pulse, you’ll move from the break/fix support you had with Cloudera to true optimization, self-healing, and proactive fixes. This will enable your Hadoop infrastructure to run at peak SLOs and SLAs. For most Hadoop customers, that translates into a 30-40 percent capacity increase, while slashing the costs of licenses, support, hardware, applications, and labor by 50 percent.
Acceldata’s cloud platform, strong engineers, and deep integrations help you de-risk open source and maintain your long-term data platform independence. You can keep the lights on in your Hadoop clusters for as long as you want, while gaining better performance and lower costs.
Happy Hadoop Customers
Don’t take my word for it, though. We have many Fortune 500 and other enterprise customers that have improved their on-premises Hadoop environments using Acceldata, including:
PubMatic, one of the largest AdTech companies in the United States. Using Acceldata, PubMatic manages 4,000+ nodes scattered over 60+ on-premises clusters, all running HDP 3.1.0. In particular, Acceldata Pulse has helped PubMatic optimize performance and cost on a massive scale, saving millions in unnecessary software licenses by reducing the storage footprint. Acceldata also helped PubMatic slash its MTTR (Mean Time To Resolution) for formerly frequent outages and bottlenecks.
Acceldata “helped us optimize HDFS performance, consolidate Kafka clusters, and reduce cost per ad impression, which is one of our most critical performance metrics,” says Ashwin Prakash, engineering leader at PubMatic. “Acceldata's data observability saved us millions of dollars for software licenses that we no longer need. Now we can focus on scaling to meet the needs of rapidly growing business.”
(Read about the latest version of Acceldata Pulse 2.1 and its new features, including Hadoop ones.)
True Corporation, a leading Southeast Asian telecommunications provider. With Acceldata, its True Digital unit solved pervasive system performance and scalability issues in its 35 PB HDP-based data lake. By cutting storage costs by 25 percent, True Digital saved $2 million a year in unneeded software licenses, freed up system capacity to save another $1 million, and eliminated all unplanned outages and Sev 1 issues.
“Acceldata’s tools fixed our analytics pipeline issues, improved visibility into our data systems and recommended ways to scale and optimize our systems to meet future requirements,” according to Wanlapa Linlawan, True Digital’s Analytics Head.
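The storage arithmetic behind savings like True Digital’s can be sketched with a toy model. The numbers below are illustrative only: the 70/30 cold/hot split is an assumption, not a figure from the case study, and the 1.5x overhead corresponds to HDFS’s RS-6-3 erasure-coding policy (available in Hadoop 3 / HDP 3), which stores nine blocks for every six blocks of data instead of 3x replication:

```python
def raw_storage_pb(logical_pb, hot_fraction, replication=3, ec_overhead=1.5):
    """Raw capacity needed when hot data stays replicated and cold
    data moves to an erasure-coded policy (RS-6-3 -> 1.5x overhead)."""
    hot = logical_pb * hot_fraction * replication
    cold = logical_pb * (1 - hot_fraction) * ec_overhead
    return hot + cold

# Baseline: all 35 PB of logical data 3x-replicated.
baseline = raw_storage_pb(35, hot_fraction=1.0)   # 105 PB raw
# Move an assumed 70% of the data, treated as cold, to erasure coding.
tiered = raw_storage_pb(35, hot_fraction=0.3)
savings = 1 - tiered / baseline                   # ~35% less raw storage
```

Even this crude model shows why tiering cold data is one of the first levers to pull on an aging HDP cluster: raw-capacity savings flow straight into node counts, and from there into license and hardware spend.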
PhonePe, which provides e-payment services to 350 million Indian consumers, smoothly scaled its on-premises Hadoop (HDP 2.6.5 and 3.1.4) clusters by 2,000 percent — from 70 to 1,500+ nodes — with our help. The Walmart subsidiary also delivered 99.97% availability while cutting $5 million in annual software license costs and freeing up its data engineers from daily troubleshooting.
Oracle, the enterprise software giant, now beats all of its performance and reliability SLAs for its 170+ HDP 2.6.5 nodes with the help of Acceldata. Hadoop/Hive queries now run twice as fast, while its engineering team is three times more productive.
These companies didn’t just choose Acceldata over Cloudera and other Hadoop commercial support providers. They are relying on us to provide deep visibility and control over their data stack that they know they cannot get via legacy Application Performance Monitoring (APM) tools.
Recent Cloudera to Acceldata Wins
The countdown to Cloudera’s end of support for CDH and HDP has helped us win many new customers. Not only are they able to keep the lights on with technology in which they have invested many years and millions of dollars, they are thrilled at how Acceldata delivers immediate performance improvements and ROI.
Many of our customers chose Acceldata for these benefits:
1) the proven Hadoop expertise of our support team;
2) how Pulse and our experts help improve their Hadoop SLAs for not only responding to problems but also fixing them;
3) the data reliability features of Acceldata Torch, which they plan to deploy as a smarter way to detect and resolve anomalies than their current monitoring solution, Pepperdata.
Another recent win is a leading credit card processing company, which will use Acceldata to manage a whopping 6,000 on-premises Hadoop nodes.
Not Standing Still
The Acceldata platform and our Hadoop experts can keep you efficiently running CDH or HDP for years to come.
Visit our Hadoop home page to learn more.
Dive directly into our product documentation to read about our powerful Hadoop features.
Or schedule a free 30-minute assessment of your HDP or CDH infrastructure with a Hadoop expert from Acceldata today.
At the same time, our technology and team also work behind the scenes to prepare your infrastructure for an eventual migration to a modern cloud-native database of your choice.
Jerome Lintz, Acceldata Global Account Executive, has a successful 20+ year track record working with enterprises in the APM, software reliability, data warehousing, and big data space. Jerome works with enterprise clients to unlock the value of their data.