Databricks launches LakeFlow to help its customers build their data pipelines

Since its launch in 2013, Databricks has relied on its ecosystem of partners, such as Fivetran, Rudderstack, and dbt, to provide tools for data preparation and loading. But now, at its annual Data + AI Summit, the company announced LakeFlow, its own data engineering solution that can handle data ingestion, transformation and orchestration and eliminates the need for a third-party solution.
With LakeFlow, Databricks users will soon be able to build their data pipelines and ingest data from databases like MySQL, Postgres, SQL Server and Oracle, as well as enterprise applications like Salesforce, Dynamics, SharePoint, Workday, NetSuite and Google Analytics.
Why the change of heart after relying on its partners for so long? Databricks co-founder and CEO Ali Ghodsi explained that when he asked his advisory board at the Databricks CIO Forum two years ago about future investments, he expected requests for more machine learning features. Instead, the audience wanted better data ingestion from various SaaS applications and databases. “Everybody in the audience said: we just want to be able to get data in from all these SaaS applications and databases into Databricks,” he said. “I literally told them: we have great partners for that. Why should we do this redundant work? You can already get that in the industry.”
As it turns out, even though building connectors and data pipelines may now feel like a commoditized business, the vast majority of Databricks customers were not actually using its ecosystem partners but building their own bespoke solutions to cover edge cases and their security requirements.
At that point, the company started exploring what it could do in this space, which eventually led to the acquisition of the real-time data replication service Arcion last November.
Ghodsi stressed that Databricks plans to “continue to double down” on its partner ecosystem, but clearly there is a segment of the market that wants a service like this built into the platform. “This is one of those problems they just don’t want to have to deal with. They don’t want to buy another thing. They don’t want to configure another thing. They just want that data to be in Databricks,” he said.
In a way, getting data into a data warehouse or data lake should indeed be table stakes because the real value creation happens down the line. The promise of LakeFlow is that Databricks can now offer an end-to-end solution that allows enterprises to take their data from a wide variety of systems, transform and ingest it in near real-time, and then build production-ready applications on top of it.
At its core, the LakeFlow system consists of three parts. The first is LakeFlow Connect, which provides the connectors between the different data sources and the Databricks service. It’s fully integrated with Databricks’ Unity Catalog data governance solution and relies in part on technology from Arcion. Databricks also did a lot of work to enable this system to scale out quickly and to very large workloads if needed. Right now, this system supports SQL Server, Salesforce, Workday, ServiceNow and Google Analytics, with MySQL and Postgres following very soon.
The second part is LakeFlow Pipelines, which is essentially a version of Databricks’ existing Delta Live Tables framework for implementing data transformation and ETL in either SQL or Python. Ghodsi stressed that LakeFlow Pipelines offers a low-latency mode for enabling data delivery and can also offer incremental data processing so that for most use cases, only changes to the original data have to get synced with Databricks.
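To make the idea of incremental processing concrete, here is a minimal sketch in plain Python. It is not LakeFlow or Delta Live Tables code. The `Row` type, the `updated_at` change marker and the `incremental_sync` function are all hypothetical names chosen for illustration, and the sketch assumes each source row carries a counter that increases whenever the row changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    key: str
    value: str
    updated_at: int  # hypothetical change marker that grows with every update


def incremental_sync(source, target, watermark):
    """Upsert only the rows changed after `watermark` into `target`.

    Returns the new watermark and the number of rows that were synced.
    """
    changed = [row for row in source if row.updated_at > watermark]
    for row in changed:
        target[row.key] = row  # insert new keys, overwrite existing ones
    # If nothing changed, keep the previous watermark.
    new_watermark = max((row.updated_at for row in changed), default=watermark)
    return new_watermark, len(changed)


# First run: the target is empty, so every source row is copied.
source = [Row("a", "v1", 1), Row("b", "v1", 2)]
target = {}
watermark, synced = incremental_sync(source, target, watermark=0)

# Second run: only row "a" changed, so only that row is copied again.
source[0] = Row("a", "v2", 3)
watermark, synced_again = incremental_sync(source, target, watermark)
```

The point of the watermark is that the second run touches one row rather than the whole table, which is the cost saving that incremental processing is meant to deliver.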
The third part is LakeFlow Jobs, which is the engine that provides automated orchestration and ensures data health and delivery. “So far, we’ve talked about getting the data in, that’s Connectors. And then we said: let’s transform the data. That’s Pipelines. But what if I want to do other things? What if I want to update a dashboard? What if I want to train a machine learning model on this data? What are other actions in Databricks that I need to take? For that, Jobs is the orchestrator,” Ghodsi explained.
Ghodsi also noted that a lot of Databricks customers are now looking to lower their costs and consolidate the number of services they pay for — a refrain I’ve been hearing from enterprises and their vendors almost daily for the last year or so. Offering an integrated service for data ingestion and transformation aligns with this trend.
Databricks is rolling out the LakeFlow service in phases. First up is LakeFlow Connect, which will become available as a preview soon. The company has a sign-up page for the waitlist.