'.NET for Apache Spark' Debuts for C#/F# Big Data -- Visual Studio Magazine

By David Ramel
04/25/2019

Almost four years after the debut of Apache Spark, .NET developers are on track to more easily use the popular Big Data processing framework in C# and F# projects.

The preview project, called .NET for Apache Spark, was unveiled yesterday (April 24). Its development will be conducted in the open under the direction of the .NET Foundation.

Spark is described as a unified analytics engine for large-scale data processing, compatible with Apache Hadoop data whether batched or streamed.

Currently, Spark offers native APIs for Scala and Java, with Python and R supported through an interop layer. While .NET coders have been able to use Spark through Mobius, a project providing C# and F# language bindings and extensions, the new project seeks to improve on that approach while paving the way for support of more languages. Microsoft promised to work closely with the open source Spark community to help the project succeed where similar efforts such as Mobius did not; it said those efforts were hindered by a lack of communication with that community.

".NET for Apache Spark provides high performance APIs for using Spark from C# and F#," said Microsoft in an announcement post. "With [these] .NET APIs, you can access all aspects of Apache Spark including Spark SQL, DataFrames, Streaming, MLLib etc.," it said. ".NET for Apache Spark lets you reuse all the knowledge, skills, code, and libraries you already have as a .NET developer."

The project's origin is explained in a Spark Project Improvement Proposal (SPIP) titled ".NET bindings for Apache Spark," created on Feb. 27. It says: "Apache Spark provides programming language support for Scala/Java (native), and extensions for Python and R. While a variety of other language extensions are possible to include in Apache Spark, .NET would bring one of the largest developer community to the table. Presently, no good Big Data solution exists for .NET developers in open source. This SPIP aims at discussing how we can bring Apache Spark goodness to the .NET development platform."

Microsoft yesterday said: "The C#/F# language binding to Spark will be written on a new Spark interop layer which offers easier extensibility. This new layer of Spark interop was written keeping in mind best practices for language extension and optimizes for interop and performance. Long term this extensibility can be used for adding support for other languages in Spark."

Project backers will work on that extensibility, which is outlined in another SPIP created last December, titled "Interop Support for Spark Language Extensions." It says:

There is a desire for third party language extensions for Apache Spark. Some notable examples include:

C#/F# from project Mobius https://github.com/Microsoft/Mobius

Haskell from project sparkle https://github.com/tweag/sparkle

Julia from project Spark.jl https://github.com/dfdx/Spark.jl

Presently, Apache Spark supports Python and R via a tightly integrated interop layer. It would seem that much of that existing interop layer could be refactored into a clean surface for general (third party) language bindings....

Microsoft addressed the aforementioned lack of communication with the open source Spark community in its SPIP, stating:

We recognize that earlier attempts at this goal (specifically Mobius https://github.com/Microsoft/Mobius) were unsuccessful primarily due to the lack of communication with the Spark community. Therefore, another goal of this proposal is to not only develop .NET bindings for Spark in open source, but also continuously seek feedback from the Spark community via posted Jira’s (like this one) and the Spark developer mailing list. Our hope is that through these engagements, we can build a community of developers that are eager to contribute to this effort or want to leverage the resulting .NET bindings for Spark in their respective Big Data applications.

Yesterday's announcement of the first preview also offered a peek at further development, which will include performance work, such as Apache Arrow optimizations, to improve benchmark results. Specifically, the project's roadmap calls for upcoming features such as:

Simplified getting started experience, documentation and samples
Native integration with developer tools such as Visual Studio, Visual Studio Code and Jupyter notebooks
.NET support for user-defined aggregate functions
.NET idiomatic APIs for C# and F# (e.g., using LINQ for writing queries)
Out-of-the-box support for Azure Databricks, Kubernetes, etc.
Make .NET for Apache Spark part of Spark Core

Source code for the preview project, along with detailed instructions for using it, is available on GitHub, where it had already garnered 446 stars at the time of this writing (and was climbing by the minute). Microsoft's Terry Kim and Rahul Potharaju are listed as its primary contributors.

About the Author

David Ramel is an editor and writer for Converge360.
