- Databricks
- Spark-notebook.io
- Hue Spark Notebooks
- Jupyter
- Zeppelin
Databricks
Databricks Notebooks appear to be the nicest for working with interactive Spark, but alas they are not open source. Every time I see them they are noticeably improved. So far they are definitely setting the bar.
Spark Summit 2015 demo: Creating an end-to-end machine learning data pipeline with Databricks (video on Vimeo).

spark-notebook.io
This is a nice-looking project by Andy Petrella. It is open source with an option for commercial support, and it seems like a nice option for interactive Spark using Scala. The project is very active, with many releases, and is packaged in a variety of ways to work with different Spark versions. Unfortunately, I cared more about Python support, so I removed it from my list for now.
Hue Spark Notebooks
When I read that Hue was adding a Spark Notebook feature, I was very excited. I already had Hue on my system, so maybe I already had a Spark notebook I could use and didn't even know it. Livy is the REST server backing these notebooks. It is a promising project that looks like it will enable interactive Spark even in yarn-cluster mode.
I was very enamored with Livy until I saw Apache Toree and became equally enamored with it.
Unfortunately, as of CDH 5.5.1 this feature is still very much in beta.
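Livy's REST API is small enough to sketch: you POST to `/sessions` to start a remote Spark session, then POST code snippets to `/sessions/{id}/statements` to run them in that session. A minimal sketch in Python, assuming a Livy server on its default port 8998 (the URL and example snippet are illustrative, not taken from a specific Hue install):

```python
import json
from urllib import request

# Assumed: a Livy server on its default port; adjust for your cluster.
LIVY_URL = "http://localhost:8998"

def session_payload(kind="pyspark"):
    # "kind" selects the interpreter; Livy also accepts "spark" (Scala)
    return {"kind": kind}

def statement_payload(code):
    # A statement is just a code snippet to run in the remote SparkContext
    return {"code": code}

def livy_post(path, payload):
    """POST a JSON payload to the Livy REST server and decode the JSON reply."""
    req = request.Request(
        LIVY_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Example (requires a running Livy server):
# session = livy_post("/sessions", session_payload())
# livy_post("/sessions/%d/statements" % session["id"],
#           statement_payload("sc.parallelize(range(100)).count()"))
```

Because the session lives on the server side, any client that can speak HTTP gets interactive Spark, which is what makes yarn-cluster mode feasible.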
Jupyter
Jupyter is a long-lived project for Python notebooks. More recently the project has expanded to include more languages and now boasts an impressive list. Jupyter just needs a kernel to provide interactive Spark, and there are three options:
- Use pySpark with IPythonKernel
- Sparkmagic kernel
- Apache Toree (IBM Kernel)
I've explored these options more here.
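The first option is the most direct: point Jupyter at a kernel that launches the IPython kernel with pySpark's environment already set. One way to wire that up is a kernel spec, a `kernel.json` file under Jupyter's kernels directory; the paths and py4j version below are placeholders for your own install, not values from any particular distribution:

```json
{
  "display_name": "PySpark",
  "language": "python",
  "argv": ["python", "-m", "ipykernel", "-f", "{connection_file}"],
  "env": {
    "SPARK_HOME": "/usr/lib/spark",
    "PYTHONPATH": "/usr/lib/spark/python:/usr/lib/spark/python/lib/py4j-0.8.2.1-src.zip",
    "PYSPARK_SUBMIT_ARGS": "--master yarn-client pyspark-shell"
  }
}
```

With this in place, selecting the PySpark kernel in Jupyter gives you a notebook where `import pyspark` works and a SparkContext can be created as usual.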
Zeppelin
Zeppelin is a newer Apache incubator project building notebook functionality on the JVM. It appears to be partly inspired by Databricks notebooks. This is a nice-looking project that is definitely going to keep getting better. Things that caught my eye right away were the built-in pivot tables, as well as the interpreter groups that enable sharing things like a SparkContext between multiple languages (pySpark, Spark SQL, etc.).
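The interpreter-group sharing is easy to picture: one paragraph builds a table with pySpark, and the next queries it with Spark SQL against the same shared context. A sketch of two Zeppelin paragraphs (the table name and data are made up):

```
%pyspark
rdd = sc.parallelize([("alice", 1), ("bob", 2)])
df = sqlContext.createDataFrame(rdd, ["name", "count"])
df.registerTempTable("people")

%sql
SELECT name, count FROM people
```

Because both interpreters belong to the same group, the `%sql` paragraph sees the table registered by the `%pyspark` paragraph without any extra plumbing.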
