In my opinion, Python is the perfect language for prototyping in Big Data/Machine Learning fields. However like many developers, I love Python because it’s flexible, robust, easy to learn, and benefits from all my favorites libraries. Python for Spark is obviously slower than Scala. While using Spark, most data engineers recommends to develop either in Scala (which is the “native” Spark language) or in Python through complete PySpark API. ![]() I wrote this article for Linux users but I am sure Mac OS users can benefit from it too. That’s why Jupyter is a great tool to test and prototype programs. It allows you to modify and re-execute parts of your code in a very flexible way. Jupyter Notebook is a popular application that enables you to edit, run and share Python code into a web view. ![]() ![]() In a few words, Spark is a fast and powerful framework that provides an API to perform massive distributed processing over resilient sets of data. Spark with JupyterĪpache Spark is a must for Big data’s lovers.
0 Comments
Leave a Reply. |