Description
- Managing Real-Time Context Data
- Data Transformation and Persistence using Apache NiFi
- Setting up a Google Cloud Environment
- Creating a Dataproc Cluster and connecting it to Jupyter Notebook
- Using the Google Cloud Storage service
- Submitting a PySpark Job on Dataproc
- Modelling a Machine Learning Solution with PySpark for Multi-Class Classification (a PySpark sketch follows this list)
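To make the final agenda items concrete, here is a minimal sketch of the kind of PySpark script one might submit to Dataproc. It is not the webinar's actual notebook: it assumes labelled, preprocessed data has already been persisted as a CSV to a hypothetical Cloud Storage bucket, and it trains a multi-class logistic regression with Spark ML. The bucket, file, and column names are placeholders.

```python
# A minimal sketch, not the webinar's material: read preprocessed,
# labelled data from Cloud Storage and train a multi-class classifier.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, VectorAssembler
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

spark = SparkSession.builder.appName("multiclass-demo").getOrCreate()

# Dataproc clusters read gs:// paths directly through the Cloud Storage
# connector; the bucket, path, and "label" column are placeholders.
df = spark.read.csv("gs://your-bucket/processed/data.csv",
                    header=True, inferSchema=True)

# Index the string label, assemble the remaining columns into a feature
# vector, and fit a logistic regression (multinomial for >2 classes).
feature_cols = [c for c in df.columns if c != "label"]
pipeline = Pipeline(stages=[
    StringIndexer(inputCol="label", outputCol="label_idx"),
    VectorAssembler(inputCols=feature_cols, outputCol="features"),
    LogisticRegression(labelCol="label_idx", featuresCol="features"),
])

train, test = df.randomSplit([0.8, 0.2], seed=42)
model = pipeline.fit(train)

# Evaluate on the held-out split.
accuracy = MulticlassClassificationEvaluator(
    labelCol="label_idx", predictionCol="prediction", metricName="accuracy"
).evaluate(model.transform(test))
print(f"Test accuracy: {accuracy:.3f}")

spark.stop()
```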
Data processing is key to ensuring the performance of Machine Learning models. Data is commonly collected and stored in its raw format, however, and post-processing is required before insights can be drawn from it. What if all of this could be automated and managed through pipelines?
This webinar demonstrates not only how to collect data in real time, transform it, and persist it with Draco (the FIWARE Generic Enabler built on Apache NiFi) so that it is ready for further use, but also how to build an end-to-end AI service with PySpark hosted in the cloud.
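Once a script like the one above has been uploaded to Cloud Storage, it can also be submitted to an existing Dataproc cluster programmatically rather than through the console. The sketch below uses the google-cloud-dataproc client library; the project, region, cluster name, and file path are all placeholder values, not the webinar's environment.

```python
# A hedged sketch of submitting the PySpark script above as a job on an
# existing Dataproc cluster; all identifiers below are placeholders.
from google.cloud import dataproc_v1

project_id = "your-project"
region = "europe-west1"
cluster_name = "your-cluster"

# The Dataproc API is regional, so point the client at the right endpoint.
job_client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

job = {
    "placement": {"cluster_name": cluster_name},
    "pyspark_job": {
        "main_python_file_uri": "gs://your-bucket/jobs/multiclass_demo.py"
    },
}

# Submit the job and block until the driver finishes.
operation = job_client.submit_job_as_operation(
    request={"project_id": project_id, "region": region, "job": job}
)
response = operation.result()
print(f"Job finished with state: {response.status.state.name}")
```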