Problem Statement
A framework is required that provides data management and storage capabilities, an intelligence and analytics engine to monitor suspicious entities and transactions, an investigation interface to evaluate alerts, and a visualization and reporting solution that gives an overview of overall operations.
Value Proposition
Structured data should be provided in either streaming format or HDFS format. Data is received from the ingestion layer and is then used to build the ETL outputs and the analytical data/reporting marts, which are in turn used in the investigation.
- Horizontal table partitioning approach
- Calculated columns for analytical approaches/processes
- Use an extended/additional table approach – another way to implement vertical partitioning
- Logical file grouping will be used for better performance
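The partitioning ideas listed above can be illustrated with a minimal sketch. It uses plain Python rather than Spark so that the idea stays independent of the engine. The field names (`txn_id`, `account`, `amount`, `txn_date`, `memo`), the 5000.0 threshold, and all function names are assumptions made for illustration and are not part of the proposed solution.

```python
from collections import defaultdict

# Hypothetical transaction rows; field names and values are illustrative only.
transactions = [
    {"txn_id": 1, "account": "A1", "amount": 120.0, "txn_date": "2024-01-15", "memo": "card payment"},
    {"txn_id": 2, "account": "A2", "amount": 9800.0, "txn_date": "2024-01-20", "memo": "wire transfer"},
    {"txn_id": 3, "account": "A1", "amount": 45.5, "txn_date": "2024-02-03", "memo": "cash withdrawal"},
]


def add_calculated_columns(row, threshold=5000.0):
    """Return a copy of the row with a derived flag for analytical use."""
    enriched = dict(row)
    enriched["is_large"] = row["amount"] >= threshold  # assumed threshold
    return enriched


def partition_horizontally(rows):
    """Group whole rows by month (YYYY-MM): each partition holds a subset of rows."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row["txn_date"][:7]].append(row)
    return dict(partitions)


def split_vertically(rows, core_columns, key="txn_id"):
    """Split columns into a core table and an extension table that share the key."""
    core, extension = [], []
    for row in rows:
        core.append({c: row[c] for c in core_columns})
        extension.append({c: v for c, v in row.items() if c == key or c not in core_columns})
    return core, extension


enriched = [add_calculated_columns(r) for r in transactions]
by_month = partition_horizontally(enriched)
core, extension = split_vertically(
    enriched, ["txn_id", "account", "amount", "txn_date", "is_large"]
)
```

Here `by_month` holds one list of rows per month (horizontal partitioning), while `core` and `extension` hold the same rows split by column and joined through `txn_id` (the extended-table form of vertical partitioning).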
Solution Strategy
Data Ingestion Layer
- PySpark-based API
- Apache Spark DataFrames
- MapReduce Capability
Data Processing Layer
- Spark SQL Optimization Framework
- ETL Transformation
- Various Catalyst-based plan stages – Parsed Plan, Analyzed Plan, Optimized Plan
Data Storage Layer for Analytics
- RDD Store
- Decentralized Objects with Logical Partitioning
- API-based Data Dissemination