1 year ago
#68090
Samar Singh
Visualization over delta table residing in S3
We have ingested CDC data into the S3 raw layer. The CDC data is stored as JSON files and contains DML records (delete, update, etc.). We used Spark streaming with Delta Lake to de-duplicate the raw-layer data and move it to the standard layer. The table is partitioned on a certain column. I have two questions:
- Can we also use indexing on a Delta table (if it is supported) to index on the primary key, in addition to partitioning (that is, data inside each partition indexed on the primary key)?
- What visualization tool and supporting infrastructure (Spark or Presto as the compute engine) can we use to analyze a Delta table in S3? What would be the best approach? The data volume is very high. Should we move the Delta table to an RDBMS and build the visualization on top of that (although this would incur additional cost)?
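
To make the de-dup step concrete, here is a minimal pure-Python sketch of what collapsing CDC records per primary key typically looks like. It is not the actual Spark job, and the field names `id`, `op`, and `ts` are placeholder assumptions rather than the real schema.

```python
from typing import Dict, Iterable, List


def collapse_cdc(records: Iterable[dict]) -> List[dict]:
    """Keep the latest change per primary key; drop keys whose latest change is a delete.

    Assumes each record has hypothetical fields:
      id - primary key, op - DML operation, ts - change timestamp.
    """
    latest: Dict[object, dict] = {}
    for rec in records:
        current = latest.get(rec["id"])
        # A later timestamp wins; on a tie, the record seen last in the input wins.
        if current is None or rec["ts"] >= current["ts"]:
            latest[rec["id"]] = rec
    # A key whose most recent change is a delete should not appear downstream.
    return [r for r in latest.values() if r["op"] != "delete"]


events = [
    {"id": 1, "op": "insert", "ts": 1, "name": "a"},
    {"id": 1, "op": "update", "ts": 2, "name": "b"},
    {"id": 2, "op": "insert", "ts": 1, "name": "c"},
    {"id": 2, "op": "delete", "ts": 3, "name": "c"},
]
result = collapse_cdc(events)
print(result)
```

In this sample, key 1 ends up with its updated row and key 2 is removed because its latest change is a delete.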
apache-spark
amazon-s3
delta-lake
0 Answers