Friday, 28 April 2023

HUDI

Questions on Hudi

What are Copy on Write and Merge on Read?

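What is serialization (SerDe) in an Athena table?

  • A SerDe (serializer/deserializer) tells the query engine the format of the input file and how to parse it into rows and columns. For example, if we load a JSON file into a Hive table, the SerDe tells Hive that the underlying file is in JSON format.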

What happens if we remove or add a column in Glue?

- Materialized views and joining tables 

-- How does Hudi know which record is the latest?

-- Hudi snapshot and incremental queries

How is data skew handled in Spark?

  • Salting technique in Spark for handling skew: add a new field (a salt) to the join key so that rows sharing a hot key are spread across several partitions.
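The salting idea can be sketched in plain Python, without Spark, to show the mechanics. This is a minimal illustration under stated assumptions, not Spark's own implementation: the names `NUM_SALTS`, `salt_large_side`, `explode_small_side` and `hash_join`, and the sample `orders`/`customers` data, are all hypothetical. The skewed (large) side gets a random salt appended to its join key, the small side is replicated once per salt value, and the join then runs on the composite key (key, salt).

```python
import random
from collections import Counter

NUM_SALTS = 4  # hypothetical number of salt buckets


def salt_large_side(rows, num_salts, rng):
    """Append a random salt to the join key of each row on the skewed side."""
    return [((key, rng.randrange(num_salts)), value) for key, value in rows]


def explode_small_side(rows, num_salts):
    """Replicate each small-side row once per salt value so every salted key can match."""
    return [((key, salt), value) for key, value in rows for salt in range(num_salts)]


def hash_join(left, right):
    """Plain inner join on whatever key the rows carry (salted or not)."""
    lookup = {}
    for key, value in right:
        lookup.setdefault(key, []).append(value)
    return [(key, lv, rv) for key, lv in left for rv in lookup.get(key, [])]


rng = random.Random(0)

# Skewed input: the key "hot" dominates the large side.
orders = [("hot", i) for i in range(1000)] + [("cold", i) for i in range(10)]
customers = [("hot", "Alice"), ("cold", "Bob")]

salted_orders = salt_large_side(orders, NUM_SALTS, rng)
salted_customers = explode_small_side(customers, NUM_SALTS)

plain = hash_join(orders, customers)
salted = hash_join(salted_orders, salted_customers)

# Drop the salt again so the two results can be compared.
unsalted = [(key[0], lv, rv) for key, lv, rv in salted]

# Rows per distinct join key before and after salting.
keys_before = Counter(key for key, _ in orders)
keys_after = Counter(key for key, _ in salted_orders)
```

The salted join returns the same rows as the plain join, but the rows for the hot key are now split across up to `NUM_SALTS` distinct keys, so no single partition has to process all of them. The cost is that the small side grows by a factor of `NUM_SALTS`.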

-- How do we partition a table in S3 or Hudi, and what partition keys do we use? What are indexes in Hudi?


How does Parquet store data?

