Apache Iceberg is an open table format that brings the simplicity of SQL tables to data lakes, making it possible for engines like Spark, Trino, Flink, Presto, Hive, and Impala to work with the same tables at the same time. It is well suited to data lake architectures and offers good reliability and performance, including support for ACID transactions. Apache Iceberg addresses some of the main problems of Hive tables:
Easy and reliable schema evolution
Schema changes carry no risk of data loss. For example, we can add or rename columns, change a column’s type (within the type promotions Iceberg allows), or move its position. Each change only updates the table metadata, so it executes very quickly.
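As a minimal sketch of the schema changes described above, the following Python snippet sends Iceberg DDL statements through Spark SQL. The catalog, table, and column names (`demo.db.events`, `country`, `event_id`) are hypothetical, and the snippet assumes a SparkSession that is already configured with an Iceberg catalog. Widening `int` to `bigint` is used here because it is one of the type promotions Iceberg permits.

```python
# Hypothetical table name; assumes an Iceberg catalog called "demo" is configured in Spark.
TABLE = "demo.db.events"

# Each statement only rewrites table metadata; no data files are touched.
SCHEMA_CHANGES = [
    f"ALTER TABLE {TABLE} ADD COLUMN country string",
    f"ALTER TABLE {TABLE} RENAME COLUMN country TO country_code",
    f"ALTER TABLE {TABLE} ALTER COLUMN event_id TYPE bigint",
    f"ALTER TABLE {TABLE} ALTER COLUMN country_code AFTER event_id",
]


def evolve_schema(spark, statements=SCHEMA_CHANGES):
    """Run each DDL statement in order; `spark` is any object with a .sql(str) method."""
    for stmt in statements:
        spark.sql(stmt)
```

With a real session this would be called as `evolve_schema(spark)`. The function takes the session as a parameter so the statements can be inspected or logged without starting Spark.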
Change partitioning
Changing the partitioning for newly written data is very easy. After the change, the old and new partition schemes coexist in the same table. When a query runs, the filters in its WHERE clause are enough to determine where to look for data under each scheme. There is also no need to generate partition values for the rows in a table, and consumers don’t need to know how the table is partitioned in order to query it. This Apache Iceberg feature is known as ‘Hidden Partitioning’.
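The partitioning behaviour described above can be sketched in the same way. The table and column names (`demo.db.events`, `event_ts`) are hypothetical, and the `ADD PARTITION FIELD` and `DROP PARTITION FIELD` statements assume that the Iceberg SQL extensions are enabled in the Spark session.

```python
# Hypothetical table name; assumes an Iceberg catalog called "demo" is configured in Spark.
TABLE = "demo.db.events"

PARTITION_DDL = [
    # Hidden partitioning: partition by a transform of event_ts, with no extra partition column.
    f"CREATE TABLE {TABLE} (event_id bigint, event_ts timestamp) "
    f"USING iceberg PARTITIONED BY (days(event_ts))",
    # Partition evolution: newly written data is laid out by month,
    # while existing files keep their daily layout.
    f"ALTER TABLE {TABLE} ADD PARTITION FIELD months(event_ts)",
    f"ALTER TABLE {TABLE} DROP PARTITION FIELD days(event_ts)",
]

# The query filters on the source column only; it never mentions the partition layout.
QUERY = (
    f"SELECT count(*) FROM {TABLE} "
    f"WHERE event_ts >= TIMESTAMP '2023-01-01 00:00:00'"
)


def repartition_and_query(spark):
    """Apply the partition DDL, then run the query; `spark` needs only a .sql(str) method."""
    for stmt in PARTITION_DDL:
        spark.sql(stmt)
    return spark.sql(QUERY)
```

Note that the query is written the same way before and after the partition change, which is the point of hidden partitioning.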
git clone https://github.com/ivrore/apache-iceberg-minio-spark.git
