Laboratory_iceberg

Apache Iceberg is an open table format that brings the simplicity of SQL tables to big data, making it possible for engines like Spark, Trino, Flink, Presto, Hive and Impala to work with the same tables at the same time. It is well suited to data lake architectures and offers good reliability and performance, supported by ACID transactions. Using Apache Iceberg also addresses some of the main problems of Hive:


Easy and reliable Schema Evolution

Schema changes carry no risk of data loss. For example, we can add or rename columns, change a column's type, or move its position. Each of these changes only updates the table metadata, without rewriting data files, so it executes very quickly.
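To see why a metadata-only change can be safe, here is a minimal sketch in plain Python. It is a toy model, not Iceberg's API: the class `ToyTable` and its methods are hypothetical names invented for illustration. The one idea it borrows from Iceberg is that columns are tracked by stable field IDs rather than by name, so a rename or an added column never touches files that were already written.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToyTable:
    """Toy model (hypothetical, not Iceberg's API) of ID-based schema evolution."""

    # Metadata: column name -> stable field ID.
    schema: Dict[str, int] = field(default_factory=dict)
    # Data files: each file is a list of records keyed by field ID, never rewritten.
    data_files: List[List[Dict[int, Any]]] = field(default_factory=list)
    next_id: int = 1

    def add_column(self, name: str) -> None:
        # Metadata-only: assign a fresh field ID; existing files are untouched.
        self.schema[name] = self.next_id
        self.next_id += 1

    def rename_column(self, old: str, new: str) -> None:
        # Metadata-only: the field ID stays the same, only the name changes.
        self.schema = {(new if n == old else n): fid for n, fid in self.schema.items()}

    def append(self, rows: List[Dict[str, Any]]) -> None:
        # Write a new data file using the current schema's field IDs.
        self.data_files.append([{self.schema[n]: v for n, v in row.items()} for row in rows])

    def scan(self) -> List[Dict[str, Any]]:
        # Resolve columns by field ID; columns missing in older files read as None.
        return [
            {name: record.get(fid) for name, fid in self.schema.items()}
            for data_file in self.data_files
            for record in data_file
        ]


table = ToyTable()
table.add_column("id")
table.add_column("name")
table.append([{"id": 1, "name": "a"}])
first_file_before = repr(table.data_files[0])

table.rename_column("name", "full_name")  # no data file is rewritten
table.add_column("country")               # old rows will read None for this column
table.append([{"id": 2, "full_name": "b", "country": "ES"}])

rows = table.scan()
print(rows)
```

The old file is read through the new schema: the renamed column still resolves to the same values, and the added column shows up as a null for rows written before it existed.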


Change partitioning

Changing the partitioning for newly written data is very easy. After the change, the old and new partition schemes coexist in the same table. When a query is made, the filter in the WHERE clause is enough to determine where to look for data under each scheme. Also, there is no need to generate partition values for the rows in a table, and consumers don't need to know how the table is partitioned to query it. This Apache Iceberg feature is known as 'Hidden Partitioning'.
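The behaviour described above can be sketched with another toy model in plain Python. Again, this is an illustration under stated assumptions, not Iceberg's implementation: `ToyPartitionedTable`, `replace_spec` and `scan_day` are hypothetical names. The `month` and `day` functions stand in for partition transforms, which derive the partition value from a source column so that writers never supply it themselves.

```python
from datetime import datetime


# Partition transforms: derive the partition value from the timestamp column.
def month(ts: datetime) -> str:
    return ts.strftime("%Y-%m")


def day(ts: datetime) -> str:
    return ts.strftime("%Y-%m-%d")


class ToyPartitionedTable:
    """Toy model (hypothetical, not Iceberg's API) of hidden partitioning."""

    def __init__(self, transform):
        self.current_spec = transform
        # Each entry: (spec used at write time, partition value, rows).
        self.files = []

    def replace_spec(self, transform) -> None:
        # Metadata-only: existing files keep the spec they were written with.
        self.current_spec = transform

    def append(self, rows) -> None:
        # Writers pass plain rows; partition values are computed for them.
        groups = {}
        for row in rows:
            groups.setdefault(self.current_spec(row["ts"]), []).append(row)
        for value, group in groups.items():
            self.files.append((self.current_spec, value, group))

    def scan_day(self, target: datetime):
        """Return rows whose ts falls on target's calendar day, plus files read."""
        hits, files_read = [], 0
        for spec, value, rows in self.files:
            if spec(target) != value:
                continue  # pruned using the spec this file was written with
            files_read += 1
            hits.extend(r for r in rows if r["ts"].date() == target.date())
        return hits, files_read


table = ToyPartitionedTable(month)
table.append([
    {"id": 1, "ts": datetime(2023, 1, 5)},
    {"id": 2, "ts": datetime(2023, 2, 10)},
])

table.replace_spec(day)  # only newly written data uses the day-level scheme
table.append([
    {"id": 3, "ts": datetime(2023, 2, 10, 8)},
    {"id": 4, "ts": datetime(2023, 2, 11)},
])

hits, files_read = table.scan_day(datetime(2023, 2, 10))
print([r["id"] for r in hits], files_read)
```

The query only states a condition on `ts`. The model evaluates that condition against each file's own partition scheme, so files written under the monthly and the daily scheme are both pruned correctly without the caller knowing either layout.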

git clone https://github.com/ivrore/apache-iceberg-minio-spark.git
