Data Pipelines with Apache Airflow
- Title: Data Pipelines with Apache Airflow
- Authors: Bas Harenslak, Julian de Ruiter
- Format: PDF
- Date: 2024-06-28
- Rating:
- ISBN:9781617296901
Data Pipelines with Apache Airflow is your essential guide to working with the powerful Apache Airflow pipeline manager. Expert data engineers Bas Harenslak and Julian de Ruiter take you through best practices for creating pipelines for multiple tasks, including data lakes, cloud deployments, and data science. Part desktop reference, part hands-on tutorial, this book teaches you the ins and outs of the Directed Acyclic Graphs (DAGs) that power Airflow, and how to write your own DAGs to meet the needs of your projects. You’ll learn how to automate moving and transforming data, manage pipelines by backfilling historical tasks, develop custom components for your specific systems, and set up Airflow in production environments. With complete coverage of both foundational and lesser-known features, the book leaves you ready to use Airflow for seamless data pipeline development and management.
What's inside
Framework foundation and best practices
Airflow's execution and dependency system
Testing Airflow DAGs
Running Airflow in production
Bas Harenslak and Julian de Ruiter are data engineers with extensive experience using Airflow to develop pipelines for major companies including Heineken, Unilever, and Booking.com. Bas is a committer, and both Bas and Julian are active contributors to Apache Airflow.
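- 醉醉鱼♂ (06-24): The first two chapters cover every aspect we are likely to use: for example, custom operators, dispatching tasks to external resources rather than running them on the Airflow instance, packaging and testing, and even containerization to support a wide variety of tasks.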
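- 3点一直线 (02-26): This is a decent book that works well as an introduction and overview. It is written mainly for Airflow 1.10 and was also tested on 2.0. Airflow 1 and 2 differ substantially. By 2022, many companies had launched their own managed Airflow offerings, such as Amazon's MWAA, which is Airflow on AWS. We are now on version 2.4.3. The book is fine as an introduction, but you still have to work out the details yourself on the job. If this is your first contact with Airflow, the book is OK.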
- ¥ifan (05-16): The technical decision discussions in the book match real-world experience quite well.