Periodic Acquisition of Rainfall Forecast Data Using Apache Airflow

Erwin Eko Wahyudi
Muhammad Auzan
Andi Dharmawan
Danang Eko Nuryanto
Nanang Susyanto
Guruh Samodra
Danang Sri Hadmoko

Abstract

Data acquisition, which aims to collect the initial data, is one of the stages of a data mining methodology. The initial data is processed into the final data used for modeling, for example to build a model that predicts the potential occurrence of landslides. The rainfall forecast data provided by the Meteorology, Climatology, and Geophysics Agency (Badan Meteorologi, Klimatologi, dan Geofisika, BMKG) can be used for this kind of modeling. In this work the data is stored on a local computer using an automation tool called Apache Airflow. The acquisition of data from the BMKG server to the local computer runs automatically twice a day, at 00:00 and 12:00. Two tasks are defined in the Directed Acyclic Graph (DAG) for this process: the first task acts as a sensor for data availability, and the second task performs the main process. The status of the DAG in Apache Airflow can also be checked quickly, for example whether a run has succeeded, has failed, or is still running. Apache Airflow also provides logs that can be accessed to find out why a task failed. The result of this research is a pipeline in the Apache Airflow automation tool that supports periodic data acquisition.
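
The two-task design described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the BMKG endpoint, the file naming, or the Airflow version, so the directories, the file name pattern, the DAG id, the task ids, and the sensor timing below are all hypothetical, and a local file copy stands in for the real download. The wiring in build_dag assumes Apache Airflow 2.2 or later.

```python
from datetime import datetime
from pathlib import Path
import shutil

# Cron expression for "every day at 00:00 and 12:00"
# (fields: minute, hour, day of month, month, day of week).
SCHEDULE = "0 0,12 * * *"

# Hypothetical locations; the abstract does not describe the real ones.
SOURCE_DIR = Path("/tmp/bmkg_source")
TARGET_DIR = Path("/tmp/rainfall_forecast")


def forecast_name(run_time: datetime) -> str:
    """Hypothetical file name for the forecast issued at run_time."""
    return f"rainfall_{run_time:%Y%m%d%H}.nc"


def data_is_available(run_time: datetime, source_dir: Path = SOURCE_DIR) -> bool:
    """Task 1 (sensor): report whether the forecast file exists yet."""
    return (source_dir / forecast_name(run_time)).exists()


def acquire_data(
    run_time: datetime,
    source_dir: Path = SOURCE_DIR,
    target_dir: Path = TARGET_DIR,
) -> Path:
    """Task 2 (main process): copy the forecast file into local storage.

    A local copy stands in for the download from the BMKG server.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    name = forecast_name(run_time)
    return Path(shutil.copy2(source_dir / name, target_dir / name))


def build_dag():
    """Wire both callables into a DAG (assumes Apache Airflow 2.2+ is installed)."""
    from airflow import DAG
    from airflow.operators.python import PythonOperator
    from airflow.sensors.python import PythonSensor

    # Airflow passes context values to callables by parameter name.
    def _sense(data_interval_end, **_):
        return data_is_available(data_interval_end)

    def _acquire(data_interval_end, **_):
        return str(acquire_data(data_interval_end))

    with DAG(
        dag_id="rainfall_forecast_acquisition",  # hypothetical name
        start_date=datetime(2022, 1, 1),
        schedule_interval=SCHEDULE,
        catchup=False,
    ) as dag:
        wait_for_data = PythonSensor(
            task_id="wait_for_data",
            python_callable=_sense,
            poke_interval=300,    # assumed: re-check every 5 minutes
            timeout=6 * 60 * 60,  # assumed: give up after 6 hours
            mode="reschedule",    # free the worker slot between checks
        )
        download = PythonOperator(task_id="acquire_data", python_callable=_acquire)
        # The main process runs only after the sensor reports success.
        wait_for_data >> download
    return dag
```

In a DAG file, assigning the result at module level (dag = build_dag()) would let the Airflow scheduler discover it. Keeping the availability check and the acquisition as plain functions also allows them to be tested without an Airflow installation.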

How to Cite
Wahyudi, E. E., Auzan, M., Dharmawan, A., Nuryanto, D., Susyanto, N., Samodra, G., & Hadmoko, D. (2022). Akuisisi Data Prediksi Curah Hujan Secara Periodik Menggunakan Apache Airflow. Journal of Informatics Information System Software Engineering and Applications (INISTA), 4(2), 1-12. https://doi.org/10.20895/inista.v4i2.574
