Periodic Acquisition of Rainfall Forecast Data Using Apache Airflow
Abstract
Data acquisition, which aims to obtain raw data, is one of the stages in data mining methodology. The raw data are processed into the final data used for modeling, such as building a model to predict the potential occurrence of landslides. Rainfall forecast data provided by Badan Meteorologi, Klimatologi, dan Geofisika (BMKG), Indonesia's meteorology, climatology, and geophysics agency, can be used for such modeling. The data are stored on a local computer using an automation tool called Apache Airflow. The acquisition process from the BMKG server to the local computer runs automatically twice a day, at 00:00 and 12:00. Two tasks are defined in the Directed Acyclic Graph (DAG) for this process: the first acts as a sensor for data availability, and the second performs the main process. The status of the DAG in Apache Airflow can be checked quickly, for example whether a run has succeeded, has failed, or is still running. Apache Airflow also provides logs that can be consulted to find out why a task failed. The result of this study is a pipeline in the Apache Airflow automation tool that supports periodic data acquisition.
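The twice-daily schedule and the two-task structure described in the abstract can be illustrated with a minimal plain-Python sketch. This is not the authors' DAG and it does not use the Airflow API: the names `SCHEDULE`, `next_runs`, `wait_for_data`, and `run_pipeline`, as well as the polling interval and timeout values, are assumptions introduced only for illustration. The availability check and the acquisition step are passed in as callables because the abstract does not specify how the BMKG server is queried.

```python
import time
from datetime import datetime, timedelta
from typing import Callable, List

# Cron expression for "every day at 00:00 and 12:00"
# (fields: minute, hour, day of month, month, day of week).
SCHEDULE = "0 0,12 * * *"


def next_runs(start: datetime, count: int) -> List[datetime]:
    """Return the next `count` run times (00:00 and 12:00) strictly after `start`."""
    # Begin at midnight of the same day and step forward in 12-hour slots.
    slot = start.replace(hour=0, minute=0, second=0, microsecond=0)
    runs: List[datetime] = []
    while len(runs) < count:
        if slot > start:
            runs.append(slot)
        slot += timedelta(hours=12)
    return runs


def wait_for_data(
    is_available: Callable[[], bool], poke_interval: float, timeout: float
) -> bool:
    """Task 1 (sensor): poll until the data are available or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if is_available():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poke_interval)


def run_pipeline(
    is_available: Callable[[], bool],
    acquire: Callable[[], None],
    poke_interval: float = 1.0,
    timeout: float = 10.0,
) -> str:
    """Run the sensor first; run the main task only if the sensor succeeded."""
    if not wait_for_data(is_available, poke_interval, timeout):
        return "failed"
    acquire()  # Task 2 (main process): hypothetical download-and-store step.
    return "success"
```

In an Airflow deployment, a function like the availability check could plausibly serve as the callable of a `PythonSensor` and the acquisition step as the callable of a `PythonOperator`, with the DAG scheduled by the cron expression above; whether the authors wired their DAG this way is not stated in the abstract.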
Copyright Notice
Authors who publish with Journal of Informatics, Information System, Software Engineering and Applications (INISTA) agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike License (CC BY-SA 4.0) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.