
Information 2025,16, 932 35 of 38
References
1. Liu, C.; Pavlenko, A.; Interlandi, M.; Haynes, B. A Deep Dive into Common Open Formats for Analytical DBMSs. Proc. VLDB Endow. 2023, 16, 3044–3056. [CrossRef]
2. Zeng, X.; Hui, Y.; Shen, J.; Pavlo, A.; McKinney, W.; Zhang, H. An Empirical Evaluation of Columnar Storage Formats. Proc. VLDB Endow. 2023, 17, 148–161. [CrossRef]
3. Armbrust, M.; Das, T.; Sun, L.; Yavuz, B.; Zhu, S.; Murthy, M.; Torres, J.; van Hovell, H.; Ionescu, A.; Łuszczak, A.; et al. Delta Lake: High-performance ACID table storage over cloud object stores. Proc. VLDB Endow. 2020, 13, 3411–3424. [CrossRef]
4. Abadi, D.J.; Madden, S.R.; Hachem, N. Column-stores vs. row-stores: How different are they really? In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, Vancouver, BC, Canada, 9–12 June 2008; pp. 967–980. [CrossRef]
5. Hai, R.; Koutras, C.; Quix, C.; Jarke, M. Data Lakes: A Survey of Functions and Systems. IEEE Trans. Knowl. Data Eng. 2023, 35, 12571–12590. [CrossRef]
6. Gu, Z.; Corcoglioniti, F.; Lanti, D.; Mosca, A.; Xiao, G.; Xiong, J.; Calvanese, D. A systematic overview of data federation systems. Semant. Web 2024, 15, 107–165. [CrossRef]
7. Sun, Y.; Meehan, T.; Schlussel, R.; Xie, W.; Basmanova, M.; Erling, O.; Rosa, A.; Fan, S.; Zhong, R.; Thirupathi, A.; et al. Presto: A Decade of SQL Analytics at Meta. Proc. ACM Manag. Data 2023, 1, 1–25. [CrossRef]
8. Potharaju, R.; Kim, T.; Song, E.; Wu, W.; Novik, L.; Dave, A.; Acharya, V.; Dhody, G.; Li, J.; Ramanujam, S.; et al. Hyperspace: The Indexing Subsystem of Azure Synapse. Proc. VLDB Endow. 2021, 14, 3043–3055. [CrossRef]
9. Dong, X.L.; Srivastava, D. Big Data Integration; Synthesis Lectures on Data Management; Springer Nature Switzerland AG: Cham, Switzerland, 2015. [CrossRef]
10. Okolnychyi, A.; Sun, C.; Tanimura, K.; Spitzer, R.; Blue, R.; Ho, S.; Gu, Y.; Lakkundi, V.; Tsai, D. Petabyte-Scale Row-Level Operations in Data Lakehouses. Proc. VLDB Endow. 2024, 17, 4159–4172. [CrossRef]
11. Alma’aitah, W.Z.; Quraan, A.; AL-Aswadi, F.N.; Alkhawaldeh, R.S.; Alazab, M.; Awajan, A. Integration Approaches for Heterogeneous Big Data: A Survey. Cybern. Inf. Technol. 2024, 24, 3–20. [CrossRef]
12. Pedreira, P.; Erling, O.; Basmanova, M.; Wilfong, K.; Sakka, L.; Pai, K.; He, W.; Chattopadhyay, B. Velox: Meta’s unified execution engine. Proc. VLDB Endow. 2022, 15, 3372–3384. [CrossRef]
13. Schneider, J.; Gröger, C.; Lutsch, A.; Schwarz, H.; Mitschang, B. The Lakehouse: State of the Art on Concepts and Technologies. SN Comput. Sci. 2024, 5, 449. [CrossRef]
14. Kaoudi, Z.; Quiané-Ruiz, J.A. Unified Data Analytics: State-of-the-Art and Open Problems. Proc. VLDB Endow. 2022, 15, 3778–3781. [CrossRef]
15. Fan, M.; Han, X.; Fan, J.; Chai, C.; Tang, N.; Li, G.; Du, X. Cost-Effective In-Context Learning for Entity Resolution: A Design Space Exploration. In Proceedings of the 2024 IEEE 40th International Conference on Data Engineering (ICDE), Utrecht, The Netherlands, 13–16 May 2024; pp. 3696–3709. [CrossRef]
16. Zhang, Z.; Zeng, W.; Tang, J.; Huang, H.; Zhao, X. Active in-context learning for cross-domain entity resolution. Inf. Fusion 2025, 117, 102816. [CrossRef]
17. Taboada, M.; Martinez, D.; Arideh, M.; Mosquera, R. Ontology matching with Large Language Models and prioritized depth-first search. Inf. Fusion 2025, 123, 103254. [CrossRef]
18. Babaei Giglou, H.; D’Souza, J.; Engel, F.; Auer, S. LLMs4OM: Matching Ontologies with Large Language Models. In Proceedings of the Semantic Web: ESWC 2024 Satellite Events, Hersonissos, Greece, 26–30 May 2024; Meroño Peñuela, A., Corcho, O., Groth, P., Simperl, E., Tamma, V., Nuzzolese, A.G., Poveda-Villalón, M., Sabou, M., Presutti, V., Celino, I., et al., Eds.; Springer: Cham, Switzerland, 2025; pp. 25–35.
19. Barbon Junior, S.; Ceravolo, P.; Groppe, S.; Jarrar, M.; Maghool, S.; Sèdes, F.; Sahri, S.; Van Keulen, M. Are Large Language Models the New Interface for Data Pipelines? In Proceedings of the International Workshop on Big Data in Emergent Distributed Environments, Santiago, Chile, 9–15 June 2024. [CrossRef]
20. Alidu, A.; Ciavotta, M.; Paoli, F.D. LLM-Based DAG Creation for Data Enrichment Pipelines in SemT Framework. In Proceedings of the Service-Oriented Computing—ICSOC 2024 Workshops: ASOCA, AI-PA, WESOACS, GAISS, LAIS, AI on Edge, RTSEMS, SQS, SOCAISA, SOC4AI and Satellite Events, Tunis, Tunisia, 3–6 December 2024; Springer Nature: Singapore, 2025; pp. 131–143. [CrossRef]
21. Rahm, E.; Bernstein, P.A. A Survey of Approaches to Automatic Schema Matching. VLDB J. 2001, 10, 334–350. [CrossRef]
22. Bleiholder, J.; Naumann, F. Data Fusion. ACM Comput. Surv. 2008, 41, 1–41. [CrossRef]
23. Cheney, J.; Chiticariu, L.; Tan, W. Provenance in Databases: Why, How, and Where. Found. Trends Databases 2009, 1, 379–474. [CrossRef]
24. Euzenat, J.; Shvaiko, P. Ontology Matching, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2013. [CrossRef]
25. Christen, P. Data Matching: Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection; Springer: Berlin/Heidelberg, Germany, 2012. [CrossRef]