Xiaodong Yu

Assistant Professor

Charles V. Schaefer, Jr. School of Engineering and Science

Department of Computer Science

Education

  • Ph.D. in Computer Science, Virginia Tech, 2019

Research

Parallel and Distributed Computing and Systems, Next-Generation AI Hardware, High-Performance MLSys (supporting LLMs, GNNs, and more), Communication and Privacy in Federated Learning

General Information

Xiaodong Yu has been an Assistant Professor in the Department of Computer Science at Stevens Institute of Technology since 2023, where he leads the Advanced Parallel and distributEd Computing and Systems (APECS) lab. Prior to joining Stevens, he was an Assistant Computer Scientist in the Mathematics and Computer Science (MCS) Division at Argonne National Laboratory from 2019 to 2023. He also held an appointment as a Scientist-at-Large with the Consortium for Advanced Science and Engineering (CASE) at the University of Chicago. He earned his Ph.D. in Computer Science from Virginia Tech in 2019. His research interests span parallel and distributed algorithms, systems, and architectures, and his work has resulted in over 50 peer-reviewed publications in top-tier HPC venues such as HPDC, ICS, and SC.
Dr. Yu is the PI of an NSF CRII award and an Argonne Laboratory Directed Research and Development (LDRD) project, and he has served as technical lead on several U.S. Department of Energy (DOE) projects. He has supervised more than 10 Ph.D. and undergraduate research interns at Argonne and currently advises five Ph.D. students at Stevens. He actively serves on the organizing and technical program committees of leading conferences, including ICS, SC, and IPDPS, and is a review board member for IEEE Transactions on Parallel and Distributed Systems (TPDS).

Experience

Stevens Institute of Technology, Hoboken, NJ
Assistant Professor 2023 - Current

Argonne National Laboratory, Lemont, IL
Guest Faculty 2024 - Current
Assistant Computer Scientist 2019 - 2023

The University of Chicago Consortium for Advanced Science and Engineering, Chicago, IL
Scientist-at-Large 2022 - 2023

AMD, Austin, TX
Software Engineer (Intern) Summer 2017

Institutional Service

  • CS Tenure-Track Faculty Search Committee Member

Professional Service

  • The 10th International Workshop on Data Analysis and Reduction for Big Scientific Data (DRBSD 2024), held in conjunction with the ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), Technical Program Committee Member
  • IEEE Transactions on Parallel and Distributed Systems (TPDS), Reviewer
  • IEEE Transactions on Parallel and Distributed Systems (TPDS), Review Board Member
  • The Fourth International Workshop on Big Data Reduction (IWBDR-4), held in conjunction with the 2023 IEEE International Conference on Big Data (IEEE BigData), Technical Program Committee Member
  • Future Generation Computer Systems (FGCS), Elsevier, Reviewer
  • The First Workshop on Software and Hardware Co-Design of Deep Learning Systems in Accelerators (SHDA 2023), held in conjunction with the ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), Technical Program Committee Member
  • IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Finance Chair

Appointments

Assistant Professor, Department of Computer Science, Stevens Institute of Technology

Professional Societies

  • ASEE – American Society for Engineering Education Member
  • ACM – Association for Computing Machinery Member
  • IEEE – Institute of Electrical and Electronics Engineers Member

Grants, Contracts and Funds

Sole PI: NSF, CRII: OAC: A Compressor-Assisted Collective Communication Framework for GPU-Based Large-Scale Deep Learning, 2024 – 2026
Site PI: DOE/ANL, Scalable and Resilient Modeling for Federated-Learning-Based Complex Workflows, 2024 – 2025
Sole PI: Stevens SIAI, Efficient Communication Payload Reductions for Federated Learning, 2024
Lead PI: ANL LDRD, Scalability Study of AI-based Surrogate for Ptychographic Image Reconstruction on Graphcore, 2022

Selected Publications

Conference Proceedings

  1. Shah, M.; Yu, X.; Di, S.; Becchi, M.; Cappello, F. (2024). A Portable, Fast, DCT-based Compressor for AI Accelerators. Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2024, Pisa, Italy, June 3-7, 2024 (pp. 109--121). ACM.
    https://doi.org/10.1145/3625549.3658662.
  2. Huang, J.; Di, S.; Yu, X.; Zhai, Y.; Zhang, Z.; Liu, J.; Lu, X.; Raffenetti, K.; Zhou, H.; Zhao, K.; Chen, Z.; Cappello, F.; Guo, Y.; Thakur, R. (2024). An Optimized Error-controlled MPI Collective Framework Integrated with Lossy Compression. IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024, San Francisco, CA, USA, May 27-31, 2024 (pp. 752--764). IEEE.
    https://doi.org/10.1109/IPDPS57955.2024.00072.
  3. Xie, Z.; Emani, M.; Yu, X.; Tao, D.; He, X.; Su, P.; Zhou, K.; Vishwanath, V. (2024). Centimani: Enabling Fast AI Accelerator Selection for DNN Training with a Novel Performance Predictor. Proceedings of the 2024 USENIX Annual Technical Conference, USENIX ATC 2024, Santa Clara, CA, USA, July 10-12, 2024 (pp. 1203--1221). USENIX Association.
    https://www.usenix.org/conference/atc24/presentation/xie.
  4. Song, S.; Huang, Y.; Jiang, P.; Yu, X.; Zheng, W.; Di, S.; Cao, Q.; Feng, Y.; Xie, Z.; Cappello, F. (2024). CereSZ: Enabling and Scaling Error-bounded Lossy Compression on Cerebras CS-2. Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2024, Pisa, Italy, June 3-7, 2024 (pp. 309--321). ACM.
    https://doi.org/10.1145/3625549.3658691.
  5. Huang, J.; Di, S.; Yu, X.; Zhai, Y.; Liu, J.; Huang, Y.; Raffenetti, K.; Zhou, H.; Zhao, K.; Lu, X.; Chen, Z.; Cappello, F.; Guo, Y.; Thakur, R. (2024). gZCCL: Compression-Accelerated Collective Communication Framework for GPU Clusters. Proceedings of the 38th ACM International Conference on Supercomputing, ICS 2024, Kyoto, Japan, June 4-7, 2024 (pp. 437--448). ACM.
    https://doi.org/10.1145/3650200.3656636.
  6. Huang, J.; Di, S.; Yu, X.; Zhai, Y.; Liu, J.; Huang, Y.; Raffenetti, K.; Zhou, H.; Zhao, K.; Chen, Z.; Cappello, F.; Guo, Y.; Thakur, R. (2024). POSTER: Optimizing Collective Communications with Error-bounded Lossy Compression for GPU Clusters. Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, PPoPP 2024, Edinburgh, United Kingdom, March 2-6, 2024 (pp. 454--456). ACM.
    https://doi.org/10.1145/3627535.3638467.
  7. Zhang, C.; Sun, B.; Yu, X.; Xie, Z.; Zheng, W.; Iskra, K. A.; Beckman, P.; Tao, D. (2023). Benchmarking and In-depth Performance Study of Large Language Models on Habana Gaudi Processors. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, SC-W 2023, Denver, CO, USA, November 12-17, 2023 (pp. 1757--1766). ACM.
    https://doi.org/10.1145/3624062.3624257.
  8. Huang, Y.; Di, S.; Yu, X.; Li, G.; Cappello, F. (2023). cuSZp: An Ultra-fast GPU Error-bounded Lossy Compression Framework with Optimized End-to-End Performance. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2023, Denver, CO, USA, November 12-17, 2023 (pp. 43:1--43:13). ACM.
    https://doi.org/10.1145/3581784.3607048.
  9. Zhang, B.; Tian, J.; Di, S.; Yu, X.; Feng, Y.; Liang, X.; Tao, D.; Cappello, F. (2023). FZ-GPU: A Fast and High-Ratio Lossy Compressor for Scientific Computing Applications on GPUs. Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2023, Orlando, FL, USA, June 16-23, 2023 (pp. 129--142). ACM.
    https://doi.org/10.1145/3588195.3592994.
  10. Shah, M.; Yu, X.; Di, S.; Lykov, D.; Alexeev, Y.; Becchi, M.; Cappello, F. (2023). GPU-Accelerated Error-Bounded Compression Framework for Quantum Circuit Simulations. IEEE International Parallel and Distributed Processing Symposium, IPDPS 2023, St. Petersburg, FL, USA, May 15-19, 2023 (pp. 757--767). IEEE.
    https://doi.org/10.1109/IPDPS54959.2023.00081.
  11. Zhang, B.; Tian, J.; Di, S.; Yu, X.; Swany, M.; Tao, D.; Cappello, F. (2023). GPULZ: Optimizing LZSS Lossless Compression for Multi-byte Data on Modern GPUs. Proceedings of the 37th International Conference on Supercomputing, ICS 2023, Orlando, FL, USA, June 21-23, 2023 (pp. 348--359). ACM.
    https://doi.org/10.1145/3577193.3593706.
  12. Zhang, C.; Smith, S.; Sun, B.; Tian, J.; Soifer, J.; Yu, X.; Song, S. L.; He, Y.; Tao, D. (2023). HEAT: A Highly Efficient and Affordable Training System for Collaborative Filtering Based Recommendation on CPUs. Proceedings of the 37th International Conference on Supercomputing, ICS 2023, Orlando, FL, USA, June 21-23, 2023 (pp. 324--335). ACM.
    https://doi.org/10.1145/3577193.3593717.
  13. Shah, M.; Yu, X.; Di, S.; Becchi, M.; Cappello, F. (2023). Lightweight Huffman Coding for Efficient GPU Compression. Proceedings of the 37th International Conference on Supercomputing, ICS 2023, Orlando, FL, USA, June 21-23, 2023 (pp. 99--110). ACM.
    https://doi.org/10.1145/3577193.3593736.
  14. Rivera, C.; Di, S.; Tian, J.; Yu, X.; Tao, D.; Cappello, F. (2022). Optimizing Huffman Decoding for Error-Bounded Lossy Compression on GPUs. 2022 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2022, Lyon, France, May 30 - June 3, 2022 (pp. 717--727). IEEE.
    https://doi.org/10.1109/IPDPS53621.2022.00075.
  15. Yu, X.; Di, S.; Zhao, K.; Tian, J.; Tao, D.; Liang, X.; Cappello, F. (2022). Ultrafast Error-bounded Lossy Compression for Scientific Datasets. HPDC '22: The 31st International Symposium on High-Performance Parallel and Distributed Computing, Minneapolis, MN, USA, 27 June 2022 - 1 July 2022 (pp. 159--171). ACM.
    https://doi.org/10.1145/3502181.3531473.
  16. Yu, X.; Di, S.; Gok, A. M.; Tao, D.; Cappello, F. (2021). cuZ-Checker: A GPU-Based Ultra-Fast Assessment System for Lossy Compressions. IEEE International Conference on Cluster Computing, CLUSTER 2021, Portland, OR, USA, September 7-10, 2021 (pp. 307--319). IEEE.
    https://doi.org/10.1109/Cluster48925.2021.00065.
  17. Bicer, T.; Yu, X.; Ching, D. J.; Chard, R.; Cherukara, M. J.; Nicolae, B.; Kettimuthu, R.; Foster, I. T. (2021). High-Performance Ptychographic Reconstruction with Federated Facilities. Driving Scientific and Engineering Discoveries Through the Integration of Experiment, Big Data, and Modeling and Simulation - 21st Smoky Mountains Computational Sciences and Engineering, SMC 2021, Virtual Event, October 18-20, 2021, Revised Selected Papers (vol. 1512, pp. 173--189). Springer.
    https://doi.org/10.1007/978-3-030-96498-6_10.
  18. Tian, J.; Di, S.; Yu, X.; Rivera, C.; Zhao, K.; Jin, S.; Feng, Y.; Liang, X.; Tao, D.; Cappello, F. (2021). Optimizing Error-Bounded Lossy Compression for Scientific Data on GPUs. IEEE International Conference on Cluster Computing, CLUSTER 2021, Portland, OR, USA, September 7-10, 2021 (pp. 283--293). IEEE.
    https://doi.org/10.1109/Cluster48925.2021.00047.
  19. Yu, X.; Bicer, T.; Kettimuthu, R.; Foster, I. T. (2021). Topology-aware optimizations for multi-GPU ptychographic image reconstruction. ICS '21: 2021 International Conference on Supercomputing, Virtual Event, USA, June 14-17, 2021 (pp. 354--366). ACM.
    https://doi.org/10.1145/3447818.3460380.
  20. Yu, X.; Wei, F.; Ou, X.; Becchi, M.; Bicer, T.; Yao, D. D. (2020). GPU-Based Static Data-Flow Analysis for Fast and Scalable Android App Vetting. 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), New Orleans, LA, USA, May 18-22, 2020 (pp. 274--284). IEEE.
    https://doi.org/10.1109/IPDPS47924.2020.00037.
  21. Yu, X.; Xiao, Y.; Cameron, K. W.; Yao, D. (2019). Comparative Measurement of Cache Configurations' Impacts on Cache Timing Side-Channel Attacks. 12th USENIX Workshop on Cyber Security Experimentation and Test, CSET 2019, Santa Clara, CA, USA, August 12, 2019. USENIX Association.
    https://www.usenix.org/conference/cset19/presentation/yu.
  22. Lux, T. C.; Watson, L. T.; Chang, T. H.; Bernard, J.; Li, B.; Yu, X.; Xu, L.; Back, G.; Butt, A. R.; Cameron, K. W.; Yao, D.; Hong, Y.; Wong, K.; Shen, C.; Brown, D. (2018). Novel meshes for multivariate interpolation and approximation. Proceedings of the ACMSE 2018 Conference, Richmond, KY, USA, March 29-31, 2018 (pp. 13:1--13:7). ACM.
    https://doi.org/10.1145/3190645.3190687.
  23. Yu, X.; Hou, K.; Wang, H.; Feng, W. (2017). A framework for fast and fair evaluation of automata processing hardware. 2017 IEEE International Symposium on Workload Characterization, IISWC 2017, Seattle, WA, USA, October 1-3, 2017 (pp. 120--121). IEEE Computer Society.
    https://doi.org/10.1109/IISWC.2017.8167767.
  24. Yu, X.; Wang, H.; Feng, W.; Gong, H.; Cao, G. (2017). An Enhanced Image Reconstruction Tool for Computed Tomography on CPUs. Proceedings of the Computing Frontiers Conference, CF'17, Siena, Italy, May 15-17, 2017 (pp. 97--106). ACM.
    https://doi.org/10.1145/3075564.3078889.
  25. Nourian, M.; Wang, X.; Yu, X.; Feng, W.; Becchi, M. (2017). Demystifying automata processing: GPUs, FPGAs or Micron's AP?. Proceedings of the International Conference on Supercomputing, ICS 2017, Chicago, IL, USA, June 14-16, 2017 (pp. 1:1--1:11). ACM.
    https://doi.org/10.1145/3079079.3079100.
  26. Yu, X.; Hou, K.; Wang, H.; Feng, W. (2017). Robotomata: A framework for approximate pattern matching of big data on an automata processor. 2017 IEEE International Conference on Big Data (IEEE BigData 2017), Boston, MA, USA, December 11-14, 2017 (pp. 283--292). IEEE Computer Society.
    https://doi.org/10.1109/BigData.2017.8257936.
  27. Yu, X.; Wang, H.; Feng, W.; Gong, H.; Cao, G. (2016). cuART: Fine-Grained Algebraic Reconstruction Technique for Computed Tomography Images on GPUs. IEEE/ACM 16th International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2016, Cartagena, Colombia, May 16-19, 2016 (pp. 165--168). IEEE Computer Society.
    https://doi.org/10.1109/CCGrid.2016.96.
  28. Yu, X.; Feng, W.; Yao, D.; Becchi, M. (2016). O3FA: A Scalable Finite Automata-based Pattern-Matching Engine for Out-of-Order Deep Packet Inspection. Proceedings of the 2016 Symposium on Architectures for Networking and Communications Systems, ANCS 2016, Santa Clara, CA, USA, March 17-18, 2016 (pp. 1--11). ACM.
    https://doi.org/10.1145/2881025.2881034.
  29. Yu, X.; Becchi, M. (2013). Exploring different automata representations for efficient regular expression matching on GPUs. ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '13, Shenzhen, China, February 23-27, 2013 (pp. 287--288). ACM.
    https://doi.org/10.1145/2442516.2442548.
  30. Yu, X.; Becchi, M. (2013). GPU acceleration of regular expression matching for large datasets: exploring the implementation space. Computing Frontiers Conference, CF'13, Ischia, Italy, May 14 - 16, 2013 (pp. 18:1--18:10). ACM.
    https://doi.org/10.1145/2482767.2482791.

Journal Articles

  1. Di, S.; Liu, J.; Zhao, K.; Liang, X.; Underwood, R.; Zhang, Z.; Shah, M.; Huang, Y.; Huang, J.; Yu, X.; Ren, C.; Guo, H.; Wilkins, G.; Tao, D.; Tian, J.; Jin, S.; Jian, Z.; Wang, D.; Rahman, M. H.; Zhang, B.; Calhoun, J. C.; Li, G.; Yoshii, K.; Alharthi, K. A.; Cappello, F. (2024). A Survey on Error-Bounded Lossy Compression for Scientific Datasets. CoRR (vol. abs/2404.02840).
    https://doi.org/10.48550/arXiv.2404.02840.
  2. Huang, J.; Di, S.; Yu, X.; Zhai, Y.; Liu, J.; Raffenetti, K.; Zhou, H.; Zhao, K.; Chen, Z.; Cappello, F.; Guo, Y.; Thakur, R. (2023). C-Coll: Introducing Error-bounded Lossy Compression into MPI Collectives. CoRR (vol. abs/2304.03890).
    https://doi.org/10.48550/arXiv.2304.03890.
  3. Sun, B.; Yu, X.; Zhang, C.; Tian, J.; Jin, S.; Iskra, K.; Zhou, T.; Bicer, T.; Beckman, P.; Tao, D. (2022). SOLAR: A Highly Optimized Data Loading Framework for Distributed Training of CNN-based Scientific Surrogates. CoRR (vol. abs/2211.00224).
    https://doi.org/10.48550/arXiv.2211.00224.
  4. Bicer, T.; Yu, X.; Ching, D. J.; Chard, R.; Cherukara, M. J.; Nicolae, B.; Kettimuthu, R.; Foster, I. T. (2021). High-Performance Ptychographic Reconstruction with Federated Facilities. CoRR (vol. abs/2111.11330).
    https://arxiv.org/abs/2111.11330.
  5. Yu, X.; Nikitin, V. V.; Ching, D. J.; Aslan, S. S.; Gürsoy, D.; Bicer, T. (2021). Scalable and accurate multi-GPU based image reconstruction of large-scale ptychography data. CoRR (vol. abs/2106.07575).
    https://arxiv.org/abs/2106.07575.
  6. Yu, X.; Wang, H.; Feng, W.; Gong, H.; Cao, G. (2019). GPU-Based Iterative Medical CT Image Reconstructions. J. Signal Process. Syst. (3-4 ed., vol. 91, pp. 321--338).
    https://doi.org/10.1007/s11265-018-1352-0.
  7. Yu, X.; Lin, B.; Becchi, M. (2014). Revisiting State Blow-Up: Automatically Building Augmented-FA While Preserving Functional Equivalence. IEEE J. Sel. Areas Commun. (10 ed., vol. 32, pp. 1822--1833).
    https://doi.org/10.1109/JSAC.2014.2358840.

Courses

CS 382: Computer Architecture and Organization
CS/CpE 550: Computer Organization and Programming
CS 810: Special Topics in CS: Modern Parallel and Distributed Computing on Cluster and Super-Computers