
CIS Seminar: “Fast and Effective Analytics for Big Multi-Dimensional Data”

March 21, 2022 at 3:30 PM - 4:30 PM
Details
Date: March 21, 2022
Time: 3:30 PM - 4:30 PM
  • Organizer
    Computer and Information Science
    Phone: 215-898-8560
  • Venue
    Zoom (email CIS at cherylh@cis.upenn.edu for the link)


    Today, automated processes, Internet-of-Things deployments, and Web and mobile applications generate an overwhelming amount of high-dimensional data. Meanwhile, computational resources remain limited, and advances in machine learning (ML) create a pressing need to support increasingly expensive and complex analytical tasks. Unfortunately, traditional data management techniques offer limited support for high-dimensional data, ML tasks, and adaptation to data properties, often resulting in reduced performance. Similarly, due to the difficulty of providing invariances to specific data distortions, applications often resort to inadequate ML methods, reducing their effectiveness.

    In my work, I ask how we can address the lack of task-aware and data-driven adaptations in data management and ML methods. Specifically, I will discuss three solutions for (i) data representations and (ii) computational methods, using techniques that exploit similarities, shapes, densities, and distributions in data. Motivated by the ubiquity of high-dimensional time series, I will first present a similarity-preserving representation that minimizes storage footprint and accelerates specific ML analytics for time-series data. Then, I will discuss a variance-aware quantization method for indexing high-dimensional data. Finally, I will present a method for anomaly detection in streaming data that accounts for distribution drift. In all three cases, the proposed methods substantially improve performance and accuracy, demonstrating the benefit of designing task-aware and data-driven solutions for large-scale data science applications.