The Game of Twenty Questions: Do You Know Where to Log?

Thursday May 4th, 12-1PM @ BA5205

Speaker: Xu Zhao

Title:
The Game of Twenty Questions: Do You Know Where to Log?

Abstract:
A production system’s printed logs are often the only source of runtime information available for postmortem debugging, performance profiling, security auditing, and user behavior analytics. Therefore, the quality of this data is critically important. Recent work has attempted to enhance log quality by recording additional variable values, but logging statement placement, i.e., where to place a logging statement, remains the most challenging and fundamental problem for improving log quality and has not been adequately addressed so far. This position paper proposes automating the placement of logging statements by measuring how much uncertainty about a program’s execution each statement can eliminate. Guided by ideas from information theory, the authors describe a simple approach that automates logging statement placement. Preliminary results suggest that the algorithm can effectively cover, and further improve on, existing logging statements placed by developers. It can compute an optimal log placement that disambiguates the entire function call path with only a 0.218% slowdown.
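To give a flavor of the information-theoretic idea, here is a minimal, hypothetical sketch (not the authors' actual algorithm): given a set of possible execution paths, it greedily adds logging statements that maximize the entropy of the resulting log partition, until every path produces a distinct log sequence.

```python
from math import log2

def entropy(groups):
    # Shannon entropy of a partition of paths into groups with identical logs.
    total = sum(len(g) for g in groups)
    return -sum(len(g) / total * log2(len(g) / total) for g in groups)

def partition(paths, logged):
    # Group paths by the sequence of logged functions they would emit.
    buckets = {}
    for p in paths:
        key = tuple(f for f in p if f in logged)
        buckets.setdefault(key, []).append(p)
    return list(buckets.values())

def place_logs(paths):
    funcs = {f for p in paths for f in p}
    logged = set()
    # Keep adding the logging point with the highest information gain until
    # all paths are disambiguated (each bucket holds exactly one path).
    while any(len(g) > 1 for g in partition(paths, logged)):
        best = max(funcs - logged,
                   key=lambda f: entropy(partition(paths, logged | {f})))
        logged.add(best)
    return logged
```

The greedy criterion is only one plausible choice; the point is that "where to log" can be framed as reducing uncertainty about which path executed.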

Bio:
Xu Zhao is a second-year PhD student at the University of Toronto, under the supervision of Prof. Ding Yuan. His research interests lie in the performance of distributed systems and failure diagnosis. His current work focuses on the automated placement of logging statements and non-intrusive performance profiling for distributed systems.

Challenges and Solutions to Secure Internet Geolocation

Wednesday May 3rd, 12-1PM @ BA5205

Speaker: AbdelRahman Abdou

Title:
Challenges and Solutions to Secure Internet Geolocation

Abstract:
The number of security-sensitive location-aware services over the Internet continues to grow; examples include location-aware authentication, location-aware access policies, fraud prevention, compliance with media licensing, and the regulation of online gambling and voting.
An adversary can evade existing geolocation techniques, e.g., by faking GPS coordinates or employing a non-local IP address through proxies and virtual private networks. In this talk, I will present parts of my PhD work, including Client Presence Verification (CPV), a measurement-based technique designed to verify an assertion about a device’s presence inside a prescribed geographic region. CPV does not identify devices by their IP addresses. Rather, the device’s location is corroborated in a novel way by leveraging the geometric properties of triangles, which prevents an adversary from manipulating network delays in its favor. To achieve high accuracy, CPV mitigates Internet path asymmetry using a novel method to deduce one-way application-layer delays to and from the client’s participating device, and mines these delays for evidence supporting or refuting the asserted location. I will present CPV’s evaluation results, including the granularity of the verified location and the verification time, and summarize some lessons we learned throughout the process.
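The geometric core can be illustrated with a toy sketch (not CPV's actual protocol, which works on measured one-way delays rather than coordinates): three verifiers form a triangle, and an asserted location is accepted only if it falls inside that triangle.

```python
# Illustrative point-in-triangle acceptance test. In CPV the triangle's
# vertices are verifier nodes and the check is derived from delay
# measurements; plain 2D coordinates are used here for clarity.

def signed_area(a, b, c):
    # Twice the signed area of triangle abc; the sign encodes orientation.
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def inside_triangle(p, v1, v2, v3):
    # p lies inside (or on the border of) triangle v1-v2-v3 iff the three
    # sub-triangles p-v1-v2, p-v2-v3, p-v3-v1 share the same orientation.
    d1 = signed_area(p, v1, v2)
    d2 = signed_area(p, v2, v3)
    d3 = signed_area(p, v3, v1)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)
```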

Bio:
AbdelRahman Abdou is a Post-Doctoral Fellow in the School of Computer Science at Carleton University. He received his PhD (2015) in Systems and Computer Engineering from Carleton University. His research interests include location-aware security, SDN security, authentication, SSL/TLS, and using Internet measurements to solve problems related to Internet security.

Consistency Oracle

Friday April 28th, 1-2PM @ BA5205

Speaker: Beom Heyn Kim

Title:
Consistency Oracle

Abstract:
Many modern distributed storage systems emphasize availability and partition tolerance over consistency, leading to many systems that provide weak data consistency. However, weak data consistency is difficult for both system designers and users to reason about. Formal specifications may offer precise descriptions of consistency behavior, but they are difficult to use and usually require expertise beyond that of the average software developer. In this paper, we propose and describe the consistency oracle, a novel instantiation of a formal specification. A consistency oracle accepts the same interface calls as a distributed storage system, but returns all possible values that may be returned under a given consistency model. Consistency oracles are easy to use and can be applied to test and verify both distributed storage systems and the client software that uses those systems.
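A toy oracle for a single register conveys the idea (an illustrative sketch, not the paper's implementation; the two model names are hypothetical simplifications): a read returns the *set* of values a conforming system may legally return.

```python
class ConsistencyOracle:
    """Returns every value a read may legally observe under a model."""

    def __init__(self, model="eventual"):
        self.model = model
        self.history = []  # all writes, in order

    def write(self, value):
        self.history.append(value)

    def read(self):
        if not self.history:
            return {None}
        if self.model == "strong":
            # Linearizable register: only the latest write is legal.
            return {self.history[-1]}
        # Eventual consistency: any previously written value may appear.
        return set(self.history)
```

A system under test then passes a check like `system.read() in oracle.read()` at each step, which is exactly the kind of membership test the abstract describes.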

Bio:
Ben is a PhD student under the supervision of Prof. David Lie. His research primarily focuses on consistency verification for distributed systems.

Semantic Aware Online Detection of Resource Anomalies on the Cloud

Wednesday Nov 23rd, 12-1PM @ BA5205

Speaker: Stelios Sotiriadis

Title:
Semantic Aware Online Detection of Resource Anomalies on the Cloud

Abstract:
As cloud-based platforms become more popular, efficiently managing the costly hardware resources in the cloud environment becomes an essential task for the cloud administrator.
Prompt action should be taken whenever hardware resources are faulty, or are configured and utilized in a way that causes application performance degradation and hence poor quality of service. In this paper, we propose a semantic-aware technique based on neural-network learning and pattern recognition that provides automated, real-time support for resource anomaly detection.
We incorporate application semantics to narrow down the scope of the learning and detection phases, enabling our machine learning technique to run online at very low overhead. Because our method runs “life-long” on monitored resource usage in the cloud, we can leverage administrator feedback on wrong predictions to improve predictions on future runs.
This feedback-directed scheme, with the attached context, helps us achieve an anomaly detection accuracy as high as 98.3% in our experimental evaluation, and it can easily be used in conjunction with other anomaly detection techniques for the cloud.
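The feedback loop can be sketched with a deliberately simple stand-in detector (illustrative only; the paper uses neural-network learning, not this threshold rule): flag a reading as anomalous when it deviates from the running statistics, and loosen the threshold when the administrator reports a false positive.

```python
from statistics import mean, stdev

class FeedbackDetector:
    """Toy anomaly detector with administrator-feedback adjustment."""

    def __init__(self, k=3.0):
        self.k = k          # sensitivity: deviations beyond k sigma flag
        self.window = []    # observed resource-usage history

    def observe(self, value):
        anomalous = False
        if len(self.window) >= 5:
            mu, sigma = mean(self.window), stdev(self.window)
            anomalous = sigma > 0 and abs(value - mu) > self.k * sigma
        self.window.append(value)
        return anomalous

    def report_false_positive(self):
        # Administrator feedback: be less sensitive on future runs.
        self.k *= 1.5
```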

Bio:
Stelios Sotiriadis is a research fellow under the supervision of Prof. Cristiana Amza. His research focuses on the Inter-Cloud Meta-Scheduling (ICMS) framework.

Non-intrusive Performance Profiling of Entire Software Stacks based on the Flow Reconstruction Principle

Thursday October 27th, 12-1PM @ BA5205

Speaker: Xu Zhao

Title:
Non-intrusive Performance Profiling of Entire Software Stacks based on the Flow Reconstruction Principle

Abstract:
Understanding the performance behavior of distributed server stacks at scale is non-trivial. The servicing of just a single request can trigger numerous sub-requests across heterogeneous software components, and many similar requests are serviced concurrently and in parallel. When a user experiences poor performance, it is extremely difficult to identify the root cause, as well as the software components and machines that are the culprits. This work describes Stitch, a non-intrusive tool capable of profiling the performance of an entire distributed software stack solely using the unstructured logs output by heterogeneous software components. Stitch is substantially different from all prior related tools in that it is capable of constructing a system model of an entire software stack without building any domain knowledge into Stitch. Instead, it automatically reconstructs the extensive domain knowledge of the programmers who wrote the code; it does this by relying on the Flow Reconstruction Principle, which states that programmers log events such that one can reliably reconstruct the execution flow a posteriori.
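A tiny illustration of the flow-reconstruction idea (the log format and regex here are hypothetical, not Stitch's actual parsing): log lines that share an identifier value are stitched into per-request flows, with no per-system domain knowledge built in.

```python
import re

def reconstruct_flows(log_lines):
    """Group unstructured log lines into flows by a shared request id."""
    flows = {}
    for line in log_lines:
        # Assumed convention: messages embed an id like "req-7".
        m = re.search(r"req[-_ ]?(\d+)", line)
        if m:
            flows.setdefault(m.group(1), []).append(line)
    return flows
```

In the real system the identifiers are discovered automatically rather than assumed, but the output is the same in spirit: per-request execution flows recovered a posteriori from free-text logs.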

Bio:
Xu is a second-year PhD student under Prof. Ding Yuan. His research focuses on performance failure debugging and log analysis in distributed systems.

Don’t Get Caught In the Cold, Warm-up Your JVM: Understand and Eliminate JVM Warm-up Overhead in Data-parallel Systems

Wednesday October 26th, 12-1PM @ BA5205

Speaker: David Lion

Title:
Don’t Get Caught In the Cold, Warm-up Your JVM:
Understand and Eliminate JVM Warm-up Overhead in Data-parallel Systems

Abstract:

Many widely used, latency-sensitive, data-parallel distributed systems, such as HDFS, Hive, and Spark, choose to use the Java Virtual Machine (JVM), despite debate about the overhead of doing so. This paper analyzes the extent and causes of JVM performance overhead in the above-mentioned systems. Surprisingly, we find that the warm-up overhead, i.e., class loading and interpretation of bytecode, is frequently the bottleneck. For example, even an I/O-intensive, 1 GB read on HDFS spends 33% of its execution time in JVM warm-up, and Spark queries spend an average of 21 seconds in warm-up.

The findings on JVM warm-up overhead reveal a contradiction between the principle of parallelization, i.e., speeding up long-running jobs by parallelizing them into short tasks, and amortizing JVM warm-up overhead through long tasks. We solve this problem by designing HotTub, a new JVM that amortizes the warm-up overhead over the lifetime of a cluster node instead of over a single job, by reusing a pool of already-warm JVMs across multiple applications. The speed-up is significant. For example, using HotTub results in up to 1.8X speed-ups for Spark queries, despite not adhering to the JVM specification in edge cases.
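The warm-reuse idea can be shown in miniature (a conceptual sketch, not HotTub itself, which reuses actual JVM processes): instead of paying an expensive start-up cost per job, keep a pool of already-initialized workers and hand each job to a warm one.

```python
import time

class Worker:
    def __init__(self):
        # Stand-in for class loading and bytecode interpretation at start-up.
        time.sleep(0.05)
        self.warm = True

    def run(self, job):
        return job()

class WarmPool:
    def __init__(self, size):
        # Pay the warm-up cost once, when the pool is created.
        self.pool = [Worker() for _ in range(size)]

    def run(self, job):
        worker = self.pool.pop()        # take an already-warm worker
        try:
            return worker.run(job)
        finally:
            self.pool.append(worker)    # return it for the next job
```

The design choice mirrors the abstract: short tasks stay short, while the warm-up cost is amortized over the pool's lifetime rather than per job.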

Bio:
David is a first-year PhD student under Prof. Ding Yuan. His research primarily focuses on Java Virtual Machine performance in data-parallel applications.

Accelerating Complex Data Transfer for Cluster Computing

Friday June 10th, 12-1PM @ BA5205

Speaker: Alexey Khrabrov

Title:
Accelerating Complex Data Transfer for Cluster Computing

Abstract:
The ability to move data quickly between the nodes of a distributed system is important for the performance of cluster computing frameworks such as Hadoop and Spark. We show that, in a cluster with modern networking technology, data serialization is the main bottleneck and source of overhead in the transfer of rich data in systems based on high-level programming languages such as Java. We propose a new data transfer mechanism that avoids serialization altogether by using a shared cluster-wide address space to store data. We describe the design and a prototype implementation of this approach, show that our mechanism is significantly faster than serialized data transfer, and propose a number of possible applications for it.
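The contrast between the two transfer styles can be sketched as follows (illustrative assumptions: Python's `pickle` stands in for Java serialization, and an in-process dictionary stands in for the shared cluster-wide address space):

```python
import pickle

def transfer_by_serialization(obj):
    # Classic path: encode, "send", decode -- a deep copy of the whole
    # object graph, paid on every transfer.
    return pickle.loads(pickle.dumps(obj))

# Stand-in for a cluster-wide shared address space.
SHARED_STORE = {}

def publish(key, obj):
    # No encoding step: only a reference to the data is recorded.
    SHARED_STORE[key] = obj

def fetch(key):
    # The receiver resolves the reference directly, with no decode step.
    return SHARED_STORE[key]
```

In a real cluster the "reference" would be a global address resolved over the network, but the sketch shows where the serialization cost disappears.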

Bio:
Alexey Khrabrov is a first-year PhD student at the University of Toronto, under the supervision of Prof. Eyal de Lara. His research interests lie in the performance of distributed systems. His current work focuses on leveraging modern networking technologies and designing new programming models to improve data transfer performance in cluster computing systems.