Security and Performance Analytics (Big Data) Projects:
Designing and Building Scalable Computer Security Rules Suggestion System
Designed and implemented a computer security rules suggestion system, first at POC scale and then at production scale. The system was tested using one month of data from all VMware Carbon Black security customers. Wrote a patent application for efficient path tree algorithms that first group rule paths and then match them to absolute process paths; these algorithms are used in the rules suggestion system.
Status: Design, implementation and scale testing completed. Awaiting deployment after integration with the UI, which has also been built. Final deployment testing and tuning remain to be done. (November 2019 - Present)
Software/tools: Python, AWS services such as S3 and related Python libraries.
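The entry above does not describe the patented algorithms themselves. Purely as an illustration of the general idea of grouping rule paths in a tree and then matching absolute process paths against it, a minimal sketch could look like the following. The class names, the `*` single-component wildcard, the case-insensitive comparison and the example paths are all assumptions made for this sketch, not the actual design.

```python
from typing import Dict, List


class PathTrieNode:
    """One path component in the tree (hypothetical structure)."""

    def __init__(self) -> None:
        self.children: Dict[str, "PathTrieNode"] = {}
        self.rule_ids: List[str] = []  # rules whose path ends at this node


class PathTrie:
    """Groups rule paths by shared leading components (illustrative only)."""

    def __init__(self) -> None:
        self.root = PathTrieNode()

    @staticmethod
    def _split(path: str) -> List[str]:
        # Assumption: paths compare case-insensitively and "\" equals "/".
        return [part for part in path.lower().replace("\\", "/").split("/") if part]

    def add_rule(self, rule_id: str, rule_path: str) -> None:
        """Insert a rule path; rules sharing a prefix share tree nodes."""
        node = self.root
        for part in self._split(rule_path):
            node = node.children.setdefault(part, PathTrieNode())
        node.rule_ids.append(rule_id)

    def match(self, process_path: str) -> List[str]:
        """Return the sorted ids of rules matching an absolute process path."""
        frontier = [self.root]
        for part in self._split(process_path):
            # Follow the literal component and, if present, the "*" wildcard.
            frontier = [
                node.children[key]
                for node in frontier
                for key in {part, "*"}
                if key in node.children
            ]
        return sorted(rule for node in frontier for rule in node.rule_ids)


trie = PathTrie()
trie.add_rule("r1", "C:\\Program Files\\*\\update.exe")
trie.add_rule("r2", "C:\\Program Files\\Vendor\\update.exe")
trie.add_rule("r3", "C:\\Windows\\System32\\cmd.exe")
print(trie.match("c:\\program files\\vendor\\update.exe"))
```

Because rules that share a directory prefix share nodes, one walk over the components of a process path checks all grouped rules at once instead of comparing the path against each rule separately.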
Performance Analytics of Large Scale Systems
Researching, designing and implementing algorithms for performance anomaly detection and root cause analysis of large scale distributed systems in offline and realtime modes.
Status 1: Designed and implemented multiple root cause metric identification, anomaly detection and group (physical/virtual device, time series groups, etc.) clustering algorithms. Currently in the process of integrating the algorithms into VMware vSAN performance analysis and other VMware products. (January 2016 - May 2018) Software/tools: Java, Python, Spark MLlib, Weka, SciPy, scikit-learn.
Status 2: Completed design and implementation of algorithms to compute VMware vSAN trace metric values on multiple cores and aggregate results on vSAN transaction/operation boundaries. Also completed integrating the distributed vSAN trace analyzer with the Wavefront visualizer. Currently working on moving the Python multiprocessing code to Apache Spark to do trace computation on multiple hosts. See CV for more details. (May 2018 - December 2018) Software/tools: Python multiprocessing, Spark MLlib, VMware Wavefront visualizer.
Status 3: Helped create a single-VM NSX analytics appliance with open source packages such as Spark, Zookeeper, Kafka, Druid and Postgres, along with some networking application software (such as micro-segmentation). Creating small prototypes to test various features of the appliance. We are currently working to create a cluster of NSX analytics appliances with more networking applications such as anomaly detection. (December 2018 - December 2019) Software/tools: Java, Python, Zookeeper, Kafka, Druid, Spark Streaming, Postgres.
IP Intelligence Products (IP Geolocation, IP Reputation)
Researching, designing and implementing IP Reputation (IPR) scoring mechanisms using existing and custom machine learning algorithms on multiple big data sources.
Status: Newer versions of the IPR scoring product released and continuously upgraded with more data sources and customized scoring algorithms. (November 2014 - Fall 2015)
Software/tools: Java, Hive, Hive UDF, Hadoop MapReduce, MongoDB, Spark, Spark MLlib, Weka, Mahout.
Big Data Mining and Machine Learning with Networking and other Applications
In this project we first address the problem of malicious domain/URL detection by extracting multiple domain/URL feature values from large UltraDNS, registry and other datasets. We then use these features with well-known machine learning algorithms to detect malicious/anomalous domains. The UltraDNS-based features are extracted from a Hadoop cluster using Hive queries. We also design efficient string similarity metric algorithms to extract domain/URL name-based features, which can be used with or without well-known machine learning algorithms. To address the performance (numerical complexity) issues of current well-known machine learning algorithms, this project also involves designing fast and efficient machine learning algorithms for multi-attribute big data analysis.
Status: Experimented with various machine learning algorithms, with Decision Trees and Random Forests giving the best classification accuracy. Designed various string similarity metrics and implemented (in C++) efficient algorithms to calculate them. (Spring 2013 - Fall 2015)
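The entry above does not say which string similarity metrics were designed. As one familiar example of a domain name-based feature, the sketch below computes a length-normalized Levenshtein (edit) distance. The choice of metric, the function names and the sample domains are assumptions for illustration, and the sketch is in Python rather than the C++ used in the project.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance using two rows of memory."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


# Hypothetical lookalike domain: a single character differs from the original.
print(levenshtein("example.com", "examp1e.com"))
print(similarity("example.com", "examp1e.com"))
```

A score like this can be fed to a classifier as one feature among many, for instance the highest similarity between a candidate domain and a list of well-known domains.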
Live Visualization of Malicious Website/Domain Activities
The goal of this project is to observe malicious online activities in realtime using hourly Neustar UltraDNS datasets.
Status: Tool designed and implemented in collaboration with another Neustar colleague and a CS/UIUC MSc student intern, and now operational (Summer 2014).
AdAdvisor Efficiency Projects:
Networked Device Identity (NDI)
Enhancing AdAdvisor (targeted advertising) services using privacy-aware NDI/device fingerprinting instead of only cookies.
Status: System designed and analyzed using large real-world Neustar Hadoop cluster datasets. Initial prototype built in collaboration with a CS/UIUC undergraduate student intern (Summer 2014).
Communication Networks (Cloud, SDN, CDN) Projects:
SCDA: SLA-aware Cloud Datacenter Architecture for Efficient Content Storage and Retrieval
This project proposes and analyses SCDA (SCDA), an efficient server selection, resource allocation and enforcement mechanism. SCDA has a prioritized rate allocation mechanism to satisfy different service level agreements (SLAs) on throughput and content transfer times (delays). The allocation scheme can achieve max-min fairness. SCDA has a mechanism to detect and hence mitigate SLA violations in realtime. SCDA comprises a lightweight front-end server, which distributes requests among many name nodes, and efficient resource allocation schemes, which serve as cross-layer routing and congestion control in the cloud. SCDA is a refinement of our older versions (EDFS, scalDistFS). SCDA utilizes scalable and distributed/hierarchical software components called resource monitors and resource allocators to achieve its goals.
Status: System designed, implemented in the NS2 network simulator (C++ and OTcl based) and analyzed using extensive trace-based experiments.
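SCDA's own allocation scheme is described in the cited papers, not here. To make the notion of prioritized max-min fair rate allocation concrete, the sketch below uses the textbook progressive-filling procedure on a single shared link. The function name, the use of per-flow weights to stand in for SLA priority, and the sample numbers are assumptions for this sketch only.

```python
from typing import Dict


def weighted_max_min(capacity: float,
                     demands: Dict[str, float],
                     weights: Dict[str, float]) -> Dict[str, float]:
    """Textbook weighted max-min fair allocation on one link (illustrative)."""
    allocation = {flow: 0.0 for flow in demands}
    active = {flow for flow, demand in demands.items() if demand > 0}
    remaining = capacity
    while active and remaining > 1e-12:
        # Rate per unit of weight if every active flow could use its full share.
        share = remaining / sum(weights[flow] for flow in active)
        satisfied = {flow for flow in active
                     if demands[flow] <= share * weights[flow]}
        if not satisfied:
            # Every active flow is bottlenecked here: split what is left.
            for flow in active:
                allocation[flow] = share * weights[flow]
            break
        # Flows needing less than their share get their demand; the surplus
        # is redistributed among the remaining flows in the next round.
        for flow in satisfied:
            allocation[flow] = demands[flow]
            remaining -= demands[flow]
        active -= satisfied
    return allocation


equal = weighted_max_min(10.0, {"a": 2.0, "b": 8.0, "c": 8.0},
                         {"a": 1.0, "b": 1.0, "c": 1.0})
prioritized = weighted_max_min(9.0, {"gold": 10.0, "bronze": 10.0},
                               {"gold": 2.0, "bronze": 1.0})
print(equal)
print(prioritized)
```

In the first call the small flow gets its full demand and the other two split the rest evenly; in the second, the flow with twice the weight receives twice the rate.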
Private/Personal Clouds with Incentivized, Prioritized and Efficient Content Routing
This project designs and analyses a quick content distribution protocol (Hincent, ExtendedHincent), which uses efficient prioritized rate allocation and content selection algorithms offering high incentives to participating peers. The fair incentives attract more peers, which securely download and distribute contents. This in turn can benefit content providers and network operators. The rate allocations of the protocol result in quicker content transfer times when compared with existing schemes. The protocol also employs effective rate enforcement mechanisms without requiring changes to the TCP/IP stack or to existing routers. Unlike existing centralized schemes such as YouTube, the design allows peers to have full control of their contents while sharing them with others using personal web servers. To do this the system uses an efficient and scalable content index manager. We have also presented an extension of Hincent using surrogate servers with OpenFlow vSwitches to help peers exchange contents faster than with existing schemes.
Status: System designed, implemented in the NS2 network simulator (C++ and OTcl based) and analyzed using extensive trace-based experiments. Initial prototype built in collaboration with a CS/UIUC MSc student intern (Summer 2014).
Optimizing CDNs using Efficient Route Computation Engine and Leveraging SDNs
This project involves the design and analysis of a system which aims at optimizing networks to achieve realtime SLA and QoS guarantees at scale. The design, which leverages SDNs, also inherently helps mitigate DDoS attacks, among other things.
Status: System designed and analyzed. Initial prototype built in collaboration with a CS/UIUC PhD student intern (Summer 2014).
Past (Less Active) Projects:
XDI Personal Cloud Content Index
The goal of this project is to build a personal cloud content index application using the XDI personal cloud language.
Status: Application designed and initial prototype built in collaboration with other Neustar colleagues and a CS/UIUC undergraduate student intern (Summer 2014).
QCP: Finishing Flows Faster with a Quick Congestion Control Protocol (QCP)
In this project we present the design and analysis of QCP, a Quick Congestion Control Protocol (QCP). QCP quickly gives flows their fair share rates and hence allows them to finish quickly. Unlike existing schemes, QCP uses an accurate formula to calculate the number of flows sharing a network link. This enables QCP to give flows their fair share rates without over- or under-utilizing bottleneck link capacities. We also present an efficient sharing mechanism which QCP uses to reassign capacity left unused by flows bottlenecked elsewhere to other flows which need it. This makes QCP a max-min fair protocol. We show how QCP can be implemented by extending the emerging OpenFlow architecture. QCP is a refinement of earlier versions (FCP, NCP).
Cross-Layer Routing and Congestion Control Architectures
In this project we use the QCP (QCP) rate as a link weight metric to find the path with the highest bottleneck link rate (QCP, CrossLayer, BRTP) using a modified Dijkstra algorithm. This highest-throughput path is used to route packets. The bottleneck rate obtained by the scheme is used as the sending rate of the sources (hence cross-layer).
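The exact modification of Dijkstra's algorithm used in the project is given in the cited papers. A common way to find the path with the highest bottleneck rate is the widest-path variant sketched below, where path "length" is the minimum link rate along the path and the node with the largest such value is expanded first. The function name, graph representation and sample rates are assumptions for illustration.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, float]]  # node -> {neighbor: link rate}


def widest_path(graph: Graph, source: str, target: str) -> Tuple[float, List[str]]:
    """Return (bottleneck rate, path) maximizing the minimum link rate."""
    best = {source: float("inf")}   # best known bottleneck rate per node
    parent: Dict[str, str] = {}
    heap = [(-float("inf"), source)]  # max-heap via negated rates
    done = set()
    while heap:
        neg_width, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == target:
            break
        for neighbor, rate in graph.get(node, {}).items():
            # A path's rate is limited by its slowest link.
            width = min(-neg_width, rate)
            if width > best.get(neighbor, 0.0):
                best[neighbor] = width
                parent[neighbor] = node
                heapq.heappush(heap, (-width, neighbor))
    if target not in best:
        return 0.0, []  # target unreachable
    path = [target]
    while path[-1] != source:
        path.append(parent[path[-1]])
    return best[target], path[::-1]


# Hypothetical topology with link rates (for example, in Mb/s).
links: Graph = {
    "A": {"B": 10.0, "C": 4.0},
    "B": {"C": 8.0, "D": 3.0},
    "C": {"D": 6.0},
    "D": {},
}
rate, route = widest_path(links, "A", "D")
print(rate, route)
```

The returned bottleneck rate is the quantity that, in the cross-layer scheme described above, would also serve as the sending rate of the source.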
Mitigation of DoS and DDoS
This project involves analysing (using simulation) PAS: A Packet Accounting System to Limit the Effects of DoS and DDoS. PAS is based on the idea that if every packet is accounted for or paid for, then the DoS and DDoS problem reduces to a congestion control and fairness problem. It can then be dealt with by finding better routes or adjusting the sending rates of the flows sharing bottleneck resources.
Emulation of Simulation
Simulation often takes a long time to give the desired result. This project therefore aims at emulating simulation using analytical techniques (for example, EmulModel1) without giving up what real simulation achieves. Both repeating and terminating simulations can be successfully emulated. The emulation is much faster than real simulation and hence can handle complex scenarios which are otherwise hard to analyse using real simulation.
Cloudlets and Interactive Mobile Cloud Applications
In this project (Cloudlets, ExtendedCloudlets) we studied the impact of cloudlets on interactive mobile cloud applications. To study the impact and feasibility of cloudlets we proposed the design of cloudlet network and service architectures. Our study focuses on file editing, video streaming, collaborative chatting and realtime gaming, which are representative enterprise application scenarios. The design and study can apply to other applications. (Funded by Boeing Research and Technology)
Named Data Networking (NDN)-Based Conferencing Architecture
This project focused on the design and analysis of SNC, a scalable NDN-based conferencing architecture. The system design was evaluated using NDN tools on virtual machines and also using simulation. (Funded by Huawei US R&D)
Performance Analysis of Publish-Subscribe Systems
This project focused on analytical and simulation models to evaluate the performance of publish-subscribe-like messaging systems. (Funded by Boeing Research and Technology)
Design and Analysis of Flow Prioritization for Efficient Wireless Interface Manager
In this Qualcomm R&D project, a flow prioritization scheme at a device with multiple wireless interfaces first identifies the flows that share a bottleneck. It then obtains the flow target rates using a constrained rate allocation (CRA) scheme based on the desired and required rates of each flow. Finally, the scheme enforces the allocation by adaptively setting the receive and/or send window of each flow based on its target rate obtained from the CRA.
Virtual Battery
This project was about Virtual Battery: An Energy Reserve Abstraction for Embedded Sensor Networks (Virtual Battery). (CS at UIUC Course Project)
Performance Analysis of Communication Networks
This series of projects was about analytical models of network performance (AnaModel1, AnaModel2, EmulModel1). (Funded by Telkom-Siemens Centre of Excellence in ATM and Broadband Networks)