.01

ABOUT

PERSONAL DETAILS
dheeraj.kumar@ece.iitr.ac.in
genuine.dheeraj@gmail.com
Hello. I am a Researcher, Programmer, Teacher, Dreamer, Traveller.
I am passionate about research and teaching
Welcome to my Personal and Academic profile

INTRODUCTION

ABOUT ME

I have been an Assistant Professor in the Department of Electronics & Communications Engineering at IIT Roorkee, Uttarakhand, India, since April 2019. Previously, I worked as a Post-Doctoral Research Assistant at Purdue University, USA, from May 2017 to April 2019 and as a Research Officer at RMIT University, Australia, from Oct 2016 to March 2017. My educational qualifications include a B.Tech-M.Tech dual degree (Electrical Engineering) from IIT Kanpur, India, in 2010 and a Ph.D. (Electrical & Electronic Engineering) from the University of Melbourne, Australia, in 2017.

Research interests:

  1. Developing novel algorithms for clustering tendency assessment and subsequent clustering and anomaly detection for Big Data.
  2. Applying these novel algorithms to smart city generated big data to extract actionable knowledge.
  3. Enhancing veracity of the data collected from wireless sensor networks used for smart city applications by means of sensor drift detection and correction.
  4. Using novel data to solve classic problems, e.g., leveraging social-media data for emergency management and understanding urban mobility using data from mobility service providers such as Uber.


.02

RESUME

  • EDUCATION
  • 2012
    2017
    Melbourne, Australia

    ELECTRICAL & ELECTRONIC ENGINEERING - Ph.D.

    THE UNIVERSITY OF MELBOURNE

    I completed my Ph.D. research at ISSNIP research lab under the supervision of Prof. Marimuthu Palaniswami. The title of my Ph.D. thesis is "Big data clustering for smart city applications."
  • 2005
    2010
    Kanpur, India

    ELECTRICAL ENGINEERING - B. Tech - M. Tech dual degree

    INDIAN INSTITUTE OF TECHNOLOGY KANPUR

    I did my M.Tech thesis titled "Soft fusion methods for multimodal speech applications" under the guidance of Prof. R. Hegde. For my B.Tech project I developed a voice interactive telephone directory for IITK under the guidance of Prof. S. Umesh.
  • ACADEMIC AND PROFESSIONAL POSITIONS
  • 2019
    Present
    Roorkee, India

    ASSISTANT PROFESSOR

    IIT ROORKEE

    I joined IIT Roorkee as an Assistant Professor in the Department of Electronics and Communications Engineering in April 2019. I am currently working on the problem of developing novel algorithms for clustering tendency assessment and subsequent clustering and anomaly detection, and their application to smart city generated big data to extract actionable knowledge.
  • 2017
    2019
    West Lafayette, IN., USA

    POST-DOCTORAL RESEARCH ASSISTANT

    PURDUE UNIVERSITY

    I worked as a post-doctoral researcher at UMNI lab under the supervision of Prof. Satish Ukkusuri. I investigated the impacts of mobility service providers such as Uber on the urban taxi market by collecting and analyzing the trajectory data of Uber drivers, obtained by crawling the Uber web URL. I also studied the impact of surge pricing on customers’ and drivers’ behavior.
    In another project, I developed techniques that leverage social media data for better modelling of evacuation decisions during emergency events such as hurricanes. I experimented on Twitter data from Hurricanes Sandy and Matthew, analyzing the evacuation time and location of residents, together with their tweets, to explore the causes of their evacuation-related decisions.
  • 2016
    2017
    Melbourne, Australia

    RESEARCH OFFICER

    RMIT UNIVERSITY

    I worked on the problem of "opinion spam detection" for online review sites under the guidance of Prof. Xiuzhen (Jenny) Zhang. I proposed an inductive matrix completion scheme for detecting spammer groups of singleton reviewers, which are difficult to detect because spam indicator signals are not available for them.
  • 2011
    2012
    Jaipur, India

    LECTURER

    THE LNM INSTITUTE OF INFORMATION TECHNOLOGY

    I taught a course titled "Microprocessor and Interfaces" using Intel's 8085 architecture and assembly programming. I set up the ATMEL MCU University centre at LNMIIT, with facilities including AVR microcontrollers and the necessary interfaces. I was also a member of the equipment procurement committee and the counselling cell.
  • 2010
    2011
    Pune, India

    DESIGN ENGINEER

    APPLIED MICRO CIRCUITS INDIA PRIVATE LTD

    I worked on the verification of the SATA and TRACE blocks at block level and SoC level using OVM as the methodology and SystemVerilog as the programming language.
  • 2009
    2010
    Kanpur, India

    TEACHING ASSISTANT

    INDIAN INSTITUTE OF TECHNOLOGY KANPUR

    I was a teaching assistant for the labs of two courses: Digital Electronics and Microprocessor Technology, which includes programming of the 8085 microprocessor, and Introduction to Electronics, which involves basic circuit design.
  • HONORS AND AWARDS
  • 2019
    2019
    Roorkee, India

    RAMANUJAN FELLOWSHIP

    SCIENCE AND ENGINEERING RESEARCH BOARD (SERB), GOVT. OF INDIA

    I was selected for the prestigious Ramanujan Fellowship offered by the Science and Engineering Research Board (SERB), Govt. of India.
  • 2012
    2017
    Melbourne, Australia

    MIFRS & MIRS

    THE UNIVERSITY OF MELBOURNE

    I was awarded the Melbourne International Fee Remission Scholarship (MIFRS) and the Melbourne International Research Scholarship (MIRS) for my doctoral studies at The University of Melbourne.
  • 2004
    2004
    Jaipur, India

    CERTIFICATE OF MERIT

    CENTRAL BOARD OF SECONDARY EDUCATION (CBSE)

    I stood first in class XII securing 88% and was awarded the certificate of merit by C.B.S.E. for being among the top 0.1% of all the qualified students in physics in class XII.
.03

PUBLICATIONS

PUBLICATIONS LIST
27 SEPT 2017

Interpreting Cluster Structure in Waveform Data with
Visual Assessment and Dunn’s Index

Frontiers in Computational Intelligence - Springer

PP. 73–101, 2017.

This article examines the intimate relationship that exists between Dunn’s index, single linkage clustering, and iVAT.

Book Chapter S. Mahallati, J.C. Bezdek, D. Kumar,
M.R. Popovic, and T.A. Valiante.

Dunn’s index was introduced in 1974 as a way to define and identify a “best” crisp partition on n objects represented by either unlabeled feature vectors or dissimilarity matrix data. This article examines the intimate relationship that exists between Dunn’s index, single linkage clustering, and a visual method called iVAT for estimating the number of clusters in the input data. The relationship of Dunn’s index to iVAT and single linkage in the labeled data case affords a means to better understand the utility of these three companion methods when data are crisply clustered in the unlabeled case (the real case). Numerical examples using simulated waveform data drawn from the field of neuroscience illustrate the natural compatibility of Dunn’s index with iVAT and single linkage. A second aim of this note is to study customizing the three methods by changing the distance measure from Euclidean distance to one that may be more appropriate for assessing the validity of crisp clusters of finite sets of waveform data. We present numerical examples that support our assertion that when used collectively, the three methods afford a useful approach to evaluation of crisp clusters in unlabeled waveform data.
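
As an illustration only (this sketch is not taken from the chapter), Dunn's index for a crisp partition can be computed from a dissimilarity matrix as the smallest between-cluster dissimilarity divided by the largest cluster diameter. The function name dunn_index and its arguments are hypothetical, and the classic set-distance and diameter definitions are assumed here rather than any customized distance studied in the chapter.

```python
import numpy as np

def dunn_index(D, labels):
    """Dunn's index from a square dissimilarity matrix D and crisp labels.

    Assumes the classic definitions: cluster separation is the smallest
    pairwise dissimilarity between two clusters, and cluster diameter is
    the largest pairwise dissimilarity inside a cluster.
    """
    D = np.asarray(D, dtype=float)
    labels = np.asarray(labels)
    clusters = [np.flatnonzero(labels == c) for c in np.unique(labels)]
    if len(clusters) < 2:
        raise ValueError("Dunn's index needs at least two clusters")
    # Largest within-cluster dissimilarity over all clusters.
    max_diameter = max(D[np.ix_(idx, idx)].max() for idx in clusters)
    # Smallest between-cluster dissimilarity over all cluster pairs.
    min_separation = min(
        D[np.ix_(a, b)].min()
        for i, a in enumerate(clusters)
        for b in clusters[i + 1:]
    )
    return min_separation / max_diameter

# Four points on a line; the natural partition scores much higher.
x = np.array([0.0, 1.0, 10.0, 11.0])
D = np.abs(x[:, None] - x[None, :])
good = dunn_index(D, [0, 0, 1, 1])   # separation 9, diameter 1
bad = dunn_index(D, [0, 1, 0, 1])    # separation 1, diameter 10
```

Larger values indicate more compact and better separated clusters, so the natural partition in the example scores far above the interleaved one.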

09 Jul 2020

Enhancing demographic coverage of hurricane
evacuation behavior modeling using social media

Journal of Computational Science

vol. 45, pp. 101184, July 2020.

This paper develops techniques to provide an alternative (fast and voluntary) source of information for modeling evacuation behavior during emergencies.

Journal Paper D. Kumar and S. V. Ukkusuri.

Hurricane evacuation is a complex dynamic process, and a better understanding of the factors which influence the evacuation behavior of the coastal residents could be helpful in planning a better evacuation policy. Traditionally, the various aspects of the household evacuation decisions have been determined by post-evacuation questionnaire surveys; however, these surveys have seen a deterioration in the quality of the data due to a gradual decrease in response rates in recent years, which may lead to non-response bias. Increased activity of users on social media, especially during emergencies, along with the geo-tagging of the posts, provides an opportunity to gain insights into users' decision-making process, as well as to gauge public opinion and activities using the social media data as a supplement to the traditional survey data. This paper leverages the geo-tagged Tweets posted in New York City (NYC) and Jacksonville, FL in the wake of Hurricanes Sandy and Matthew, respectively, to understand the evacuation behavior of the Twitter users and compare it with that of the survey respondents. We design the Twitter user classification problem as a novel HMM modeling framework to classify users into one of three categories: outside evacuation zone, evacuees, and non-evacuees. We compare the demographic composition (age, gender, and race/ethnicity) and spatial coverage of Twitter users with that of the survey respondents to highlight the complementary nature of the two data sources, which when combined give a representative sample of the population. We analyze the GPS coordinates of the tweets by evacuees to understand evacuation and return time and evacuation location patterns, and compare them with those of the survey respondents.
The techniques presented in this paper provide an alternative (fast and voluntary) source of information for modeling evacuation behavior during emergencies, which is complementary in terms of demographics and spatial distribution as compared to the traditional surveys and could be useful for authorities to plan a better evacuation campaign to minimize the risk to the life of the residents of the emergency hit areas.

20 Apr 2020

Visual Approaches for Exploratory Data Analysis: A
Survey of the Visual Assessment of Clustering
Tendency (VAT) Family of Algorithms

IEEE Systems, Man, and Cybernetics Magazine

vol. 6, no. 2, pp. 10-48, April 2020.

This article presents a detailed and systematic survey of the VAT family of algorithms and their applications to help researchers understand structural details in their data.

Journal Paper D. Kumar and J. C. Bezdek.

Exploratory data analysis (EDA) using data clustering is extremely important for understanding the basic characteristics of a novel data set before developing complex statistical models and testing the various hypotheses. A preliminary step to clustering is deciding whether the data contain any clusters and, if so, how many clusters to seek. This is the clustering-tendency-assessment problem, which has not received much attention in the pattern-recognition literature. An important category of algorithms in this domain includes visual approaches, represented here by the visual assessment of tendency (VAT) algorithm, which reorders the pairwise dissimilarity matrix and then generates a reordered dissimilarity image (RDI) or cluster heat map that shows possible clusters in the data by dark blocks along the diagonal. Since its introduction in 2002, the VAT algorithm has been modified by many researchers to improve the quality of the RDI, making it applicable to various types of data sets, such as high-volume, time-series, high-dimensional, and streaming data, among others (collectively called the VAT family of algorithms). Various members of the VAT family have been applied to many applications, including image segmentation, urban mobility, transportation, speech processing, biomedical applications, social media, and Web data analytics, on a variety of real-life data sets with diverse characteristics and properties. We hope that this detailed and systematic survey of the VAT family of algorithms and their applications will help researchers choose a useful member of the VAT family to help them understand structural details in their data. This article includes pseudocode for a suite of 25 algorithms in the VAT family of models, and MATLAB implementations of selected algorithms are available on GitHub.
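
The reordering step described above can be sketched in Python as follows. This is a minimal, assumption-laden illustration of the basic VAT ordering (a Prim-like traversal that starts at one end of the largest dissimilarity), not the MATLAB code mentioned in the abstract; the function name vat_order is hypothetical, and D is assumed to be a symmetric matrix of pairwise dissimilarities.

```python
import numpy as np

def vat_order(D):
    """Return the VAT-style ordering and the reordered dissimilarity matrix."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Start from one endpoint of the largest pairwise dissimilarity.
    first = int(np.unravel_index(np.argmax(D), D.shape)[0])
    order = [first]
    remaining = np.ones(n, dtype=bool)
    remaining[first] = False
    # nearest[j]: dissimilarity from object j to its closest selected object.
    nearest = D[first].copy()
    for _ in range(n - 1):
        # Pick the unselected object closest to the selected set.
        candidates = np.where(remaining, nearest, np.inf)
        nxt = int(np.argmin(candidates))
        order.append(nxt)
        remaining[nxt] = False
        nearest = np.minimum(nearest, D[nxt])
    order = np.array(order)
    return order, D[np.ix_(order, order)]

# Two well-separated groups on a line, deliberately interleaved in the input.
x = np.array([0.0, 10.0, 1.0, 11.0, 0.5])
D = np.abs(x[:, None] - x[None, :])
order, R = vat_order(D)
```

Displaying R as a grayscale image (for example with matplotlib's imshow) would give the reordered dissimilarity image; in this toy example the two groups end up contiguous, which is what produces dark diagonal blocks.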

19 Mar 2020

Visual Structural Assessment and Anomaly Detection
for High-Velocity Data Streams

IEEE Transactions on Cybernetics

This article proposes a new relative of the VAT model, which produces a record of structural evolution in the data stream.

Journal Paper P. Rathore, D. Kumar, J. C. Bezdek,
S. Rajasegarar, and M. Palaniswami.

The widespread use of Internet-of-Things (IoT) technologies, smartphones, and social media services generates huge amounts of data streaming at high velocity. Automatic interpretation of these rapidly arriving data streams is required for the timely detection of interesting events that usually emerge in the form of clusters. This article proposes a new relative of the visual assessment of the cluster tendency (VAT) model, which produces a record of structural evolution in the data stream by building a cluster heat map of the entire processing history in the stream. The existing VAT-based algorithms for streaming data, called inc-VAT/inc-iVAT and dec-VAT/dec-iVAT, are not suitable for high-velocity and high-volume streaming data because of high memory requirements and slower processing speed as the accumulated data increases. The scalable iVAT (siVAT) algorithm can handle big batch data, but for streaming data, it needs to be (re)applied every time a new data point arrives, which is not feasible due to the associated computational complexity. To address this problem, we propose an incremental siVAT algorithm, called inc-siVAT, which deals with the streaming data in chunks. It first extracts a small size smart sample using an intelligent sampling scheme, called maximin random sampling (MMRS), then incrementally updates the smart sample points on the fly, using our novel incremental MMRS (inc-MMRS) algorithm, to reflect changes in the data stream after each chunk is processed, and finally, produces an incrementally built iVAT image of the updated smart sample, using the inc-VAT/inc-iVAT and dec-VAT/dec-iVAT algorithms. These images can be used to visualize the evolving cluster structure and for anomaly detection in streaming data. Our method is illustrated with one synthetic and four real datasets, two of which evolve significantly over time.
Our numerical experiments demonstrate the algorithm's ability to successfully identify anomalies and visualize changing cluster structure in streaming data.

20 Feb 2020

Understanding the Operational Dynamics of Mobility Service Providers: A Case of Uber

ACM Transactions on Spatial Algorithms and Systems (TSAS)

vol. 6, no. 2, pp. 12:1-12:20, Feb. 2020.

This study collects and mines the trajectory data of online drivers who serve Uber to demystify how Uber drives their drivers.

Journal Paper X. Qian, D. Kumar, W. Zhang, and S. V. Ukkusuri.

The rise of mobility service providers (MSPs) is reforming the traditional taxi service (TTS) market. MSPs differ from TTS with the core idea of using technology to optimally match riders with drivers, features like ride-sharing and surge pricing, and are not entry-regulated. It is of great significance to understand how MSPs operate and how we can integrate them with TTS for efficient urban mobility. Unfortunately, little is known about MSPs due to limited data revealed by them. In this study, we collect and mine the trajectory data of online drivers who serve Uber (one of the largest MSPs) to demystify how Uber drives their drivers. We analyze the trip patterns of different Uber services and reveal their market share, trip metrics, and the spatial distributions of trip origins and destinations. We explore how MSPs improve the driver-rider matching efficiency and empirically validate the enormous efficiency gap between TTS and MSPs. In the end, we debunk the theory that the surge price is an instrument for restoring driver-rider balance, and show that drivers choose to chase or avoid the high surge areas depending on various other factors such as traffic congestion, time and location, and availability of alternate travel options. The results of this article provide insightful knowledge about the supply side of MSPs and contribute to new ideas on improving TTS and regulating MSPs.

05 MAR 2019

A Scalable Framework for Trajectory Prediction

IEEE Transactions on Intelligent Transportation Systems (T-ITS)

Vol. 20, no. 10, pp. 3860-3874, Oct. 2019.

This article proposes a scalable clustering and Markov chain-based hybrid framework for short- and long-term trajectory prediction.

Journal Paper P. Rathore, D. Kumar, S. Rajasegarar,
M. S. Palaniswami, and J. C. Bezdek.

Trajectory prediction (TP) is of great importance for a wide range of location-based applications in intelligent transport systems, such as location-based advertising, route planning, traffic management, and early warning systems. In the last few years, the widespread use of GPS navigation systems and wireless communication technology enabled vehicles has resulted in huge volumes of trajectory data. The task of utilizing these data employing spatio-temporal techniques for TP in an efficient and accurate manner is an ongoing research problem. Existing TP approaches are limited to the short-term predictions. Moreover, they cannot handle a large volume of trajectory data for long-term prediction. To address these limitations, we propose a scalable clustering and Markov chain-based hybrid framework, called Traj-clusiVAT-based TP, for both short- and long-term TPs, which can handle a large number of overlapping trajectories in a dense road network. Traj-clusiVAT can also determine the number of clusters, which represent different movement behaviors in input trajectory data. In our experiments, we compare our proposed approach with a mixed Markov model-based scheme and a trajectory clustering, NETSCAN-based TP method for both short- and long-term TPs. We performed our experiments on two real, vehicle trajectory datasets, including a large-scale trajectory dataset consisting of 3.28 million trajectories obtained from 15 061 taxis in Singapore over a period of one month. The experimental results on two real trajectory datasets show that our proposed approach outperforms the existing approaches in terms of both short- and long-term prediction performances, based on the prediction accuracy and distance error (in km).

25 OCT 2018

Dealing with Inliers in Feature Vector Data

International Journal of Uncertainty, Fuzziness and
Knowledge-Based Systems (IJUFKS)

Vol. 26, no. 2, pp. 25-45, 2018.

This article presents three new approaches to the detection and removal of inliers, which degrade the ability of many algorithms to find clusters in numerical data.

Journal Paper D. Kumar, Z. Ghafoori, J. C. Bezdek,
C. Leckie, K. Ramamohanarao, and M. Palaniswami.

Inliers (bridge points) between clusters degrade the ability of many algorithms to find clusters in numerical data. We present three new approaches to the detection and removal of inliers. Two approaches are based on Local Outlier Factor (LOF) scores. We also discuss using LOF scores for an isolation Nearest Neighbour Ensemble (iNNE) approach to inlier detection. The third approach uses MaxiMin (MM) sampling to remove both inliers and outliers. We compare the three approaches on a synthetic and two real-life datasets. The failure of single linkage clustering due to the existence of bridging points is used as a means for evaluating the relative effectiveness of the three methods. We also show how inliers can degrade the quality of images built by the improved Visual Assessment of Tendency (iVAT) algorithm, which provides a visual representation of potential single linkage clusters in the data.

02 AUG 2018

Fast and Scalable Big Data Trajectory Clustering for Understanding Urban Mobility

IEEE Transactions on Intelligent Transportation Systems (T-ITS)

Vol. 19, no. 11, pp. 3709-3722, Nov. 2018.

This article proposes a novel Dijkstra-based dynamic time warping distance measure, trajDTW, and a novel fast-clusiVAT algorithm that can suggest the number of clusters in a trajectory dataset.

Journal Paper D. Kumar, H. Wu, S. Rajasegarar, C. Leckie,
S. Krishnaswamy and M. Palaniswami.

Clustering of large-scale vehicle trajectories is an important aspect for understanding urban traffic patterns, particularly for optimizing public transport routes and frequencies and improving the decisions made by authorities. Existing trajectory clustering schemes are not well suited to large numbers of trajectories in dense city road networks due to the difficulty in finding a representative distance measure between trajectories that can scale to very large datasets. In this paper, we propose a novel Dijkstra-based dynamic time warping distance measure, trajDTW between two trajectories, which is suitable for large numbers of overlapping trajectories in a dense road network as found in major cities around the world. We also propose a novel fast-clusiVAT algorithm that can suggest the number of clusters in a trajectory dataset and identify and visualize the trajectories belonging to each cluster. We conduct experiments on a large-scale taxi trajectory dataset consisting of 3.28 million trajectories obtained from the GPS traces of 15 061 taxis within Singapore over a period of one month. Our analysis finds 13 trajectory clusters spanning the major expressways of Singapore, each of which can be further divided into two sub-clusters based on the travel direction. For each cluster, we provide a time-based distribution of trajectories to yield insights into how urban mobility patterns change with the time of day. We compare the trajectory clusters obtained using our approach with those obtained using popular general and trajectory specific clustering frameworks: DBSCAN, OPTICS, NETSCAN, and NEAT. We demonstrate that the clusters obtained using our novel fast-clusiVAT framework are better than those obtained using other clustering schemes, evaluated based on two internal cluster validity measures: Dunn's and Silhouette indices. Moreover, our fast-clusiVAT algorithm achieves significant speedup over a comparable approach without loss of cluster quality.

31 MAY 2018

A Rapid Hybrid Clustering Algorithm for Large
Volumes of High Dimensional Data

IEEE Transactions on Knowledge and Data Engineering (TKDE)

Vol. 31, no. 4, pp. 641-654, Apr. 2019

This article proposes an algorithm to simultaneously overcome both the "curse of dimensionality" problem due to high dimensions and scalability problems due to large sample size.

Journal Paper P. Rathore, D. Kumar, J. C. Bezdek,
S. Rajasegarar and M. Palaniswami.

Clustering large volumes of high-dimensional data is a challenging task. Many clustering algorithms have been developed to address either handling datasets with a very large sample size or with a very high number of dimensions, but they are often impractical when the data is large in both aspects. To simultaneously overcome both the "curse of dimensionality" problem due to high dimensions and scalability problems due to large sample size, we propose a new fast clustering algorithm called FensiVAT. FensiVAT is a hybrid, ensemble-based clustering algorithm which uses fast data-space reduction and an intelligent sampling strategy. In addition to clustering, FensiVAT also provides visual evidence that is used to estimate the number of clusters (cluster tendency assessment) in the data. In our experiments, we compare FensiVAT with nine state-of-the-art approaches which are popular for large sample size or high-dimensional data clustering. Experimental results suggest that FensiVAT, which can cluster large volumes of high-dimensional datasets in a few seconds, is the fastest and most accurate method of the ones tested.

25 APR 2017

Maximum Entropy based Auto Drift Correction using
High and Low Precision Sensors

ACM Transactions on Sensor Networks (TOSN)

Vol. 13, no. 3, pp. 24:1-24:41, Apr. 2017.

This paper proposes a novel framework to automatically detect and correct the drifts by employing Bayesian Maximum Entropy (BME) and Kalman filtering (KF) techniques.

Journal Paper P. Rathore, D. Kumar, S. Rajasegarar and M. Palaniswami.

With the advancement in the Internet of Things (IoT) technologies, a variety of sensors including inexpensive, low-precision sensors with sufficient computing and communication capabilities are increasingly deployed for monitoring large geographical areas. One of the problems with the use of inexpensive sensors is that they often suffer from random or systematic errors such as drift. The sensor drift is the result of slow changes that occur in the measurement driven by aging, loss of calibration, and changes in the phenomena being monitored over a time period. These drifting sensors need to be calibrated automatically for continuous and reliable monitoring. Existing methods for drift detection and correction do not consider the measurement errors or uncertainties present in those inexpensive low-precision sensors, resulting in unreliable drift estimates. In this article, we propose a novel framework to automatically detect and correct the drifts by employing Bayesian Maximum Entropy (BME) and Kalman filtering (KF) techniques. The BME method is a spatiotemporal estimation method that incorporates the measurement errors of low-precision sensors as interval quantities along with the high-precision sensor measurements in their computations. Our scheme can be implemented in a centralized as well as in a distributed manner to detect and correct the drift generated in the sensors. For the centralized scheme, we compare several Kriging-based estimation techniques in combination with KF, and show the superiority of our proposed BME-based method in detecting and correcting the drift. We also propose a multivariate BME framework for drift detection, in which multiple features can be used to improve the drift estimates. To demonstrate the applicability of our distributed approach on a real-world application scenario, we implemented our algorithm on each wireless sensor node in order to perform in-network drift detection.
The evaluation on real IoT datasets gathered from indoor and outdoor deployments reveals the superiority of our method in correctly identifying and correcting the drifts that develop in the sensors, in real time, compared to the existing approaches in the literature.

16 JAN 2017

A visual-numeric approach to clustering and anomaly detection for trajectory data

The Visual Computer - Springer

Vol. 33, no. 3, pp. 265-281, 2017.

This paper proposes a novel application of Visual Assessment of Tendency (VAT) based hierarchical clustering algorithms for trajectory analysis.

Journal Paper D. Kumar, J.C. Bezdek, S. Rajasegarar,
C. Leckie, and M. Palaniswami.

This paper proposes a novel application of Visual Assessment of Tendency (VAT) based hierarchical clustering algorithms (VAT, iVAT, and clusiVAT) for trajectory analysis. We introduce new clustering-based anomaly detection frameworks named iVAT+ and clusiVAT+ and use them for trajectory anomaly detection. This approach is based on partitioning the VAT-generated Minimum Spanning Tree based on an efficient thresholding scheme. The trajectories are classified as normal or anomalous based on the number of paths in the clusters. On synthetic datasets with fixed and variable numbers of clusters and anomalies, we achieve 98% classification accuracy. Our two-stage clusiVAT method is applied to 26,039 trajectories of vehicles and pedestrians from a parking lot scene from the real life MIT trajectories dataset. The first stage clusters the trajectories ignoring directionality. The second stage divides the clusters obtained from the first stage by considering trajectory direction. We show that our novel two-stage clusiVAT approach can produce natural and informative trajectory clusters on this real life dataset while finding representative anomalies.

23 DEC 2016

Adaptive Cluster Tendency Visualization and Anomaly Detection for Streaming Data

ACM Transactions on Knowledge Discovery from Data

Vol. 11, no. 2, pp. 24:1-24:40, Dec 2016.

This article develops and exemplifies two new relatives of the VAT and iVAT models to visualize evolving cluster structure in streaming data.

Journal Paper D. Kumar, J.C. Bezdek, S. Rajasegarar,
M. Palaniswami, C. Leckie, J. Chan, and J. Gubbi.

The growth in pervasive network infrastructure called the Internet of Things (IoT) enables a wide range of physical objects and environments to be monitored in fine spatial and temporal detail. The detailed, dynamic data that are collected in large quantities from sensor devices provide the basis for a variety of applications. Automatic interpretation of these evolving large data is required for timely detection of interesting events. This article develops and exemplifies two new relatives of the visual assessment of tendency (VAT) and improved visual assessment of tendency (iVAT) models, which use cluster heat maps to visualize structure in static datasets. One new model is initialized with a static VAT/iVAT image, and then incrementally (hence inc-VAT/inc-iVAT) updates the current minimal spanning tree (MST) used by VAT with an efficient edge insertion scheme. Similarly, dec-VAT/dec-iVAT efficiently removes a node from the current VAT MST. A sequence of inc-iVAT/dec-iVAT images can be used for (visual) anomaly detection in evolving data streams and for sliding window based cluster assessment for time series data. The method is illustrated with four real datasets (three of them being smart city IoT data). The evaluation demonstrates the algorithms’ ability to successfully isolate anomalies and visualize changing cluster structure in the streaming data.

29 OCT 2016

A Hybrid Approach to Clustering in Big Data

IEEE Transactions on Cybernetics

Vol. 46, no. 10, pp. 2372-2385, Oct. 2016.

This paper presents a new clusiVAT algorithm for big data clustering.

Journal Paper D. Kumar, J.C. Bezdek, M. Palaniswami, S. Rajasegarar, C. Leckie, and T.C. Havens.

Clustering of big data has received much attention recently. In this paper, we present a new clusiVAT algorithm and compare it with four other popular data clustering algorithms. Three of the four comparison methods are based on the well known, classical batch k-means model. Specifically, we use k-means, single pass k-means, online k-means, and clustering using representatives (CURE) for numerical comparisons. clusiVAT is based on sampling the data, imaging the reordered distance matrix to estimate the number of clusters in the data visually, clustering the samples using a relative of single linkage (SL), and then noniteratively extending the labels to the rest of the data-set using the nearest prototype rule. Previous work has established that clusiVAT produces true SL clusters in compact-separated data. We have performed experiments to show that k-means and its modified algorithms suffer from initialization issues that cause many failures. On the other hand, clusiVAT needs no initialization, and almost always finds partitions that accurately match ground truth labels in labeled data. CURE also finds SL type partitions but is much slower than the other four algorithms. In our experiments, clusiVAT proves to be the fastest and most accurate of the five algorithms; e.g., it recovers 97% of the ground truth labels in the real world KDD-99 cup data (4,292,637 samples in 41 dimensions) in 76 s.
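
The final label-extension step mentioned above can be illustrated with a short sketch. It assumes that the prototypes are simply the sampled points and that Euclidean distance is used; both are assumptions for illustration, and the name `extend_labels` is hypothetical rather than taken from the paper.

```python
import numpy as np

def extend_labels(X, sample_idx, sample_labels):
    """Give every point the label of its nearest sampled point (prototype)."""
    X = np.asarray(X, dtype=float)
    prototypes = X[np.asarray(sample_idx)]
    # Squared Euclidean distance from every point to every prototype.
    d2 = ((X[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    return np.asarray(sample_labels)[nearest]

# Toy data (hypothetical): points 0 and 2 were sampled and clustered already.
X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
labels = extend_labels(X, sample_idx=[0, 2], sample_labels=[0, 1])
```

The extension is a single pass with no iteration. For very large datasets the distance computation would need to be done in chunks, since the full points-by-prototypes matrix may not fit in memory.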

28 MAY 2015

Geospatial Estimation-Based Auto Drift Correction in Wireless Sensor Networks

ACM Transactions on Sensor Networks (TOSN)

Vol. 11, no. 3, pp. 50:1–50:39, Apr. 2015.

This article addresses the drift and bias errors in the measurements of a wireless sensor network.

Journal Paper D. Kumar, S. Rajasegarar, and M. Palaniswami.

Wireless sensor networks are often deployed in large numbers, over a large geographical region, in order to monitor the phenomena of interest. Sensors used in the sensor networks often suffer from random or systematic errors such as drift and bias. Even if they are calibrated at the time of deployment, they tend to drift as time progresses. Consequently, the progressive manual calibration of such a large-scale sensor network becomes impossible in practice. In this article, we address this challenge by proposing a collaborative framework to automatically detect and correct the drift in order to keep the data collected from these networks reliable. We propose a novel scheme that uses geospatial estimation-based interpolation techniques on measurements from neighboring sensors to collaboratively predict the value of the phenomenon being observed. The predicted values are then used iteratively to correct the sensor drift by means of a Kalman filter. Our scheme can be implemented in a centralized as well as distributed manner to detect and correct the drift generated in the sensors. For centralized implementation of our scheme, we compare several kriging- and nonkriging-based geospatial estimation techniques in combination with the Kalman filter, and show the superiority of the kriging-based methods in detecting and correcting the drift. To demonstrate the applicability of our distributed approach on a real world application scenario, we implement our algorithm on a network consisting of Wireless Sensor Network (WSN) hardware. We further evaluate single as well as multiple drifting sensor scenarios to show the effectiveness of our algorithm for detecting and correcting drift. Further, we address the issue of high power usage for data transmission among neighboring nodes leading to low network lifetime for the distributed approach by proposing two power saving schemes. Moreover, we compare our algorithm with a blind calibration scheme in the literature and demonstrate its superiority in detecting both linear and non-linear drifts.
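
The idea of filtering the gap between a sensor's reading and a neighbor-based prediction can be sketched with a scalar Kalman filter. The sketch below is a simplification under stated assumptions: drift is modeled as a random walk, the neighbor-based predictions are supplied directly rather than computed by kriging, and the names `track_drift`, `q`, and `r` are hypothetical. It is not the article's implementation.

```python
def track_drift(readings, predictions, q=0.01, r=0.5):
    """Estimate sensor drift from readings and neighbor-based predictions.

    q: assumed process-noise variance of the random-walk drift model.
    r: assumed measurement-noise variance of the observed drift.
    """
    drift, var = 0.0, 1.0  # initial drift estimate and its variance
    estimates = []
    for z, z_hat in zip(readings, predictions):
        var += q                          # predict: drift may have wandered
        innovation = (z - z_hat) - drift  # observed drift minus estimate
        gain = var / (var + r)            # Kalman gain
        drift += gain * innovation        # update the drift estimate
        var *= (1.0 - gain)               # update its variance
        estimates.append(drift)
    return estimates

# Toy scenario (hypothetical): neighbors suggest 20.0, the sensor reads 22.0.
predictions = [20.0] * 50
readings = [22.0] * 50
estimates = track_drift(readings, predictions)
corrected = readings[-1] - estimates[-1]
```

In this toy scenario the drift estimate approaches 2.0, so the corrected reading approaches the value suggested by the neighboring sensors.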

15 MAR 2011

On the Soft Fusion of Probability Mass Functions for Multimodal Speech Processing

EURASIP Journal on Advances in Signal Processing

Vol. 2011, Article ID 294010, 2011.

This paper develops two soft belief functions for multimodal speech processing applications: speaker diarization and audio-visual speech recognition.

Journal Paper D. Kumar, P. Vimal, and R. Hegde.

Multimodal speech processing has been a subject of investigation to increase robustness of unimodal speech processing systems. Hard fusion of acoustic and visual speech is generally used for improving the accuracy of such systems. In this paper, we discuss the significance of two soft belief functions developed for multimodal speech processing. These soft belief functions are formulated on the basis of a confusion matrix of probability mass functions obtained jointly from both acoustic and visual speech features. The first soft belief function (BHT-SB) is formulated for binary hypothesis testing like problems in speech processing. This approach is extended to multiple hypothesis testing (MHT) like problems to formulate the second belief function (MHT-SB). The two soft belief functions, namely, BHT-SB and MHT-SB are applied to the speaker diarization and audio-visual speech recognition tasks, respectively. Experiments on speaker diarization are conducted on meeting speech data collected in a lab environment and also on the AMI meeting database. Audiovisual speech recognition experiments are conducted on the GRID audiovisual corpus. Experimental results are obtained for both multimodal speech processing tasks using the BHT-SB and the MHT-SB functions. The results indicate reasonable improvements when compared to unimodal (acoustic speech or visual speech alone) speech processing.

09 MAY 2017

Big data clustering for Smart City applications

The University of Melbourne

This thesis presents novel algorithms for clustering tendency assessment for the two aspects of big data: Volume and Velocity. These algorithms were used to extract knowledge from several smart city generated big datasets.

Thesis D. Kumar

The Internet of Things (IoT) infrastructure for the creation of smart cities consists of internet connected sensors, devices and citizens. This IoT infrastructure generates an enormous amount of data in the form of city-scale physical measurements and public opinions, constituting big data. Smart cities aim to efficiently use this wealth of data to manage and solve the problems faced by modern cities for better decision making. However, interpretation of the massive amount of smart city generated big data to create actionable knowledge is a challenging task. Aggregation and summarization (data clustering) is a useful tool to create knowledge from raw data from different sources. However, traditional data clustering algorithms are not suitable for unlabelled smart city data owing to its high volume and generation velocity and the limited prior knowledge of the generating phenomenon.
This thesis presents a novel framework for clustering tendency assessment for big data: clusiVAT, which provides an aggregated view of the big data to create actionable knowledge. clusiVAT intelligently selects a small number of samples from the data such that the samples retain the approximate geometry of the big dataset. The reordered dissimilarity image of the samples generated using single linkage minimum spanning tree (MST) suggests the number of clusters in the data, which is required as an input for most popular clustering algorithms. The cluster labels are then extended to the non-sampled points using the nearest prototype rule.
The clusiVAT framework was applied to two real life smart city applications to understand the underlying patterns hidden in the huge volumes of data to generate knowledge. The first application used clusiVAT for clustering and anomaly detection from the pedestrian and vehicle trajectories obtained from a video surveillance system. Experiments were performed on a real-life MIT trajectories dataset of vehicles and pedestrians from a parking lot scene. The trajectory clusters and anomalies thus obtained were helpful in the high-level interpretation of a scene (crowd behavior modeling), as feedback for a low-level (individual) tracking and activity prediction system, and as an alarm for a human supervisor.
For the second application, clusiVAT was used to cluster large scale (of the order of millions) vehicular trajectories obtained from the GPS traces of taxis in the cities of Beijing and Singapore using a novel Dijkstra-based dynamic time warping distance measure. The results facilitated the understanding of spatial and temporal patterns in trajectories and were of great significance for decision-makers to understand road traffic conditions and to propose metro bus corridors and light rail systems for better public transport.
Another prominent type of data generated by smart city IoT infrastructure is high-velocity data streams. Automatic interpretation of these evolving big data is required for timely detection of unusual events. This thesis presents a computationally efficient 'hot' update approach for incremental visualization of evolving cluster structure in streaming data. The new algorithms were demonstrated for two applications: online anomaly detection and sliding window based clustering of time series data. Numerical experiments on weather monitoring data from the Great Barrier Reef and the city of Melbourne provided visual clues to the onset of new structure in streaming data.

26 APR 2010

Soft Fusion Methods for Multimodal Speech Applications

Indian Institute of Technology Kanpur

In this thesis, decisions made from audio and video information separately are late fused using Dempster-Shafer (DS) theory, which provides a soft belief function for fusing information from independent modalities.

Thesis D. Kumar

The complementary nature of audio and video information is well established. Video information about mouth shape and position can be used to interpret audio information in a better way. In this thesis, decisions made from audio and video information separately are late fused using Dempster-Shafer (DS) theory, which provides a soft belief function for fusing information from independent modalities. Speaker diarization is the problem of finding out speaking times of each speaker and grouping together homogeneous segments. This is an increasingly relevant problem in meeting room scenarios and for automatic meeting documentation. In this thesis, speaker diarization using audio-only information is performed using the Bayesian Information Criterion (BIC), video based diarization is performed using Hidden Markov Model (HMM) modelling of speaking and non-speaking segments, and the two decisions are then fused using DS theory. Speech recognition is the problem of finding out what is being said by listening to or by seeing someone speak or both. Applications of speech recognition are numerous and include better human-computer interfaces and speech controlled applications. In this thesis, speech recognition is performed using HMM modelling of audio and video features, and later the decisions made using these modalities are fused using DS theory.
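
As a rough illustration of late fusion with DS theory, the classical Dempster rule of combination can be written in a few lines. This is the textbook rule only, not the specific soft belief function developed in the thesis. The mass values and the names `combine`, `audio`, and `video` are made up for the example.

```python
from itertools import product

def combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, pa), (b, pb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence supports the intersection of the two sets.
            combined[inter] = combined.get(inter, 0.0) + pa * pb
        else:
            # Disjoint sets contribute to the conflict mass.
            conflict += pa * pb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize by the non-conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

# Hypothetical masses from an audio and a video detector.
S, N = frozenset({"speaking"}), frozenset({"silent"})
EITHER = S | N  # mass left on the whole frame expresses uncertainty
audio = {S: 0.6, N: 0.1, EITHER: 0.3}
video = {S: 0.5, N: 0.2, EITHER: 0.3}
fused = combine(audio, video)
```

With these example masses, the fused belief in "speaking" is higher than that of either modality alone, which reflects the complementary-evidence argument made above.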

10 DEC 2018

Social-Media aided Hyperlocal Help-Network Matching & Routing during Emergencies

IEEE International Conference on Big Data (Big Data)

PP. 1606-1611, 2018.

This article proposes and designs a social-media (specifically Twitter) aided hyperlocal help-network by utilizing the tweets to identify users who require help and those who are willing to provide it.

Conferences D. Kumar, T. Yabe, and S. Ukkusuri.

Catering to the humanitarian needs of hurricane-affected residents is the most challenging part for emergency management agencies. These agencies typically follow a centralized help disbursement model by collecting donations and disbursing them to those in need through their employees or registered volunteers. The time required to move goods and volunteers to the place of need poses a survival challenge to emergency-hit residents, especially during the initial few days after the emergency. We propose and design a social-media (specifically Twitter) aided hyperlocal help-network by utilizing the tweets to identify users who require help and those who are willing to provide it. We also analyze tweets related to road damage, traffic jams, etc. to sense the current state of road infrastructure. We propose to match the help-seekers and those who are willing to help, taking into consideration their spatial proximity, and then provide the fastest working route for the help-provider to reach the matched help-seeker. Numerical experiments performed on a Hurricane Sandy Twitter dataset show the effectiveness of the proposed approach, as we are able to satisfy the needs of more than 80% of help-seekers by matching them to appropriate help-providers within 24 hours of the request-for-help tweet being posted, with a maximum travel distance of 10 km.

20 AUG 2018

Approximate Cluster Heat Maps of Large High-Dimensional Data

International Conference on Pattern Recognition (ICPR)

PP. 195-200, 2018.

In this article, we introduce a modification of siVAT called siVAT+ which approximates cluster heat maps for large volumes of high dimensional data much more rapidly than siVAT.

Conferences P. Rathore, J. C. Bezdek, D. Kumar, S. Rajasegarar, and M. Palaniswami.

The problem of determining whether clusters are present in numerical data (tendency assessment) is an important first step of cluster analysis. One tool for cluster tendency assessment is the visual assessment of tendency (VAT) algorithm. VAT and improved VAT (iVAT) produce an image that provides visual evidence about the number of clusters to seek in the original dataset. These methods have been successful in determining potential cluster structure in various datasets, but they can be computationally expensive for datasets with a very large number of samples. A scalable version of iVAT called siVAT approximates iVAT images, but siVAT can be computationally expensive for big datasets. In this article, we introduce a modification of siVAT called siVAT+ which approximates cluster heat maps for large volumes of high dimensional data much more rapidly than siVAT. We compare siVAT+ with siVAT on six large, high dimensional datasets. Experimental results confirm that siVAT+ obtains images similar to siVAT images in a few seconds, and is 8 - 55 times faster than siVAT.

23 APR 2018

Utilizing Geo-tagged Tweets to Understand Evacuation Dynamics during Emergencies: A case study of Hurricane Sandy

The Web Conference (WWW) Companion

PP. 1613-1620, 2018.

This paper leverages the geo-tagged Tweets posted in New York City (NYC) in the wake of Hurricane Sandy to understand the evacuation behavior of the residents.

Conferences D. Kumar and S. Ukkusuri.

Hurricane evacuation is a complex process and a better understanding of the evacuation behavior of the coastal residents could be helpful in planning better evacuation policy. Traditionally, various aspects of the household evacuation decisions have been determined by post-evacuation questionnaire surveys, which are usually time-consuming and expensive. Increased activity of users on social media, especially during emergencies, along with the geo-tagging of the posts, provides an opportunity to gain insights into users' decision-making processes, as well as to gauge public opinion and activities using the social media data as a supplement to the traditional survey data. This paper leverages the geo-tagged Tweets posted in New York City (NYC) in the wake of Hurricane Sandy to understand the evacuation behavior of the residents. Based on the geo-tagged Tweet locations, we classify the NYC Twitter users into one of three categories (outside evacuation zone, evacuees, and non-evacuees) and examine the types of Tweets posted by each group during different phases of the hurricane. We establish a strong link between social connectivity and the decision of the users to evacuate or stay. We analyze the geo-tagged Tweets to understand evacuation and return time and evacuation location patterns of evacuees. The analysis presented in this paper could be useful for authorities to plan a better evacuation campaign to minimize the risk to the life of the residents of the emergency-hit areas.

03 JUN 2018

Identifying Singleton Spammers via Spammer Group Detection

Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD)

PP. 656-667, 2018.

This paper proposes to infer the hidden reviewer-product associations by review-product matrix completion to detect singleton spammers.

Conferences D. Kumar, Y. Shaalan, X. Zhang, and J. Chan.

Opinion spam is a well-recognized threat to the credibility of online reviews. Existing approaches to detecting spam reviews or spammers examine review content, reviewer behavior and reviewer-product network, and often operate on the assumption that spammers write at least several if not many fake reviews. On the other hand, spammers set up multiple sockpuppet IDs and write one-time, singleton spam reviews to avoid detection. It is reported that for most review sites, a large portion, sometimes over 90%, of reviewers are singletons (identified by the reviewer ID). Singleton spammers are difficult to catch due to the scarcity of behavioral clues. In this paper, we argue that the key to detecting singleton spammers (and their fake reviews) is to detect group spam attacks by inferring the hidden collusiveness among them. To address the challenge of lack of explicit behavioral signals for singleton reviewers, we propose to infer the hidden reviewer-product associations by completing the review-product matrix, leveraging the product and review metadata and text. Experiments on three real-life Yelp datasets established that our approach can effectively detect singleton spammers via group detection, which are often missed by existing approaches.

05 FEB 2018

Bayesian maximum entropy and interacting multiple model based automatic sensor drift detection and correction in an IoT environment

IEEE World Forum on Internet of Things (WF-IoT)

PP. 598-603, 2018.

In this paper, we present a new methodology to automatically detect and correct both the smooth and steep drifts by employing Bayesian Maximum Entropy and Interacting Multiple Model based techniques.

Conferences P. Rathore, D. Kumar, S. Rajasegarar, and M. Palaniswami.

With the advancement in the Internet of Things (IoT) technologies, a variety of sensors including inexpensive, low-precision sensors with sufficient computing and communication capabilities are increasingly deployed for monitoring large geographical areas. One of the problems with the use of inexpensive sensors is the drift that they develop over time. These drifting sensors need to be calibrated automatically for continuous and reliable monitoring. In this paper, we present a new methodology to automatically detect and correct both the smooth and steep drifts by employing Bayesian Maximum Entropy and Interacting Multiple Model based techniques. The evaluation on real IoT data gathered from an indoor and an outdoor deployment reveals the superiority and applicability of our method in correctly identifying and correcting the smooth and abrupt (sensor) drifts in the IoT environment.

11 DEC 2017

Exploring the dynamics of surge pricing in mobility-on-demand taxi services

IEEE International Conference on Big Data (Big Data)

PP. 1375-1380, 2017.

In this paper, we collect and mine the operational data of one of the largest mobility service providers, Uber, in New York City (NYC) to understand the underlying mechanism behind the dynamic pricing generation.

Conferences W. Zhang, D. Kumar, and S. V. Ukkusuri.

Dynamic pricing implemented in the form of a surge price multiplier (SPM) by mobility-on-demand services such as Uber, Lyft, etc. has significantly altered the demand-supply dynamics of the fixed fare rate traditional taxi market. However, it bears a fair share of criticism for being opaque, opportunistic, and socially insensitive, especially during large public events and emergency situations. In this paper, we collect and mine the operational data of one of the largest mobility service providers, Uber, in New York City (NYC) to understand the underlying mechanism behind the dynamic pricing generation. We find the common spatiotemporal patterns in the SPM and identify the cost-effectiveness of its most popular service, UberX, as compared to UberBlack and street hailing taxis. We model the underlying phenomenon behind the SPM generation as a function of demand, supply, the time of the day, the day of the week, and expected time to arrival (ETA) using various machine learning classifiers. Support vector machines, k-nearest neighbor, and decision tree classifiers are found to model the SPM the best, with the average classification loss for the 10-fold cross validation being as low as 0.001 for the rapidly changing SPM for UberX.

13 JUN 2016

Understanding Urban Mobility via Taxi Trip Clustering

IEEE International Conference on Mobile Data Management (MDM)

PP. 318-324, 2016.

This paper clusters the origin-destination pairs of the passenger taxi rides to provide useful insight into the city mobility patterns and urban hot-spots.

Conferences D. Kumar, H. Wu, Y. Lu, S. Krishnaswamy, and M. Palaniswami.

Clustering of a large amount of taxi GPS mobility data helps to understand the spatio-temporal dynamics for the applications of urban planning and transportation. In this paper we cluster the origin-destination pairs of the passenger taxi rides to provide useful insight into the city mobility patterns, urban hot-spots, road network usage and general patterns of the crowd movement within the city of Singapore. We perform experiments on a large scale Singapore taxi dataset consisting of more than 10 million passenger origin-destination GPS points. We use the clusiVAT sampling scheme to obtain the sample trips, which return coarse clusters describing the major crowd movement and reduce the data points that are not captured by the coarse clusters and may introduce noise during fine-grained clustering. After the sampling step we use the well known density based clustering algorithm DBSCAN to find cluster structure in the sampled data points and later extend it to the rest of the dataset using the nearest prototype rule. We report 24 trip clusters from the dataset which are compact enough to draw meaningful conclusions about the city mobility patterns, and the number of trips in each cluster is large enough to be representative of the general traffic movement.

10 AUG 2015

A Scalable Framework for Clustering Vehicle Trajectories in a Dense Road Network

International Workshop on Urban Computing (UrbComp), held in conjunction with the 21st ACM SIGKDD, 2015

This paper proposes a novel Dijkstra based Dynamic Time Warping (DTW) distance measure, trajDTW for road-network constrained trajectories.

Conferences D. Kumar, S. Rajasegarar, M. Palaniswami, X. Wang, and C. Leckie.

Cluster analysis is a fundamental challenge in trajectory mining. However, existing trajectory clustering algorithms are not well suited to large numbers of trajectories in a city road network because of inadequate distance measures between two trajectories. In this paper we propose a novel Dijkstra based Dynamic Time Warping (DTW) distance measure, trajDTW, between two trajectories, which is suitable for large numbers of overlapping trajectories in a dense road network. We show the superiority of trajDTW over previously proposed distance measures, Dissimilarity with Length (DSL) and Hausdorff distance for point sets, using a few sample trajectories on a road network. We then show how our sampling based clustering algorithm clusiVAT can suggest the number of clusters, and identify and visualize the trajectories belonging to each cluster. We also detect anomalous trajectories in a given dataset using clusiVAT. Experimental results on a large scale T-Drive taxi trajectory dataset consisting of 43,405 trajectories on a road network having 100 nodes and 141 edges reveal the presence of 12 clusters having an average of 2,029 trajectories each. We compare the trajectory clusters obtained using the clusiVAT algorithm employing the trajDTW distance measure with those obtained using the NETSCAN trajectory clustering method proposed in the literature. Furthermore, we identify the top 100 anomalies corresponding to a few vehicles taking unusually warped paths for their commute. These anomalous trajectories have their maximum traffic density in geographically distinct sections of the road network.
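
For context, the standard DTW recursion that trajDTW builds on can be sketched as below. The sketch is generic DTW with a caller-supplied local cost. The abstract indicates that trajDTW is Dijkstra based, so a shortest-path distance between road-network nodes would presumably take the place of the `cost` function, but that substitution is an assumption here and the exact definition should be taken from the paper. The names `dtw` and `cost` are illustrative.

```python
def dtw(a, b, cost=lambda p, q: abs(p - q)):
    """Dynamic time warping distance between sequences a and b."""
    n, m = len(a), len(b)
    inf = float("inf")
    # acc[i][j] = cheapest alignment cost of a[:i] with b[:j].
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = cost(a[i - 1], b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            acc[i][j] = step + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]

# Two sequences that differ only in how long they dwell on the value 2.
same_route = dtw([1, 2, 3], [1, 2, 2, 3])
shifted = dtw([0, 0], [1, 1])
```

Because DTW allows elements to be repeated during alignment, the first pair has zero distance even though the sequences have different lengths, which is the property that makes it attractive for trajectories sampled at different rates.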

09 OCT 2013

clusiVAT: A Mixed Visual/Numerical Clustering Algorithm For Big Data

IEEE International Conference on Big Data (BigData)

PP. 112-117, 2013.

This paper compares single linkage clustering based on MSTs built with the Filter-Kruskal method to the proposed clusiVAT algorithm.

Conferences D. Kumar, M. Palaniswami, S. Rajasegarar, C. Leckie, J.C. Bezdek, and T.C. Havens.

Recent algorithmic and computational improvements have reduced the time it takes to build a minimal spanning tree (MST) for big data sets. In this paper we compare single linkage clustering based on MSTs built with the Filter-Kruskal method to the proposed clusiVAT algorithm, which is based on sampling the data, imaging the sample to estimate the number of clusters, followed by non-iterative extension of the labels to the rest of the big data with the nearest prototype rule. Numerical experiments with both synthetic and real data confirm the theory that clusiVAT produces true single linkage clusters in compact, separated data. We also show that single linkage fails, while clusiVAT finds high quality partitions that match ground truth labels very well. And clusiVAT is fast: it recovers the preferred c = 3 Gaussian clusters in a mixture of 1 million two-dimensional data points with 100% accuracy in 3.1 seconds.

07 JUL 2013

Motor Recovery Monitoring In Post-Acute Stroke Patients Using Wireless Accelerometer And Cross Correlation

IEEE International Conference of the Engineering in Medicine and Biology Society (EMBC)

PP. 6703–6707, 2013.

This paper proposes a novel technique based on cross-correlation of accelerometer values along different axes for predicting the NIHSS index.

Conferences D. Kumar, J. Gubbi, B. Yan, and M. Palaniswami.

Stroke is a major reason for physical immobility and death. For effective treatment of stroke, early diagnosis and aggressive medication in the form of thrombolytic drugs are shown to be essential. In order to provide proper care, the patient should be kept under continuous monitoring during the first few hours after administering thrombolytic drugs and, based on the response of the patient to the medication, the line of treatment should be changed. In our previous work, we have shown the proof of principle by monitoring the motor activity of the stroke patient using accelerometers fitted on the patient's arms. Based on preliminary analysis, we proposed methods using the resultant acceleration signal and showed its effectiveness in predicting the National Institutes of Health Stroke Scale (NIHSS) stroke index. In this paper, a novel technique based on cross-correlation of accelerometer values along different axes is developed for predicting the NIHSS index. An overall increase in prediction accuracy of over 7% compared to the earlier method is obtained. A multi-class support vector machine (SVM) classifier for cross correlation features is also designed and an overall prediction accuracy of 93% is achieved.

07 JUL 2013

A Pilot Study On The Use Of Accelerometer Sensors For Monitoring Post Acute Stroke Patients

IEEE International Conference of the Engineering in Medicine and Biology Society (EMBC)

PP. 957–960, 2013.

This paper presents a pilot study to analyse and detect the affected arm of the stroke patient based on hand movements.

Conferences J. Gubbi, D. Kumar, A. Rao, B. Yan, and M. Palaniswami.

The high incidence of stroke has raised a major concern among health professionals in recent years. Concerted efforts from medical and engineering communities are being exercised to tackle the problem at its early stage. In this direction, a pilot study to analyse and detect the affected arm of the stroke patient based on hand movements is presented. The premise is that the correlation of the magnitude of the activities of the two arms varies significantly between stroke patients and controls. Further, the cross-correlations of right and left arms for three axes are differentiable for patients and controls. A total of 22 subjects (15 patients and 7 controls) were included in this study. An overall accuracy of 95.45% was obtained with sensitivity of 1 and specificity of 0.86 using the correlation based method.

23 MAY 2013

Automatic Sensor Drift Detection And Correction Using Spatial Kriging And Kalman Filtering

IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS)

PP. 183–190, 2013.

This paper proposes a framework to automatically detect and correct the drift of the sensor nodes to keep the WSN usable.

Conferences D. Kumar, S. Rajasegarar, and M. Palaniswami.

The Internet of Things (IoT) is a concept referring to interconnected people and objects, and the smart city is one of the many applications of IoT. A Wireless Sensor Network (WSN) is a specific technology that helps to create "Smart Cities". It aims at creating a distributed network of intelligent sensor nodes which can measure various parameters for efficient management of the city. The data thus collected through a range of sensors is processed and delivered wirelessly in real time to the citizens or the appropriate authorities. Since the application framework for smart city applications is huge, it would require a large number of different types of sensors for its implementation, and the project could be viable only if we use low resolution, low precision but inexpensive sensors. The sensors in a sensor network can suffer from random or systematic errors. The most common problem with inexpensive sensors used in WSNs for smart city applications is drift and bias. They can be calibrated at the time of deployment, but they develop drift, which is the slow change in the reading of a sensor from the actual value as time progresses. In this paper we propose a framework to automatically detect and correct the drift of the sensor nodes to keep the WSN usable. Kriging based interpolation of the readings of neighboring sensors is used to predict the actual value at the sensor node, and the measured drift is then Kalman filtered to get correct drift estimates. We demonstrate the results of this algorithm on real sensor data obtained from the Intel Research Berkeley Laboratory deployment and show that our system is able to detect and correct smooth drift and bias generated in the sensors. We also show that our system is robust with respect to the number of sensor nodes drifting and significantly outperforms the traditional averaging based interpolation methods.

17 Dec 2009

Multi modal speaker diarization using a soft
belief function

International Conference on Natural Language Processing (ICON)

PP. 376-381, 2009.

This paper describes a methodology for fusing information from multiple modalities, such as speech and video, for the purpose of speaker diarization.

Conferences D. Kumar, R. Malhotra, A. Singh, and R. Hegde.

Multi modal speaker diarization using a soft
belief function

D. Kumar, R. Malhotra, A. Singh, and R. Hegde. Conferences

In this paper we describe a methodology for fusing information from multiple modalities, such as speech and video, for the purpose of speaker diarization. Conventionally, unimodal information from individual modalities like speech and video, along with appropriate modeling techniques like the Bayesian Information Criterion and Hidden Markov Models respectively, is used to detect speaker changes and perform subsequent speaker diarization. The independent and complementary information present in the speech and video modalities is utilized to propose a fusion methodology herein. The proposed method formulates a soft belief function based on Dempster-Shafer theory which tests the hypothesis that a speaker change is detected. This belief function is used to group together homogeneous acoustic segments to perform speaker diarization. The method is applied to the speaker diarization task on the AMI database and also on data collected in a laboratory meeting room test bed. A reasonable reduction in the speaker diarization error rate is reported on both data sets. Separability analysis results are also presented to reinforce the complementary nature of the information in the audio-visual modalities.
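
As a rough illustration of how Dempster-Shafer theory can fuse evidence from two modalities, the Python sketch below applies Dempster's rule of combination to two mass functions over the hypotheses "change" and "no_change". It is not the soft belief function proposed in the paper; the function name combine_dempster and the audio and video mass values are hypothetical and chosen only for illustration.

```python
from itertools import product


def combine_dempster(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to a mass,
    and the masses of each function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Mass assigned to contradictory hypotheses.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize so the remaining masses sum to 1.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


CHANGE = frozenset({"change"})
NO_CHANGE = frozenset({"no_change"})
EITHER = CHANGE | NO_CHANGE  # mass left uncommitted (ignorance)

# Illustrative (made-up) evidence from each modality.
audio = {CHANGE: 0.6, NO_CHANGE: 0.1, EITHER: 0.3}
video = {CHANGE: 0.5, NO_CHANGE: 0.2, EITHER: 0.3}

fused = combine_dempster(audio, video)

if __name__ == "__main__":
    for hypothesis, mass in fused.items():
        print(sorted(hypothesis), round(mass, 3))
```

When both modalities lean toward a speaker change, the fused mass on "change" exceeds that of either source alone, which is the kind of complementary reinforcement the abstract refers to.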

.04

RESEARCH

  • RESEARCH PROJECTS
  • 2013
    Present

    NOVEL ALGORITHMS FOR ASSESSING CLUSTERING TENDENCY OF BIG DATA

    THE UNIVERSITY OF MELBOURNE

    This project develops several members of the VAT family of algorithms to tackle different aspects of big data analytics, with the aim of understanding a variety of novel data sources and extracting actionable knowledge from them. The research can be broadly classified into three categories, which cater to the high volume, high velocity (streaming data), and high dimensionality aspects of big data.

    Related Publications:

    1. P. Rathore, D. Kumar, S. Rajasegarar, M. S. Palaniswami, and J. C. Bezdek, "Visual Structural Assessment and Anomaly Detection for High-Velocity Data Streams," in IEEE Transactions on Cybernetics (T-CYB).
    2. D. Kumar and J. C. Bezdek, "Visual approaches for exploratory data analysis: A survey of the VAT family of algorithms," in IEEE Systems, Man, and Cybernetics Magazine, vol. 6, no. 2, pp. 10-48, April 2020.
    3. M. Palaniswami, A. S. Rao, D. Kumar, P. Rathore, and S. Rajasegarar, "Role of Visual Assessment of Clusters for Big Data Analysis from Real-world Internet of Things," in IEEE Systems, Man, and Cybernetics Magazine (SMC-MAG), accepted.
    4. P. Rathore, D. Kumar, J. C. Bezdek, S. Rajasegarar and M. S. Palaniswami, "A Rapid Hybrid Clustering Algorithm for Large Volumes of High Dimensional Data," in IEEE Transactions on Knowledge & Data Engineering (TKDE), vol. 31, no. 4, pp. 641-654, Apr. 2019.
    5. D. Kumar, Z. Ghafoori, J. C. Bezdek, C. Leckie, K. Ramamohanarao, and M. Palaniswami, "Dealing with Inliers in Feature Vector Data," in International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems (IJUFKS), vol. 26, no. 2, pp. 25-45, 2018.
    6. D. Kumar, J. Bezdek, S. Rajasegarar, M. Palaniswami, C. Leckie, J. Chan, and J. Gubbi, “Adaptive Cluster Tendency Visualization and Anomaly Detection for Streaming Data.” ACM Transactions on Knowledge Discovery from Data (TKDD), vol. 11, no. 2, pp. 24:1-24:40, Dec 2016.
    7. D. Kumar, J. Bezdek, M. Palaniswami, S. Rajasegarar, C. Leckie, and T. Havens, “A Hybrid Approach to Clustering in Big Data.” IEEE Transactions on Cybernetics, vol. 46, no. 10, pp. 2372-2385, Oct. 2016.
    8. P. Rathore, J. Bezdek, D. Kumar, S. Rajasegarar, and M. Palaniswami, “Approximate Cluster Heat Maps of Large High-Dimensional Data.” International Conference on Pattern Recognition (ICPR), pp. 195-200, 2018.
    9. D. Kumar, J. Bezdek, S. Rajasegarar, M. Palaniswami, T. Havens, and C. Leckie, “clusiVAT: A mixed visual/numerical clustering algorithm for big data,” IEEE International Conference on Big Data (BigData), pp. 112-117, 2013.
  • 2015
    Present

    UNDERSTANDING AND EXTRACTING ACTIONABLE KNOWLEDGE FROM SMART CITY GENERATED BIG DATA

    THE UNIVERSITY OF MELBOURNE

    In this project, the in-house developed clustering tendency assessment algorithms are used to understand and extract actionable knowledge from smart city generated big data. The datasets experimented upon include pedestrian and vehicle trajectories obtained from a video surveillance system, large-scale vehicular trajectories obtained from the GPS traces of taxis in the cities of Beijing and Singapore, weather monitoring data from the Great Barrier Reef and the city of Melbourne, and energy usage data (of connected devices) together with the context of the users from an indoor office environment.

    Related Publications:

    1. P. Rathore, D. Kumar, S. Rajasegarar, M. S. Palaniswami, and J. C. Bezdek, "A Scalable Framework for Trajectory Prediction," in IEEE Transactions on Intelligent Transportation Systems (T-ITS), vol. 20, no. 10, pp. 3860-3874, Oct. 2019.
    2. D. Kumar, H. Wu, S. Rajasegarar, C. Leckie, S. Krishnaswamy and M. Palaniswami, "Fast and Scalable Big Data Trajectory Clustering for Understanding Urban Mobility," in IEEE Transactions on Intelligent Transportation Systems (T-ITS), vol. 19, no. 11, pp. 3709-3722, Nov. 2018.
    3. D. Kumar, J. Bezdek, S. Rajasegarar, C. Leckie, and M. Palaniswami, “A Visual-Numeric Approach to Clustering and Anomaly Detection for Trajectory Data.” The Visual Computer - Springer, vol. 33, no. 3, pp. 265-281, 2017.
    4. S. Mahallati, J.C. Bezdek, D. Kumar, M.R. Popovic, and T.A. Valiante, “Interpreting Cluster Structure in Waveform Data with Visual Assessment and Dunn's Index.” Frontiers in Computational Intelligence - Springer, pp. 73–101, 2017.
    5. D. Kumar, H. Wu, Y. Lu, S. Krishnaswamy, and M. Palaniswami, “Understanding Urban Mobility via Taxi Trip Clustering,” IEEE International Conference on Mobile Data Management (MDM), pp. 318-324, 2016.
    6. D. Kumar, S. Rajasegarar, M. Palaniswami, X. Wang, and C. Leckie, “A Scalable Framework for Clustering Vehicle Trajectories in a Dense Road Network,” International Workshop on Urban Computing (UrbComp), in conjunction with the ACM SIGKDD 2015.
  • 2013
    2017

    SENSOR DRIFT DETECTION AND CORRECTION TO ENHANCE VERACITY OF SMART CITY DATA

    THE UNIVERSITY OF MELBOURNE

    This project proposed and experimented with several spatial interpolation techniques, e.g., kriging and Bayesian Maximum Entropy (BME), to automatically detect and correct the drift of the large number of inexpensive, error-prone sensor nodes used in "Smart City" implementations, thereby enhancing data reliability.

    Related Publications:

    1. P. Rathore, D. Kumar, S. Rajasegarar, and M. Palaniswami, “Maximum Entropy based Auto Drift Correction using High and Low Precision Sensors.” ACM Transactions on Sensor Networks (TOSN), vol. 13, no. 3, pp. 24:1-24:41, Apr. 2017.
    2. D. Kumar, S. Rajasegarar, and M. Palaniswami, “Geospatial estimation based auto drift correction in wireless sensor networks,” ACM Transactions on Sensor Networks (TOSN), vol. 11, no. 3, pp. 50:1–50:39, Apr. 2015.
    3. P. Rathore, D. Kumar, S. Rajasegarar, and M. Palaniswami, “Bayesian Maximum Entropy and Interacting Multiple Model Based Automatic Sensor Drift Detection and Correction in an IoT Environment.” IEEE World Forum on Internet of Things (WF-IoT), pp. 598-603, 2018.
    4. D. Kumar, S. Rajasegarar, and M. Palaniswami, “Automatic sensor drift detection and correction using spatial kriging and kalman filtering,” IEEE International Conference on Distributed Computing in Sensor Systems (DCoSS), pp. 183–190, 2013.
  • 2017
    2019

    LEVERAGING SOCIAL-MEDIA DATA FOR EMERGENCY MANAGEMENT

    PURDUE UNIVERSITY

    This project leverages the geo-tagged Tweets posted by hurricane-hit residents prior to, during, and after the tragedy to understand their evacuation behavior. It also proposes "Hyperlocal help-networks", an efficient approach to help disbursement during emergencies.

    Related Publications:

    1. D. Kumar and S. Ukkusuri, “Enhancing demographic coverage of hurricane evacuation behavior modeling using social media.” Journal of Computational Science, vol. 45, 2020.
    2. D. Kumar, T. Yabe, and S. Ukkusuri, “Social-Media aided Hyperlocal Help-Network Matching & Routing during Emergencies.” IEEE International Conference on Big Data (BigData), pp. 1606-1611, 2018.
    3. D. Kumar and S. Ukkusuri, “Utilizing Geo-tagged Tweets to understand Evacuation Dynamics during Emergencies: A case study of Hurricane Sandy.” The Web Conference (WWW) Companion, pp. 1613-1620, 2018.
  • 2017
    2019

    UNDERSTANDING MOBILITY SERVICE PROVIDERS: CASE STUDY - UBER

    PURDUE UNIVERSITY

    This project collects and mines the trajectory data of online drivers who serve Uber (one of the largest mobility service providers) to demystify how Uber drives its drivers. The analysis includes market share, trip metrics, the spatial distributions of trip origins and destinations for different Uber services, and the use of surge pricing as an instrument to restore driver-rider balance.

    Related Publications:

    1. X. Qian, D. Kumar, W. Zhang, and S. V. Ukkusuri, "Understanding the operational dynamics of Mobility Service Providers: A case of Uber," in ACM Transactions on Spatial Algorithms and Systems (TSAS), vol. 6, no. 2, pp. 12:1-12:20, Feb. 2020.
    2. W. Zhang, D. Kumar, and S. V. Ukkusuri, “Exploring the Dynamics of Surge Pricing in Mobility-on-Demand Taxi Services,” IEEE International Conference on Big Data (BigData), pp. 1375-1380, 2017.
  • 2012
    2013

    MONITORING POST-ACUTE STROKE PATIENTS USING WRIST-WORN ACCELEROMETER SENSORS

    THE UNIVERSITY OF MELBOURNE

    This project analyzes the data generated by a hand-wearable wireless system for automated stroke patient management. Tri-axial accelerometer records of the activity of both hands are mined to estimate the affected side and to predict the NIHSS stroke index based on activity comparison.

    Related Publications:

    1. D. Kumar, J. Gubbi, B. Yan, and M. Palaniswami, “Motor recovery monitoring in post-acute stroke patients using wireless accelerometer and cross correlation,” IEEE International Conference of the EMBS (EMBC), pp. 6703–6707, 2013.
    2. J. Gubbi, D. Kumar, A. Rao, B. Yan, and M. Palaniswami, “A pilot study on the use of accelerometer sensors for monitoring postacute stroke patients,” IEEE International Conference of the EMBS (EMBC), pp. 957–960, 2013.
.05

TEACHING

  • TEACHING HISTORY
  • 2019
    Present

    ASST. PROFESSOR

    IIT ROORKEE

    I teach the following courses at IIT Roorkee:
    1. ECN 316: Digital Image Processing.
    2. ECN 511: Linear Algebra and Random Processes.
  • 2017
    2019

    POST-DOCTORAL RESEARCH ASSISTANT

    PURDUE UNIVERSITY

    During my post-doctoral experience at Purdue University, I taught a 3-lecture series on the topic of “Big data analytics in transportation” during the Spring 2017 semester, which introduced big data analytics and machine learning techniques to students with a limited background in data science. The course involved a small hands-on project on analyzing real-life data from the aviation transportation domain. I also co-taught the course "CE 597: Data Science for Smart Cities" with my supervisor Prof. Satish Ukkusuri during Fall 2018. This course aimed at introducing students to various data science concepts and methodologies, and their implementation in Python, with application to smart city problems.
  • 2011
    2012

    LECTURER

    THE LNM INSTITUTE OF INFORMATION TECHNOLOGY

    While working as a Lecturer at The LNM Institute of Information Technology, Jaipur, I taught a course on Microprocessor and Interface using Intel's 8085 architecture and assembly programming. I set up the ATMEL MCU University Centre at LNMIIT, with facilities including AVR microcontrollers and the necessary interfaces. I also set up the Microprocessor (8085) lab from scratch, and designed the experiments and carried them out successfully.
  • 2009
    2010

    TEACHING ASSISTANT

    INDIAN INSTITUTE OF TECHNOLOGY KANPUR

    I worked as a Teaching Assistant at the Indian Institute of Technology (IIT) Kanpur for two semesters. In the first semester, I conducted the lab for the Digital Electronics and Microprocessor Technology course, which covers digital electronic circuits and programming of the 8085 microprocessor. In the second semester, I conducted the lab for the first-year undergraduate course Introduction to Electronics, which involves basic circuit design.
.06

CONTACT

  • CONTACT DETAILS
  • E-mail :

    dheeraj.kumar@ece.iitr.ac.in

    Alternate E-mail :

    genuine.dheeraj@gmail.com
  • OFFICE ADDRESS
  • Dr. Dheeraj Kumar
    Department of Electronics and Communications Engineering,
    IIT Roorkee,
    Roorkee, Uttarakhand - 247667
    India.