Knowledge Discovery for Activity Monitoring

We study how knowledge discovery, data mining, and machine learning technologies can be used to find patterns in data for the purpose of monitoring activity. While this project focuses on intelligence applications (e.g., business intelligence, competitive intelligence, government intelligence), the technologies apply much more broadly: to fraud detection, stock monitoring, customer relationship management, and so on. In particular, activity monitoring often involves complicated and incompletely understood relationships between entities, and the entities and relationships themselves can be described at varying degrees of specificity. In this work, we concentrate on the problem of learning patterns from related entities in a time-varying environment, in order to monitor activity and alert users to important events.

Typical flat-file, accuracy-based pattern learning is ill suited to finding important patterns in these domains. Pattern learning must be able to capitalize on the relationships between entities, on the attributes of related entities, and on changes over time, both in the data streams and in the web of related entities. Furthermore, these problems share other characteristics that make them difficult for traditional pattern learning. The volume of data is huge, but the number of interesting training examples (positive examples) may be small, and traditional algorithms have trouble in such situations. Being able to analyze explicitly the tradeoff between false alarms and misses is crucial. Unlike many seemingly similar applications such as document classification, but like fraud detection, it may be important to have a very low miss rate, even if that means analysts have to deal with large numbers of false alarms. In applications such as these, producing good rankings of cases can be more effective than straight classification.
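The tradeoff between misses and false alarms can be made concrete with a small sketch. The Python below is only an illustration, not the project's method: it assumes each case already has a suspicion score from some model, ranks the cases by that score, and reports the miss rate and false-alarm rate for every possible alert cutoff. The function names (tradeoff_curve, smallest_cutoff) and the toy scores and labels are hypothetical, and tied scores are not treated specially.

```python
from typing import List, Tuple

Point = Tuple[int, float, float]  # (cutoff k, miss rate, false-alarm rate)


def tradeoff_curve(scores: List[float], labels: List[int]) -> List[Point]:
    """Alert on the k highest-scored cases, for every k from 0 to n.

    labels use 1 for an interesting (positive) case and 0 otherwise.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")

    # Rank cases from most to least suspicious.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    curve: List[Point] = [(0, 1.0, 0.0)]  # alerting on nothing misses everything
    hits = 0
    false_alarms = 0
    for k, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
        else:
            false_alarms += 1
        curve.append((k, 1 - hits / n_pos, false_alarms / n_neg))
    return curve


def smallest_cutoff(curve: List[Point], max_miss_rate: float) -> Point:
    """Return the first (smallest k) point whose miss rate is acceptable."""
    for point in curve:
        if point[1] <= max_miss_rate:
            return point
    raise ValueError("no cutoff satisfies the requested miss rate")


# Toy data (hypothetical): 3 positives among 8 cases.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0, 1, 0, 0]
curve = tradeoff_curve(scores, labels)
k, miss_rate, false_alarm_rate = smallest_cutoff(curve, max_miss_rate=0.0)
print(k, miss_rate, false_alarm_rate)
```

On this toy data, demanding a zero miss rate forces analysts to review the top six cases, three of which are false alarms. A ranking lets the analyst pick that operating point after the fact, which a single fixed classification threshold does not.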

Finally, due especially to the small number of positive training examples, it is essential to involve human experts in the process. Experts can inject background knowledge in several ways, and each way requires pattern-learning algorithms to accept background knowledge to a different degree. For example, active learning techniques allow experts to label particularly useful data points without having to understand the internal workings of the learned model. On the other hand, comprehensible models make it easier to involve domain experts in interactive learning.
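As a rough illustration of how active learning can choose which data points to show an expert, the sketch below uses plain uncertainty sampling: it asks for labels on the unlabeled cases whose estimated class probability is closest to 0.5. This is a generic textbook strategy chosen for brevity, not necessarily the technique used in the publications listed below. The names select_queries and toy_model, and the one-dimensional pool, are hypothetical.

```python
import math
from typing import Callable, List, Sequence


def select_queries(
    pool: Sequence[float],
    predict_proba: Callable[[float], float],
    batch_size: int,
) -> List[int]:
    """Return indices of the pool items the current model is least sure about.

    predict_proba maps one unlabeled case to an estimated probability
    that it is a positive example.
    """
    # Uncertainty is measured as closeness of the estimate to 0.5.
    by_uncertainty = sorted(
        range(len(pool)),
        key=lambda i: abs(predict_proba(pool[i]) - 0.5),
    )
    return by_uncertainty[:batch_size]


def toy_model(x: float) -> float:
    """Stand-in for a learned model: a logistic function of one feature."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical unlabeled pool of one-dimensional cases.
pool = [-3.0, -0.2, 0.1, 2.5, 0.9]
to_label = select_queries(pool, toy_model, batch_size=2)
print(to_label)  # indices of the cases to hand to the expert
```

The expert only sees the selected cases and supplies labels; nothing about the model's internals needs to be understood, which is the property the paragraph above describes.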

Current Project Participants

Prof. Foster Provost, New York University, Stern School of Business
Prof. Abraham Bernstein, University of Zurich
Shawndra Hill, New York University, Stern School of Business
Claudia Perlich, New York University, Stern School of Business

Relevant Publications

Activity Monitoring
Probability estimation and ranking for classification tasks
The knowledge discovery process
Learning with relational knowledge

Activity Monitoring

Macskassy, S., H. Hirsh, F. Provost, R. Sankaranarayanan, and V. Dhar, "Intelligent Information Triage." In Proceedings of SIGIR-2001.
Fawcett, T. and F. Provost, "Activity Monitoring: Noticing Interesting Changes in Behavior." In Proceedings of the Fifth International Conference on Knowledge Discovery and Data Mining (KDD-99).
Fawcett, T. and F. Provost, "Adaptive Fraud Detection." Data Mining and Knowledge Discovery 1 (1997).
Fawcett, T. and F. Provost, "Combining Data Mining and Machine Learning for Effective User Profiling." In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96).

Probability estimation and ranking for classification tasks

Saar-Tsechansky, M. and F. Provost, "Active Learning for Class Probability Estimation and Ranking." In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI-01).
Provost, F. and P. Domingos, "Well-trained PETs: Improving Probability Estimation Trees." CeDER Working Paper #IS-00-04, Stern School of Business, New York University, NY, NY 10012. (PDF)
Weiss, G. and F. Provost, "The Effect of Class Distribution on Classifier Learning: An Empirical Study." Technical Report ML-TR-44, Department of Computer Science, Rutgers University, January 2001. (PDF) www.cs.rutgers.edu/~gweiss/papers/ml-tr-44.pdf
Provost, F. and T. Fawcett, "Robust Classification for Imprecise Environments." Machine Learning 42, 203-231, 2001. (PDF)
Provost, F. and T. Fawcett, "Robust Classification Systems for Imprecise Environments." In Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI-98).
Provost, F., T. Fawcett, and R. Kohavi, "The Case Against Accuracy Estimation for Comparing Classifiers." In Proceedings of the Fifteenth International Conference on Machine Learning (ICML-98).
Provost, F. and T. Fawcett, "Analysis and Visualization of Classifier Performance: Comparison under Imprecise Class and Cost Distributions." In Proceedings of the Third International Conference on Knowledge Discovery and Data Mining (KDD-97).

The knowledge discovery process

Bernstein, A. and F. Provost, "An Intelligent Assistant for the Knowledge Discovery Process." CeDER Working Paper #IS-01-01, Stern School of Business, New York University, January 2001.

Learning with relational knowledge

Aronis, J. and F. Provost, "Increasing the Efficiency of Inductive Learning with Breadth-first Marker Propagation." In Proceedings of the Third International Conference on Knowledge Discovery and Data Mining (KDD-97).
Aronis, J., V. Kolluri, F. Provost, and B. Buchanan, "The WoRLD: Knowledge Discovery from Multiple Distributed Databases." In Proc. of the Florida Artificial Intelligence Research Symposium (FLAIRS-97).
Aronis, J., F. Provost, and B. Buchanan, "Exploiting Background Knowledge in Automated Discovery." In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96).
Aronis, J. and F. Provost, "Efficiently Constructing Relational Features from Background Knowledge for Inductive Machine Learning." In Proceedings of the AAAI-94 Workshop on Knowledge Discovery in Databases (KDD-94).