Department of Informatics
University of Zurich
Office: BIN 2.D.09
Phone: [+41] 44 635 67 26
Email: serban [at] ifi.uzh.ch
I am currently working on the e-Lico project.
Fall Semester 2010
Fall Semester 2009 - Distributed Systems
An Overview of Intelligent Data Assistants for Data Analysis, Proc. of the 3rd Planning to Learn Workshop (WS9) at ECAI'10, August 2010. (inproceedings)
Today's intelligent data assistants (IDAs) for data analysis focus on how to do effective and intelligent data analysis. However, this is not a trivial task, since one must take into consideration all the influencing factors: on the one hand data analysis in general, and on the other hand the communication and interaction with data analysts. The basic approach of building an IDA in which data analysis is (1) better as well as (2) faster at the same time is not a very rewarding criterion and does not help in designing good IDAs. Therefore, this paper tries to (a) discover constructive criteria that allow us to compare existing systems and help design better IDAs and (b) review all previous IDAs based on these criteria to find out which problems IDAs should solve as well as which method works best for which problem. In conclusion, we try to learn from previous experience which features should be incorporated into a new IDA that would solve the problems of today's analysts.
Auto-experimentation of KDD Workflows based on Ontological Planning, The 9th International Semantic Web Conference (ISWC 2010), Doctoral Consortium, November 2010. (inproceedings)
One of the problems of Knowledge Discovery in Databases (KDD) is the lack of user support for solving KDD problems. Current Data Mining (DM) systems enable the user to manually design workflows, but this becomes difficult when there are too many operators to choose from or the workflow is too large. Therefore, we propose to use auto-experimentation based on ontological planning to provide users with automatically generated workflows as well as rankings of workflows based on several criteria (execution time, accuracy, etc.). Moreover, auto-experimentation will help to validate the generated workflows and to prune and reduce their number. Furthermore, we will use mixed-initiative planning to allow users to set parameters and criteria that limit the planning search space and guide the planner towards better workflows.
Data Mining Workflow Templates for Intelligent Discovery Assistance in RapidMiner, Proc of RCOMM'10 2010. (inproceedings)
Knowledge Discovery in Databases (KDD) has evolved during the last years and reached a mature stage offering plenty of operators to solve complex tasks. User support for building workflows, in contrast, has not increased proportionally. The large number of operators available in current KDD systems makes it difficult for users to successfully analyze data. Moreover, workflows easily contain a large number of operators, and parts of the workflows are applied several times, so it is hard for users to build them manually. In addition, workflows are not checked for correctness before execution. Hence, it frequently happens that the execution of the workflow stops with an error after several hours of runtime. In this paper we address these issues by introducing a knowledge-based representation of KDD workflows as a basis for cooperative-interactive planning. Moreover, we discuss workflow templates that can mix executable operators and tasks to be refined later into sub-workflows. This new representation helps users to structure and handle workflows, as it constrains the number of operators that need to be considered. We show that workflows can be grouped in templates, enabling re-use and simplifying KDD workflow construction in RapidMiner.
Data Mining Workflow Templates for Intelligent Discovery Assistance and Auto-Experimentation, Proc of the ECML/PKDD'10 Workshop on Third Generation Data Mining: Towards Service-oriented Knowledge Discovery (SoKD'10), September 2010. (inproceedings)
Knowledge Discovery in Databases (KDD) has grown a lot during the last years, but providing user support for constructing workflows is still problematic. The large number of operators available in current KDD systems makes it difficult for a user to successfully solve her task. Also, workflows can easily reach a huge number of operators (hundreds), and parts of the workflows are applied several times. Therefore, it becomes hard for the user to construct them manually. In addition, workflows are not checked for correctness before execution. Hence, it frequently happens that the execution of the workflow stops with an error after several hours of runtime. In this paper we present a solution to these problems. We introduce a knowledge-based representation of Data Mining (DM) workflows as a basis for cooperative-interactive planning. Moreover, we discuss workflow templates, i.e. abstract workflows that can mix executable operators and tasks to be refined later into sub-workflows. This new representation helps users to structure and handle workflows, as it constrains the number of operators that need to be considered. Finally, workflows can be grouped in templates, which fosters re-use and further simplifies DM workflow construction.
eProPlan: A Tool to Model Automatic Generation of Data Mining Workflows, Proc. of the 3rd Planning to Learn Workshop (WS9) at ECAI'10, August 2010. (inproceedings)
This paper introduces the first ontological modeling environment for planning Knowledge Discovery (KDD) workflows. We use ontological reasoning combined with AI planning techniques to automatically generate workflows for solving Data Mining (DM) problems. KDD researchers can easily model not only their DM and preprocessing operators but also their DM tasks, which are used to guide the workflow generation.
Towards Cooperative Planning of Data Mining Workflows, Proc of the ECML/PKDD09 Workshop on Third Generation Data Mining: Towards Service-oriented Knowledge Discovery (SoKD-09) 2009. (inproceedings)
A major challenge for third generation data mining and knowledge discovery systems is the integration of different data mining tools and services for data understanding, data integration, data preprocessing, data mining, evaluation and deployment, which are distributed across the network of computer systems. In this paper we outline how to build an intelligent assistant that supports end-users in the difficult and time-consuming task of designing KDD-Workflows out of these distributed services. The assistant should support the user in checking the correctness of workflows, understanding the goals behind given workflows, enumerating AI planner generated workflow completions, and storing, retrieving, adapting and repairing previous workflows. It should also be an open, easily extensible system. This is achieved by basing the system on a data mining ontology (DMO) in which all the services (operators) are described together with their inputs/outputs and pre-/postconditions. This description is compatible with OWL-S, and new operators can be added by importing their OWL-S specification and classifying it into the operator ontology.