Paolo Garza

Name: Paolo Garza
Date of birth: March 1, 1976
Citizenship: Italian
Email: [email protected]
Web page: http://dbdmg.polito.it/~paolo

Position and Education

RECORD OF EMPLOYMENT
December 2012 – present: Assistant Professor (non-tenure track) at the Department of Control and Computer Engineering of the Politecnico di Torino.
June 2010 – December 2012: Assistant Professor (non-tenure track) at the Department of Electronics and Computer Science of the Politecnico di Milano.
January 2010 – May 2010: Consultant at the Department of Control and Computer Engineering of the Politecnico di Torino working on “Open source data mining algorithms”.
May 2008 – December 2009: Post-doc Fellow at the Department of Control and Computer Engineering of the Politecnico di Torino working on “Data mining algorithms”.
January 2005 – April 2008: Research assistant at the Department of Control and Computer Engineering of the Politecnico di Torino working on “Classification of structured and textual data”.
October 2003 – March 2004: Consultant at the Department of Control and Computer Engineering of the Politecnico di Torino working on “Design and development of the Layout Server component of the SAIB system”.
January 2002 – December 2004: Ph.D. in Information and System Engineering at the Department of Control and Computer Engineering of the Politecnico di Torino working on “Data mining and database systems”.
June 2001 – October 2001: Consultant at the Department of Control and Computer Engineering of the Politecnico di Torino working on “Integration of the X-Class prototype in the Data-X system”.
September 2000 – April 2001: Consultant at Systel S.r.l. (Turin) working on “Design and development of web based applications”.
July 1995 – December 1995: Consultant at Delta Tre Informatica S.r.l. (Turin) working on “Systems for data gathering and elaboration for international sport events”.

EDUCATION
• Ph.D. in Information and System Engineering at Politecnico di Torino. 2005.
Title: Associative classification algorithms
Advisor: Prof. E. Baralis
• M.Sc. in Computer Science Engineering. May 2001. Grade: 107/110. (Thesis title: Clustering Algorithms for Structured Data, Advisor: Prof. E. Baralis)

SCHOLARSHIPS
• Scholarship for “junior researcher” from Regione Piemonte, CRT Foundation, and Istituto Superiore Mario Boella (May 2008 - December 2009).
• Ph.D. scholarship (January 2002 - December 2004).

Research interests

My research focuses on new data mining algorithms and database systems. In recent years, the activity has spanned the following research lines: Classification algorithms, Compact representation and querying of XML documents, and Outlier detection algorithms.

CLASSIFICATION ALGORITHMS
The research activity initially focused on innovative algorithms for performing classification by means of association rules. In particular, novel pruning techniques and classification algorithms based on compact representations of classification rules have been proposed and implemented [JR.3][JR.11][IC.12][IC.10][IC.9][IC.8][IB.4]. The proposed algorithms are significantly more accurate than the previous classification algorithms and can be applied to both structured and textual data. In the last year, I focused my research on classification algorithms based on matrix factorization. I proposed a new matrix factorization classification algorithm that, despite its simplicity, is accurate and efficient [IC.4]. The main results of the proposed research are a set of associative classification algorithms for structured and textual data, a matrix factorization based classifier for structured data, and new compact representations for classification rules.
The results of the research have been published in [JR.3][JR.11][IC.12][IC.10][IC.9][IC.8][IC.4][IB.4].

COMPACT REPRESENTATION OF XML DOCUMENTS AND XML QUERY OPTIMIZATION
The research activity has been devoted to summarized representations of XML documents by means of association rules. A summarized representation may be used for answering queries, either when very large XML documents have to be queried or when the actual XML document is not available. Summarization overcomes the limits of traditional XQuery engines, which cannot currently process significantly large XML documents. The main results of the proposed research are a compact representation form for XML documents and its exploitation in XQuery engines. The results of the research have been published in [JR.4][IB.6][WS.4].

OUTLIER DETECTION ALGORITHMS
The research activity has been devoted to innovative anomaly and outlier detection algorithms. The proposed algorithms, based on association rules and approximate functional dependencies, can be easily adapted to different data models. In particular, the proposed algorithms can be exploited to detect outliers in relational, XML, and temporal data. The main results of the proposed research are new outlier detection algorithms, which have been published in [JR.14][JR.15][IB.5][WS.3].

Professional Activities

NATIONAL AND INTERNATIONAL RESEARCH PROJECTS
Paolo Garza contributed actively to the following research projects:
• SAIB - A Framework for Internet Banking Applications, INDUSTRIAL RESEARCH PROJECT GRANTED BY MIUR
• Data-X: Management, Transformation and Exchange of Data in a Web Environment, MURST COFIN 1999 RESEARCH PROJECT
• OpenKnowTech “Experimentation of Technologies for the Integration, Management and Distribution of Data, Processes and Knowledge”, RESEARCH PROJECT DM.
21301 - DECRETO MUR 2630/RIC DEL 30/11/2006
• Context-aware system for user and service profiling, RESEARCH CONTRACT WITH TELECOM ITALIA LAB
• A system for “soft” real-time transaction management, RESEARCH CONTRACT WITH TELECOM ITALIA LAB

REFEREE SERVICES
Paolo Garza has been a reviewer for the following journals:
• TKDD - ACM Transactions on Knowledge Discovery from Data, ACM
• TDSC - IEEE Transactions on Dependable and Secure Computing, IEEE
• TOIS - ACM Transactions on Information Systems (special issue on XML Retrieval), ACM
• DKE - Data & Knowledge Engineering, Elsevier
• KAIS - Knowledge and Information Systems, Springer
• EAAI - Engineering Applications of Artificial Intelligence, Elsevier
• IT Professional, IEEE
• TLDKS - Transactions on Large Scale Data and Knowledge-Centered Systems, Springer
• Intelligent Data Analysis, Special Issue on “Dynamic Networks and Knowledge Discovery”, IOS Press
• IJCAT - International Journal of Computer Applications in Technology, Inderscience Publishers
• International Journal On Advances in Life Science, IARIA Journals
• IJITDM - International Journal of Information Technology & Decision Making

CONFERENCE AND WORKSHOP ORGANIZATION
Program Committee Membership
Paolo Garza served as a member of the Program Committee of the following conferences:
• IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM’14, ASONAM’13, ASONAM’12, ASONAM’11, ASONAM’10)
• International Conference on Social Computing and its Applications (SCA’13, SCA’12, SCA’11)
• International Conference on Business Intelligence Systems (miproBIS’14, miproBIS’13)
• International Conference on Advances in Information Mining and Management (IMMM’14, IMMM’13)
• International Conference on Social Eco-Informatics (SOTICS’13, SOTICS’12, SOTICS’11)
• IADIS European Conference on Data Mining (ECDM’14, ECDM’13, ECDM’12)
• IADIS International Conference on Internet Technologies & Society (ITS’12)
• European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD’11)
• ADBIS Special session on Big Data: New Trends and Applications (BiDaTA’13)
• International Conference on Data Analytics (DATA ANALYTICS’14, DATA ANALYTICS’13)

Paolo Garza served as a member of the Program Committee of the following workshops:
• International Workshop on Social Network Analysis and Mining using Digital Technology (SNAMDT’14)
• International Workshop on Behavior and Social Informatics (BSI’13), in conjunction with PAKDD’13
• International Workshop on Behavior and Social Informatics and Computing (BSIC’13)
• International Workshop of Social Knowledge Discovery and Utilization (SKDU’12), in conjunction with IEEE/ACM ASONAM’12
• International Workshop on Flexible Database and Information System Technology (FlexDBIST’11, FlexDBIST’10, FlexDBIST’09)

Paolo Garza served as an external reviewer for the following conferences:
• International Conference on Data Engineering - Demo session (ICDE’14)
• IEEE Conference on Data Mining (ICDM’13, ICDM’11, ICDM’10)
• ACM Symposium on Applied Computing (SAC’14, SAC’12, SAC’10, SAC’09, SAC’08, SAC’07, SAC’06, SAC’05, SAC’04)
• Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’12, PAKDD’11)
• International Conference on Data Warehousing and Knowledge Discovery (DaWaK’13, DaWaK’11, DaWaK’09, DaWaK’08, DaWaK’06, DaWaK’05, DaWaK’04, DaWaK’03, DaWaK’02)
• European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD’12, PKDD’05, PKDD’04, PKDD’03, PKDD’02)
• Convegno Nazionale su Sistemi Evoluti per Basi di Dati (SEBD’12, SEBD’09, SEBD’07, SEBD’06, SEBD’04, SEBD’03, SEBD’02)

Complete publication list

PUBLICATION LIST
Refereed international journals: 15
Refereed chapters in international books: 6
Refereed international conferences: 12
Refereed international workshops: 4
Refereed national conferences: 4

REFEREED INTERNATIONAL JOURNALS
ACM Transactions and IEEE Transactions
JR.1. Luca Cagliero, Paolo Garza, “Infrequent Weighted Itemset Mining using Frequent Pattern Growth”, IEEE Transactions on Knowledge and Data Engineering (TKDE), In press (ISSN: 1041-4347) [doi: http://dx.doi.org/10.1109/TKDE.2013.69]
JR.2. Elena Baralis, Luca Cagliero, Paolo Garza, “EnBay: A Novel Pattern-Based Bayesian Classifier”, IEEE Transactions on Knowledge and Data Engineering (TKDE), Vol. 25, no. 12, 2013, pp. 2780-2795 (ISSN: 1041-4347) [doi: http://dx.doi.org/10.1109/TKDE.2012.197]
JR.3. Elena Baralis, Silvia Chiusano, Paolo Garza, “A Lazy Approach to Associative Classification”, IEEE Transactions on Knowledge and Data Engineering (TKDE), Vol. 20, no. 2, 2008, pp. 156-171 (ISSN: 1041-4347) [doi: http://dx.doi.org/10.1109/TKDE.2007.190677]
JR.4. Elena Baralis, Paolo Garza, Elisa Quintarelli, Letizia Tanca, “Answering XML Queries by means of Data Summaries”, ACM Transactions on Information Systems (TOIS), Vol. 25, no. 3, 2007, pp. 1-33 (ISSN: 1046-8188) [doi: http://dx.doi.org/10.1145/1247715.1247716]
Other International journals
JR.5. Luca Cagliero, Tania Cerquitelli, Paolo Garza, Luigi Grimaudo, “Twitter Data Analysis by means of Strong Flipping Generalized Itemsets”, Journal of Systems and Software (JSS), In press (ISSN: 0164-1212)
JR.6. Elena Baralis, Luca Cagliero, Tania Cerquitelli, Vincenzo D’Elia, Paolo Garza, “Expressive Generalized Itemsets”, Information Sciences (INS), In press (ISSN: 0020-0255) [doi: http://dx.doi.org/10.1016/j.ins.2014.03.056]
JR.7. Luca Cagliero, Tania Cerquitelli, Paolo Garza, Luigi Grimaudo, “Misleading Generalized Itemset Discovery”, Expert Systems With Applications (ESWA), Vol. 41, Issue 4, Part 1, 2014, pp.
1400-1410 (ISSN: 0957-4174) [doi: http://dx.doi.org/10.1016/j.eswa.2013.08.039]
JR.8. Luca Cagliero, Paolo Garza, “Itemset generalization with cardinality-based constraints”, Information Sciences (INS), Vol. 244, 2013, pp. 161-174 (ISSN: 0020-0255) [doi: http://dx.doi.org/10.1016/j.ins.2013.05.008]
JR.9. Luca Cagliero, Paolo Garza, “Improving classification models with taxonomy information”, Data & Knowledge Engineering (DKE), Vol. 86, 2013, pp. 85-101 (ISSN: 0169-023X) [doi: http://dx.doi.org/10.1016/j.datak.2013.01.005]
JR.10. Elena Baralis, Luca Cagliero, Tania Cerquitelli, Paolo Garza, “Generalized association rule mining with constraints”, Information Sciences (INS), Vol. 194, 2012, pp. 68-84 (ISSN: 0020-0255) [doi: http://dx.doi.org/10.1016/j.ins.2011.05.016]
JR.11. Elena Baralis, Paolo Garza, “I-prune: Item Selection for Associative Classification”, International Journal of Intelligent Systems (IJIS), Vol. 27, no. 3, 2012, pp. 279-299 (ISSN: 0884-8173) [doi: http://dx.doi.org/10.1002/int.21524]
JR.12. Luca Cagliero, Tania Cerquitelli, Paolo Garza, “Semi-automatic ontology construction by exploiting functional dependencies and association rules”, International Journal on Semantic Web and Information Systems (IJSWIS), Vol. 7, no. 2, 2011, pp. 1-22 (ISSN: 1552-6283) [doi: http://dx.doi.org/10.4018/jswis.2011040101]
JR.13. Elena Baralis, Luca Cagliero, Tania Cerquitelli, Paolo Garza, Marco Marchetti, “CAS-MINE: Providing personalized services in context-aware applications by means of generalized rules”, Knowledge and Information Systems journal (KAIS), Vol. 28, no. 2, 2011, pp. 283-310 (ISSN: 0219-3116) [doi: http://dx.doi.org/10.1007/s10115-010-0359-z]
JR.14. Giulia Bruno, Paolo Garza, “TOD: Temporal Outlier Detection by using quasi-functional temporal dependencies”, Data & Knowledge Engineering (DKE), Vol. 69, 2010, pp. 619-639 (ISSN: 0169-023X) [doi: http://dx.doi.org/10.1016/j.datak.2010.02.003]
JR.15.
Giulia Bruno, Paolo Garza, Elisa Quintarelli, Rosalba Rossato, “Anomaly Detection Through Quasi-Functional Dependency Analysis”, Journal of Digital Information Management (JDIM), Special Issue on Advances in Querying Non-Conventional Data Sources, Vol. 5, no. 4, 2007, pp. 191-200 (ISSN: 0972-7272)

REFEREED CHAPTERS IN INTERNATIONAL BOOKS
IB.1. Giulia Bruno, Paolo Garza, “Temporal Pattern Mining for Medical Applications”, in Data Mining: Foundations and Intelligent Paradigms, pp. 9-18, 2012 (ISBN: 978-3-642-23150-6) [doi: http://dx.doi.org/10.1007/978-3-642-23151-3_2]
IB.2. Luca Cagliero, Tania Cerquitelli, Paolo Garza, “Discovering higher level correlations from XML data”, in XML Data Mining: Models, Methods, and Applications, pp. 288-315, 2012 (ISBN: 978-1-6135-0356-0) [doi: http://dx.doi.org/10.4018/978-1-61350-356-0.ch013]
IB.3. Devis Bianchini, Paolo Garza, Elisa Quintarelli, “Semantic-enriched data mining techniques for intensional service representation”, in Management of the Interconnected World, pp. 167-174, 2010 (ISBN: 978-3-7908-2403-2) [doi: http://dx.doi.org/10.1007/978-3-7908-2404-9_20]
IB.4. Silvia Chiusano, Paolo Garza, “Selection of High Quality Rules in Associative Classification”, in Post-Mining of Association Rules: Techniques for Effective Knowledge Extraction, pp. 173-198, 2009 (ISBN: 978-1-60566-404-0) [doi: http://dx.doi.org/10.4018/978-1-60566-404-0.ch010]
IB.5. Giulia Bruno, Paolo Garza, Elisa Quintarelli, “Mining rare association rules by discovering quasi-functional dependencies: an incremental approach”, in Rare Association Rule Mining and Knowledge Discovery: Technologies for Infrequent and Critical Event Detection, pp. 131-149, 2009 (ISBN: 978-1-60566-754-6) [doi: http://dx.doi.org/10.4018/978-1-60566-754-6.ch009]
IB.6. Elena Baralis, Paolo Garza, Elisa Quintarelli, Letizia Tanca, “Using mined patterns for XML Query Answering”, in Successes and New Directions in Data Mining, pp.
39-66, 2007 (ISBN: 978-1-59904-645-7) [doi: http://dx.doi.org/10.4018/978-1-59904-645-7.ch003]

REFEREED INTERNATIONAL CONFERENCES
IC.1. Paolo Garza, Paolo Margara, Nicoló Nepote, Luigi Grimaudo, and Elio Piccolo, “Hadoop on a Low-Budget General Purpose HPC Cluster in Academia”, ADBIS Special session on Big Data: New Trends and Applications (BiDaTA’13), 2014, pp. 187-192. [doi: http://dx.doi.org/10.1007/978-3-319-01863-8_21]
IC.2. Elena Baralis, Luca Cagliero, Tania Cerquitelli, Silvia Chiusano, and Paolo Garza, “Frequent Weighted Itemset Mining from Gene Expression Data”, Proc. IEEE Int. Conf. on BioInformatics and BioEngineering (BIBE’13), In press.
IC.3. Cristiana Bolchini, Paolo Garza, Elisa Quintarelli, and Fabio Salice, “A Data Mining Approach to Incremental Adaptive Functional Diagnosis”, Proc. IEEE Int. Symp. on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT’13), 2013, pp. 13-18. [doi: http://dx.doi.org/10.1109/DFT.2013.6653576]
IC.4. Paolo Garza, “Structured Data Classification by means of Matrix Factorization”, Proc. ACM Int. Conf. on Information and Knowledge Management (CIKM’11), 2011, pp. 2165-2168. [doi: http://doi.acm.org/10.1145/2063576.2063917]
IC.5. Elena Baralis, Luca Cagliero, Tania Cerquitelli, Vincenzo D’Elia, Paolo Garza, “Support driven opportunistic aggregation for generalized itemset extraction”, Proc. IEEE Int. Conf. on Intelligent Systems (IS’10), 2010, pp. 102-107. [doi: http://dx.doi.org/10.1109/IS.2010.5548348]
IC.6. Elena Baralis, Luca Cagliero, Tania Cerquitelli, Paolo Garza, Marco Marchetti, “Context-Aware User and Service Profiling by means of Generalized Association Rules”, Proc. Int. Conf. on Knowledge-based and Intelligent Information & Engineering Systems (KES’09), 2009, pp. 50-57. [doi: http://dx.doi.org/10.1007/978-3-642-04592-9_7]
IC.7. Elena Baralis, Marco Cabutto, Tania Cerquitelli, Antonio Garofalo, Paolo Garza, “Soft real-time view management”, Proc. IEEE Int. Conf.
on Intelligent Systems (IS’08), 2008, pp. 23-28. [doi: http://dx.doi.org/10.1109/IS.2008.4670549]
IC.8. Elena Baralis, Paolo Garza, “Associative Text Classification exploiting Negated Words”, Proc. ACM Symposium on Applied Computing (SAC’06), 2006, pp. 530-535. [doi: http://doi.acm.org/10.1145/1141277.1141402]
IC.9. Elena Baralis, Silvia Chiusano, Paolo Garza, “On Support Thresholds in Associative Classification”, Proc. ACM Symposium on Applied Computing (SAC’04), 2004, pp. 553-558. [doi: http://doi.acm.org/10.1145/967900.968016]
IC.10. Elena Baralis, Paolo Garza, “Majority Classification by Means of Association Rules”, Proc. European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD’03), 2003, pp. 35-46. [doi: http://dx.doi.org/10.1007/978-3-540-39804-2_6]
IC.11. Carla F. Chiasserini, Marco Ajmone Marsan, Elena Baralis, Paolo Garza, “Towards Feasible Topology Formation Algorithms for Bluetooth-based WPANs”, Proc. IEEE Int. Conf. on System Sciences (HICSS’03), 2003, pp. 313-322. [doi: http://dx.doi.org/10.1109/HICSS.2003.1174873]
IC.12. Elena Baralis, Paolo Garza, “A Lazy Approach to Pruning Classification Rules”, Proc. IEEE Int. Conf. on Data Mining (ICDM’02), 2002, pp. 35-42. [doi: http://dx.doi.org/10.1109/ICDM.2002.1183883]

REFEREED INTERNATIONAL WORKSHOPS
WS.1. Cristiana Bolchini, Paolo Garza, Elisa Quintarelli, “Board-level functional fault diagnosis using data mining”, Second Workshop on Manufacturable and Dependable Multicore Architectures at Nanoscale (MEDIAN’13), 2013.
WS.2. Paolo Cremonesi, Paolo Garza, Elisa Quintarelli, Roberto Turrin, “Top-N recommendations on Unpopular Items with Contextual Knowledge”, Workshop on Context-aware Recommender Systems (CARS’11), in conjunction with RecSys’11, 2011, pp. 1-5.
WS.3.
Giulia Bruno, Paolo Garza, Elisa Quintarelli, Rosalba Rossato, “Anomaly Detection in XML databases by means of Association Rules”, Workshop on Flexible Database and Information Systems Technology (FlexDBIST’07), 2007, pp. 387-391. [doi: http://dx.doi.org/10.1109/DEXA.2007.38]
WS.4. Elena Baralis, Paolo Garza, Elisa Quintarelli, Letizia Tanca, “Summarizing XML Data by Means of Association Rules”, Current Trends in Database Technology - EDBT 2004 Workshops (DataX’04), 2004, pp. 260-269. [doi: http://dx.doi.org/10.1007/978-3-540-30192-9_25]

REFEREED NATIONAL CONFERENCES
NC.1. Elena Baralis, Luca Cagliero, Tania Cerquitelli, Paolo Garza, Marco Marchetti, “Generalized Association Rules to Support Context-aware User and Service Profiling”, Proc. Italian Symposium on Advanced Database Systems (SEBD’09), 2009, pp. 101-108.
NC.2. Giulia Bruno, Paolo Garza, Elisa Quintarelli, Rosalba Rossato, “RADAR: Research of Anomalous Data through Association Rules”, Proc. Italian Symposium on Advanced Database Systems (SEBD’07), 2007, pp. 290-297.
NC.3. Elena Baralis, Paolo Garza, Elisa Quintarelli, Letizia Tanca, “Answering Queries on XML Data by means of Association Rules”, Proc. Italian Symposium on Advanced Database Systems (SEBD’04), 2004, pp. 262-269.
NC.4. Elena Baralis, Paolo Garza, “Accurate Classification by means of Association Rules”, Proc. Italian Symposium on Advanced Database Systems (SEBD’02), 2002, pp. 273-280.