Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 Course on Data Mining (581550-4) Intro/Ass. Rules 7.11. 24./26.10. Clustering 14.11. Episodes KDD Process Home Exam 30.10. Text Mining 21.11. 28.11. Appl./Summary Course on Data Mining Page 1/54 1 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 Course on Data Mining (581550-4) Today 31.10.2001 • Today's subject: – Episodes and episode rules • Next week's program: – Lecture: – Exercise: – Seminar: Text mining Episodes and episode rules Episodes and episode rules Course on Data Mining Page 2/54 2 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 Episodes and Episode Rules Basics WINEPI Approach MINEPI Approach Algorithms Examples Course on Data Mining Page 3/54 3 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 Basics • Association rules describe how things occur together in the data – E.g., "IF an alarm has certain properties, THEN it will have other given properties" • Episode rules describe temporal relationships between things – E.g., "IF a certain combination of alarms occurs within a time period, THEN another combination of alarms will occur within a time period" Course on Data Mining Page 4/54 4 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 Basics Network Management System MSC MSC MSC BSC BSC BSC Switched Network Access Network Alarms BTS BTS BTS MSC Mobile station controller BSC Base station controller BTS Base station transceiver Course on Data Mining Page 5/54 5 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 Basics • As defined earlier, telecom data contains alarms: 1234 EL1 PCM 940926082623 A1 ALARMTEXT.. Alarm type Date, time Alarming network element Alarm number Alarm severity class • Now we forget about relationships between attributes within alarms as with the association rules • We just take the alarm number attribute, handle it here as event/alarm type and inspect the relationships between events/alarms Course on Data Mining Page 6/54 6 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 Basics • Data: – Data is a set R of events – Every event is a pair (A, t), where • A R is the event type (e.g., alarm type) • t is an integer, the occurrence time of the event – Event sequence s on R is a triple (s, Ts, Te) • Ts is starting time and Te is ending time • Ts < Te are integers • s = (A1, t1), (A2, t2), …, (An, tn) • Ai R and Ts ti < Te for all i=1, …, n Course on Data Mining Page 7/54 7 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 Basics • Example alarm data sequence: D C A 0 10 B 20 30 40 D A B C A D C A B D A 50 60 70 80 90 100 110 120 130 140 150 • Here: – A, B, C and D are event (or here alarm) types – 10…150 are occurrence times – s = (D, 10), (C, 20), …, (A, 150) – Ts (starting time) = 10 and Te (ending time) = 150 • Note: There needs not to be events on every time slot! Course on Data Mining Page 8/54 8 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 Basics • Episodes: – An episode is a pair (V, ) • V is a collection of event types, e.g., alarm types • is a partial order on V – Given a sequence S of alarms, an episode = (V, ) occurs within S if there is a way of satisfying the event types (e.g., alarm types) in V using the alarms of S so that the partial order is respected – Intuitively: episodes consist of alarms that have certain properties and occur in a certain partial order Course on Data Mining Page 9/54 9 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 Basics • The most useful partial orders are: – Total orders • The predicates of each episode have a fixed order • Such episodes are called serial (or "ordered") – Trivial partial orders • The order of predicates is not considered • Such episodes are called parallel (or "unordered") • Complicated? – Not really, let's take some clarifying examples Course on Data Mining Page 10 10/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 Basics • Examples: A B A A C B Serial episode Parallel episode B More complex episode with serial and parallel Course on Data Mining Page 11 11/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 WINEPI Approach • The name of the WINEPI method comes from the technique it uses: a sliding window • Intuitively: – A window is slided through the event-based data sequence – Each window "snapshot" is like a row in a database – The collection of these "snapshots" forms the rows in the database • Complicated? – Not really, let's take a clarifying example Course on Data Mining Page 12 12/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 WINEPI Approach • Example alarm data sequence: D C A 0 10 B 20 30 40 D A B C 50 60 70 80 90 • The window width is 40 seconds, last point excluded • The first/last window contains only the first/last event Course on Data Mining Page 13 13/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 WINEPI Approach • Formally, given a set E of event types an event sequence S = (s,Ts,Te) is an ordered sequence of events eventi such that eventi eventi+1 for all i=1, …, n-1, and Ts eventi < Te for all i=1, …, n event1 event2 event3 … … eventn Ts t1 Te t2 t3 … … tn Course on Data Mining Page 14 14/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 WINEPI Approach • Formally, a window on event sequence S is an event sequence S=(w,ts,te), where ts < Te, te > Ts, and w consists of those pairs (event, t) from s where ts t < te • The value ts t < te is called window width, W event1 event2 event3 … … eventn Ts t1 Te t2 t3 ts W te tn Course on Data Mining Page 15 15/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 WINEPI Approach • By definition, the first and the last windows on a sequence extend outside the sequence, so that the last window contains only the first time point of the sequence, and the last window only the last time point event1 event2 event3 … … eventn Ts ts W tt1e Te t2 t3 tn ts W te Course on Data Mining Page 16 16/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 WINEPI Approach • The frequency (cf. support with association rules) of an episode is the fraction of windows in which the episode occurs, i.e., fr(, S, W) = |Sw W(S, W) | occurs in Sw | |W(S, W)| where W(S, W) is the set of all windows Sw of sequence S such that the window width is W Course on Data Mining Page 17 17/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 WINEPI Approach • When searching for the episodes, a frequency threshold (cf. support threshold with association rules) min_fr is used • Episode is frequent if fr(, s, win) min_fr, i.e, "if the frequency of exceeds the minimum frequency threshold within the data sequence s and with window width win" • F(s, win, min_fr): a collection of frequent episodes in s with respect to win and min_fr • Apriori trick holds: if an episode is frequent in an event sequence s, then all subepisodes are frequent Course on Data Mining Page 18 18/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 WINEPI Approach • Formally, an episode rule is as expression , where and are episodes such that is a subepisode of • An episode is a subepisode of ( ), if the graph representation is a subgraph of the representation of A : A : B C B Course on Data Mining Page 19 19/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 WINEPI Approach • The fraction fr(, S, W) fr(, S, W) = frequency of the whole episode = frequency of the LHS episode is the confidence of the WINEPI episode rule • The confidence can be interpreted as the conditional probability of the whole of occurring in a window, given that occurs in it Course on Data Mining Page 20 20/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 WINEPI Approach • Intuitively: – WINEPI rules are like association rules, but with an additional time aspect: If events (alarms) satisfying the rule antecedent (lefthand side) occur in the right order within W time units, then also the rule consequent (right-hand side) occurs in the location described by , also within W time units antecedent consequent [window width] (f, c) Course on Data Mining Page 21 21/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 WINEPI Algorithm • Input: A set R of event/alarmtypes, an event sequence s over R, a set E of episodes, a window width win, and a frequency threshold min_fr • Output: The collection F(s, win, min_fr) • Method: 1. compute C1 := { E | || = 1}; 2. i := 1; 3. while Ci do 4.(* compute F(s, win, min_fr) := { Ci | fr(, s, win) min_fr}; 5. i := l+1; 6.(** compute Ci:= { E | || = I, and F||(s, win, min_fr) for all E, }; (* = database pass, (** candidate generation Course on Data Mining Page 22 22/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 WINEPI Algorithm • First problem: given a sequence and a episode, find out whether the episode occurs in the sequence • Finding the number of windows containing an occurrence of the episode can be reduced to this • Successive windows have a lot in common • How to use this? – An incremental algorithm – Same idea as for association rules – A candidate episode has to be a combination of two episodes of smaller size – Parallel episodes, serial episodes Course on Data Mining Page 23 23/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 WINEPI Algorithm • Parallel episodes: – For each candidate maintain a counter .event_count: how many events of are present in the window – When .event_count becomes equal to ||, indicating that is entirely included in the window, save the starting time of the window in .inwindow – When .event_count decreases again, increase the field .freq_count by the number of windows where remainded entirely in the window • Serial episodes: use a state automata Course on Data Mining Page 24 24/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 WINEPI Approach • Example alarm data sequence: D C A 0 10 B 20 30 40 D A B C 50 60 70 80 90 • The window width is 40 secs, movement step 10 secs • The length of the sequence is 70 secs (10-80) Course on Data Mining Page 25 25/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 WINEPI Approach • By sliding the window, we'll get 11 windows (U1-U11): … U1 U2 U11 D C A 0 10 B 20 30 40 D A B C 50 60 70 80 90 • Frequency threshold is set to 40%, i.e., an episode has to occur at least in 5 of the 11 windows Course on Data Mining Page 26 26/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 WINEPI Approach Course on Data Mining Page 27 27/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 WINEPI Approach • Suppose that the task is to find all parallel episodes: – First, create singletons, i.e., parallel episodes of size 1 (A, B, C, D) – Then, recognize the frequent singletons (here all are) – From those frequent episodes, build candidate episodes of size 2: AB, AC, AD, BC, BD, CD – Then, recongize the frequent parallel episodes (here all are) – From those frequent episodes, build candidate episodes of size 3: ABC, ABD, ACD, BCD – When recognizing the frequent episodes, only ABD occurs in more than four windows – There are no candidate episodes of size four Course on Data Mining Page 28 28/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 WINEPI Approach • Episode frequencies and example rules with WINEPI: D C A B DC DA DB CA CB AB DAB : 73% : 73% : 64% : 64% : 45% : 55% : 45% : 45% : 45% : 45% : 45% D A [40] (55%, 75%) D A B [40] (45%, 82%) Course on Data Mining Page 29 29/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 WINEPI: Experimental Results • Data: – Alarms from a telecommunication network – 73 000 events (7 weeks), 287 event types – Parallel and serial episodes – Window widths (W) 10-120 seconds – Window movement = W/10 – min_fr = 0.003 (0.3%), frequent: about 100 occurrences – 90 MHz Pentium, 32MB memory, Linux operating system. The data resided in a 3.0 MB flat text file Course on Data Mining Page 30 30/54 Mika Klemettinen and Pirjo Moen Nokia Research Center TypeYourNameHere University of Helsinki/Dept of CS DOCUMENTTYPE Autumn 2001 1 (1 TypeDateHere WINEPI: Experimental Results Window width (s) 10 20 40 60 80 100 120 Serial episodes Parallel episodes #frequent time (s) #frequent time (s) 16 31 10 8 31 63 17 9 57 117 33 14 87 186 56 15 145 271 95 21 245 372 139 21 359 478 189 22 Course on Data Mining Page 31 31/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 WINEPI Approach • One shortcoming in WINEPI approach: – Consider that two alarms of type A and one alarm of type B occur in a window – Does the parallel episode consisting of A and B appear once or twice? – If once, then with which alarm of type A? D C A 0 10 B 20 30 40 D A B C 50 60 70 80 90 Course on Data Mining Page 32 32/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 MINEPI Approach • Alternative approach to discovery of episodes – No sliding windows – For each potentially interesting episode, find out the exact occurrences of the episode • Advantages: easy to modify time limits, several time limits for one rule: "If A and B occur within 15 seconds, then C follows within 30 seconds" • Disadvantages: uses a lots of space Course on Data Mining Page 33 33/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 MINEPI Approach • Formally, given a episode and an event sequence S, the interval [ts,te] is a minimal occurrence of S, – If occurs in the window corresponding to the interval – If does not occur in any proper subinterval • The set of minimal occurrences of an episode in a given event sequence is denoted by mo(): mo() = { [ts,te] | [ts,te] is a minimal occurrence of } Course on Data Mining Page 34 34/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 MINEPI Approach • Example: Parallel episode consisting of event types A and B has three minimal occurrences in s: {[30,40], [40,60], [60,70]}, has one occurrence in s: {[60,80]} A : : B D C A 0 10 A C B B 20 30 40 D A B C 50 60 70 80 90 Course on Data Mining Page 35 35/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 MINEPI Approach • Informally, a MINEPI episode rule gives the conditional probability that a certain combination of events (alarms) occurs within some time bound, given that another combination of events (alarms) has occurred within a time bound • Formally, an episode rule is [win1] [win2] • and are episodes such that ( is a subepisode of ) • If episode has a minimal occurrence at interval [ts,te] with te - ts win1, then episode occurs at interval [ts,t'e] for some t'e such that t'e - ts win2 Course on Data Mining Page 36 36/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 MINEPI Approach • The confidence of the rule [win1] [win2] is the conditional probability that occurs, given that occurs, under the time constraints specified by the rule: |mo()| / |mo()| where |mo()| is the number of minimal occurrences [ts,te] of such that te - ts win1, and |mo()| is the number of such occurrences where there is also an occurrence of within the interval [ts,ts+win2] Course on Data Mining Page 37 37/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 MINEPI Approach • The frequency of the rule [win1] [win2] is |mo()|, i.e., the number of times the rule holds in the database • Let's take our example data again: – Task: find all serial episodes by using maximum time bound of 40 secs and window sizes 10, 20, 30 and 40 secs. Frequency threshold is set to one occurrence D C A 0 10 B 20 30 40 D A B C 50 60 70 80 90 Course on Data Mining Page 38 38/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 MINEPI Approach • Find all serial episodes (1/3): – First, create singletons, i.e., episodes of size 1 (A, B, C, D) – While creating the singletons, we also create an occurrence table for them. After this first database pass, we do not have to scan the database anymore, but use the created inverse tables – Then, recognize the frequent singletons (here all are) – From those frequent episodes, build candidate episodes of size 2: AB, BA, AC, CA, AD, DA, BC, CB, BD, DB, CD, DC Course on Data Mining Page 39 39/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 MINEPI Approach • Find all serial episodes (2/3): – Then, use the inverse table to create minimal occurrences for the candidates. E.g., for AB take all subepisodes, namely A and B, and compute mo(AB) as follows: • Read the first occurrence of A (30-30), and find the first following B (40-40) • Then take the second occurrence of A (60-60), and find the first following B (70-70) • Then continue with BA Course on Data Mining Page 40 40/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 MINEPI Approach • Find all serial episodes (3/3): – In the recognition phase, we find all episodes frequent and build the candidate episodes of size 3. Again, almost all candidates are frequent – Finally, the same procedure is repeated for candidates of size 4, and episodes DCAB in 10-40, DABC in 50-80, CABD in 20-50, CBDA in 20-60, and BDAC in 40-80 are found to occur – Candidates of size 5 are not found, so the algorithm terminates Course on Data Mining Page 41 41/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 Minimal (serial) occurrences + frequencies in example data MINEPI Approach Course on Data Mining Page 42 42/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 MINEPI Approach IF D THEN C WITH [0] [10] 0.00 (0/2) [0] [20] 0.50 (1/2) [0] [40] 1.00 (2/2) IF D A THEN C WITH [40] [40] 0.50 (1/2) [20] [40] 1.00 (1/1) IF D THEN A C WITH [0] [10] 0.00 (0/2) [0] [40] 0.50 (1/2) IF D C THEN A B WITH [40] [40] 0.50 (1/2) [30] [40] 1.00 (1/1) Course on Data Mining Page 43 43/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 MINEPI Approach IF D A B THEN C WITH [40] [40] 0.50 (1/2) [30] [40] 1.00 (1/1) • Below are minimal occurrences of the example rules in the example data: DAB, DCAB DC D C A 0 10 DC, DAC, DABC B 20 30 40 DA D A B C 50 60 70 80 90 DA DAB Course on Data Mining Page 44 44/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 MINEPI: Experimental Results • The same data set as with WINEPI: – Alarms from a telecommunication network – 73 000 events (7 weeks), 287 event types – Serial MINEPI episodes – Time bounds 15-120 seconds – Window movement = W/10 – min_fr = 50-500 – 90 MHz Pentium, 32MB memory, Linux operating system. The data resided in a 3.0 MB flat text file Course on Data Mining Page 45 45/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 MINEPI: Experimental Results • Number of episodes and rules: Min_fr 50 100 250 500 15,30 1131 617 418 217 111 57 46 21 Time bounds (s) 30,60 60,120 2278 1982 5899 7659 739 642 1676 2191 160 134 289 375 59 49 80 87 15,30,60,120 5899 14205 1676 3969 289 611 80 138 Course on Data Mining Page 46 46/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 MINEPI: Experimental Results • Execution times: Min_fr 50 100 250 500 15,30 158 80 56 50 Time bounds (s) 30,60 60,120 210 274 87 103 56 59 51 51 15,30,60,120 268 104 58 52 Course on Data Mining Page 47 47/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 Summary • Episode rule mining: – Based on association rule techniques – Targeted for temporal data • Two approaches: – WINEPI with a sliding window – MINEPI with the search for minimal occurrences • The approaches can be used for different purposes • In the seminar presentations, we will take a look at… – some other approaches towards sequential pattern mining and – incremental sequential pattern mining Course on Data Mining Page 48 48/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 References • Mika Klemettinen, A Knowledge Discovery Methodology for Telecommunication Network Alarm Databases. Report A-1999-1 (PhD Thesis), University of Helsinki, Department of Computer Science, January 1999. ISBN 951-45-8465-1, ISSN 1238-8645. See electronic version at http://www.cs.helsinki.fi/u/mklemett/THESIS/, especially pages 27-49 • H. Mannila, H. Toivonen, and A. I. Verkamo, Discovery of frequent episodes in event sequences, Technical Report C-1997-15, Dept. of Computer Science, University of Helsinki, 1997. See electronic version at http://www.cs.helsinki.fi/research/fdk/datamining/pubs/C1997-15.ps.gz Course on Data Mining Page 49 49/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 Course Organization Next Week • Lecture 7.11.: Text mining – Mika gives the lecture • Excercise 8.11.: Episodes and episode rules – Pirjo takes care of you! :-) • Seminar 9.11.: Episodes and episode rules – Mika gives the lecture – 2 group presentations Course on Data Mining Page 50 50/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 Seminar Presentations • Seminar presentations: – Articles are given on previous week's Wed – Presentation in an HTML page (around 3-5 printed pages) due to seminar starting: • Can be either a HTML page or a printable document in PostScript/PDF format – 30 minutes of presentation – 5-15 minutes of discussion – Active participation Course on Data Mining Page 51 51/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 Seminar Presentations • Seminar presentations: – Try to understand the "message" in the article – Try to present the basic ideas as clearly as possible, use examples – Do not present detailed mathematics or algorithms – Test: do you understand your own presentation? – In the presentation, use PowerPoint or conventional slides Course on Data Mining Page 52 52/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 Seminar Presentations/Groups 3-4 Sequential Patterns R. Agrawal, R. Srikant: "Mining Sequential Patterns", ICDE 1995. Incremental Mining F. Masseglia, P. Poncelet and M. Teisseire: "Incremental Mining of Sequential Patterns in Large Databases", BDA'00. Course on Data Mining Page 53 53/54 Mika Klemettinen and Pirjo Moen University of Helsinki/Dept of CS Autumn 2001 Episodes and Episode Rules Thank you for your attention! Thanks to Heikki Mannila for his slides which greatly helped in preparing this lecture! Course on Data Mining Page 54 54/54