Two equivalent problems RMQ on integer arrays RMQ on ±1 integer arrays LCA on a Cartesian Tree Euler tour of the Cartesian tree RMQ on the array of node-levels LCA on general trees RMQ on ±1 integer arrays Euler tour of the Cartesian tree RMQ on the array of node-levels Search for k-mismatches T CCGTACGATCAGTACAGTACAGTACTTTTTTAAACCGGAGACTACA P CCGAACTATC If O(1) O(k) time Problem: Find longest match between P[i,…] and T[j,…] Data Structure 1. Concatenate P and T into a string X = T$P 2. Construct a data structure on X that retrieves FAST the longest match between any pair of suffixes of X LCA or LCP query Suffix Tree & LCA # 0 LCA s i 1 ssi $ ppi# # 4 mississippi# 12 p i 3 2 1 13 7 5 T#P = mississippi#si$ 13 2 1 10 ppi# $ pi# 11 14 8 1 2 4 6 8 10 si ppi# i# ppi# 1 9 6 3 4 Longest match(3,13) Suffix Array & LCP RMQ Lcp RMQ 0 1 1 1 4 0 0 1 0 2 2 1 3 SA 12 11 14 8 5 2 1 10 9 13 7 4 6 3 Longest match(3,13) LCP T#P = mississippi#si 1 2 4 6 8 10 si sippi# sissippi# ssippi# ssissippi# 13 LCP Surprisingly, also LCA RMQ The RMQ problem RMQA(i,j) – returns the index of the smallest element in the subarray A[i..j]. A[0] A[1] A[2] A[3] A[4] A[5] A[6] A[7] A[8] A[9] 10 2 12 1 12 13 21 15 14 10 RMQ(2,7) = 3 Trivial solution: Precompute RMQ for every pair of indices. This takes Q(n2) space, and O(1) query time RMQ on a general array A[0] A[1] A[2] A[3] A[4] A[5] A[6] A[7] A[8] A[9] 10 25 22 34 7 19 9 26 16 Cartesian Tree 4 0 6 2 1 12 5 7 3 9 8 RMQ(i,j) = LCA(i,j) on Cartesian trees A[0] A[1] A[2] A[3] A[4] A[5] A[6] A[7] A[8] A[9] 10 25 22 34 7 19 9 12 26 16 4 0 6 2 1 5 7 3 9 8 Generic RMQ LCA RMQ ±1 LCA(u,v) = shallowest node between u and v during a depth first search traversal of T. Node at the lowest level 0 3 12 1 4 5 6 2 4 7 7 2 1 2 3 2 3 1 Euler tour 5 1 2 1 8 9 3 ±1 array Node Level 3 4 2 3 2 1 2 3 2 1 1 7 1 3 5 6 5 3 11 10 6 LCAT(4,6) = 3 We are left with “RMQ on ±1 array” RMQA(i,j) – returns the index of the smallest element in the subarray A[i..j]. A[0] A[1] A[2] A[3] A[4] A[5] A[6] A[7] A[8] A[9] 10 11 12 11 12 13 14 15 14 13 RMQ(2,7) = 3 Recall the trivial solution: Precompute RMQ for every pair of indices. This takes Q(n2) space, and O(1) query time Sparse Table Preprocess sub arrays of len 2k, for every k=0,1,…, log n M(i,j) = index of min value in A[i, i+ 2j -1] A[0] A[1] A[2] A[3] A[4] A[5] A[6] A[7] A[8] A[9] 10 11 12 11 12 13 14 M(2,0)=2 15 14 13 Total space is O(n log n) RMQ query ? M(2,1)=3 M(2,2)=3 M(2,3)=3 Querying the Sparse Table j i a1 ... 2k elements Total space is O(n log n) RMQ query takes O(1) time 2k elements ... k log( j i) Bucketing A’[i] = min in the i-th block of A. B’[i] is the position (index) of that min. A A’ A’[0] ... A’[i] ... A’[2n/logn] ... ... ... log n 2 2n blocks log n … B[0] ... B[i] B[2n/logn] Use the Bucketing A’ A’[0] ... A’[i] ... ... A’[2n/logn] ... ... log n 2 Preprocess A’ for RMQ using SparseTable Space is (2n/log n) * log (2n/log n) = O(n) RMQ queries on A’ take O(1) time Preprocess every block of A’ for border RMQ Space is O(n), border RMQ take O(1) time. RMQ(i,j) takes O(1) time, if i,j lie in distinct blocks In-block RMQ over ±1 arrays i, j RMQX (i, j ) RMQY (i, j ) X 3 4 5 6 Y 0 1 2 3 DX = DY There are O ( n ) +1 +1 5 2 4 5 6 5 4 1 2 3 2 1 +1 -1 -1 +1 +1 -1 2 normalized blocks Set Table[BlockEnc, i, j] = RMQ(i,j) O log n 1 2 -1 O( n ) n log 2 n O(n) Table entries LZ-parsing (gzip) # 0 s i 1 ssi ppi# 4 # <m><i><s><si><ssip><pi> T = mississippi# 1 2 4 6 8 10 1 p si i 3 2 1 ppi# ppi# i# ppi# 11 mississippi# 12 6 pi# 7 8 5 2 1 10 9 4 3 LZ-parsing (gzip) # 0 It is on the path to 6 s i 1 ssi ppi# 4 # 1 p i si 2 1 ppi# 6 pi# 7 8 5 2 <ssip> T = mississippi# 1 2 4 6 8 10 1 10 Leftmost occ =3<6 3 ppi# i# ppi# 11 mississippi# 12 By maximality check only nodes 3 Leftmost occ =3<6 4 9 1. Longest repeated prefix of T[6,...] 2. Repeat is on the left of 6 LZ-parsing (gzip) 0 # min-leaf Leftmost copy s i Parsing: 2 1 1. Scan T ssi ppi# 2. Visit ST and stop when min-leaf ≥ current pos 4 # <m><i><s><si><ssip><pi> T = mississippi# 1 2 4 6 8 10 1 p si i 9 2 1 6 7 2 1 10 3 ppi# pi# 8 5 3 4 ppi# i# ppi# 11 2 mississippi# 12 3 4 9 Precompute the min descending leaf at every node in O(n) time. 3