Abstract Interpretation: concrete and abstract semantics Concrete semantics • We consider a very tiny language that manages arithmetic operations on integers values. • The (concrete) semantics of the languages cab be defined by the funzcion µ defined by: e i |e ∗e = µ : Exp → Int µ (i ) µ (e1 ∗ e2 ) = = i µ (e1 ) ×µ (e2 ) 2 Abstract Semantics • Consider now an abstract semantics over the domain of signs σ :Exp → {+,-,0} σ= (i ) σ (e1 ∗ e= 2) + if i > 0 i = 0 if 0 − if i < 0 σ (e1 ) × σ (e2 ) × + 0 − + + 0 − 0 0 0 0 − − 0 + 3 From a different perspective • We can associate to each abstract value the set of concrete elements it represents. • The concretization function : γ : { +, 0, −} → 2Int γ ( +)= γ (0) = {i | i > 0} γ ( −)= {i | i < 0} {0} 4 Concretization • The concretization function γ maps an abstract value to a set of concrete elements • Let D denote the comncrete domain and A denote the abstract domain. The correctness of the abstract semantics wrt the concrete one can be expressed by: σ µ (e ) ∈ γ (σ (e )) Exp µ A γ D 2 5 Abstract Interpretation • Abstract Interpretation is: – Computing the semantics of a program in an abstract domain – In the case of signs, the domain so far is {+,0,-}. • The abstract semantics should be correct – it is an over approximation of the concrete semantics • The relatrion between te two domains is given by a concretization function 6 Consider the unary operator • Let us add to our language the unary operator - µ ( −e ) =− µ (e ) σ ( −e ) =−σ (e ) − + 0 − − 0 + 7 Consider the binary operation + • Adding the addition operator focrces us to modify the domain, as the previous one is not able to represent the result of adding numbers of opposite sign + + 0 − µ (e1 + e2 ) = µ (e1 ) + µ (e2 ) + + + ? σ (e1 + e2 ) = σ (e1 ) + σ (e2 ) 0 + 0 − − ? − − 8 So… • We add to the domain a new element that represents all the integer numbers (both positive and negative, and zero) + + γ (T) = Int + + 0 − T + T T 0 + 0 − T − T − − T T T T T T 9 The operations should be revisited × + 0 − + + 0 0 − T − T 0 0 0 0 − 0 + T T T 0 T T − + 0 − T − 0 + T 10 Examples Sometimes there is information loss due to the abstract operations µ ((1 + 2) + −3) = 0 σ ((1 + 2) + −3) = ( + + + ) + ( −+ ) = T Sometimes there is no information loss, with respect to the abstraction µ ((5 ∗ 5) + 6) =31 σ ((5 ∗ 5) + 6) = ( + × +) + + = + 11 Consider the division operator / • Problem: what is the result of dividing by zero? No number! • So we need a new element in our domain that represents the empty set of integers (i.e. a failure state) γ ( ⊥ ) =∅ / + 0 − T + 0 + 0 − T ⊥ ⊥ ⊥ ⊥ ⊥ ⊥ ⊥ − − 0 + T ⊥ T T 0 T T ⊥ ⊥ ⊥ ⊥ ⊥ ⊥ ⊥ x = ⊥ × ⊥ = ⊥ = ⊥ ⊥ + x − ⊥ 12 The resulting abstract domain • It is a finite complete lattice T • The partial order is coherent wrt the concretization function: x ≤ y ⇔ γ (x ) ⊆ γ ( y ) + 0 − ⊥ 13 The abstraction function • The concretization function γ has an adjoint function, the abstraction function α. • Function α maps a set of concrete values into the best representation of this set in the abstract domain (the smaller element f the abstract domain that represents of of these elements) • In our example:, α : 2Int → A α (S= ) lub ({− | i < 0 ∧ i ∈ S }, {0 | 0 ∈ S }, {+ | i > 0 ∧ i ∈ S }) σ (i ) = α ({i }) 14 A general definition • An Abstract Interpretation consists of: – An abstract domain A and a concrete domain D – A and D are complete lattices . Smaller means “more precise” – Two monotone adjoint function that enjoy che formino una inserzione di Galois. – Abstract operations that are correct wrt the concrete ones – A fixpoint algorithm • Galois insertion: ∀x ∈ 2D . x ⊆ γ (α (x )) ∀a ∈ A. x = α (γ (x )) 15 Correctness revisited • If case of Galois insertion, these correctness conditions are equivalent (prove it !) µ (e ) ∈ γ (σ (e )) σ (e ) ≥ α ({ µ (e )}) σ A ≤ γ α Exp D µ∈2 16 Correctness • We show that in order to ensure the correctness of the whole analysis the following conditions are sufficient: 1. The function α and γ are monotone 2. The function α and γ form a Galois insertion 3. The abstract operations are locally correct, i.e. γ (op(s1 ,..., sn )) ⊇ op(γ (s1 ),..., γ (sn )) • Notice that there is always a way to define a locally correct abstract operation. It is sufficient to consider the operations that returns the top element of the abstract domain. 17 Local correctness γ (op(s1 ,..., sn )) A op A γ γ 2 D op 2 D op(γ (s1 ),..., γ (sn )) 18 Correctness proof • We show by structural induction on e that: µ (e ) ∈ γ (σ (e )) • Basic step: = ∈ ⊆ = µ (i ) i {i } γ (α ({i })) γ (σ (i)) def. di µ inserzione di Galois def di σ 19 Correctness proof Inductive Step µ (e ) ∈ γ (σ (e )) µ (e1 op e2 ) = µ (e1 ) op µ (e2 ) def. di µ ∈ γ (σ (e1 )) op γ (σ (e2 )) per ipotesi induttiva ⊆ γ (σ (e1 ) op σ (e2 )) = γ (σ (e1 op e2 )) correttezza locale def di σ 20 Adding an input • We can extend our tiny language with the possibility to get an input value from the user • This means that we have a variable x in the expressions e = i | e ∗ e | −e | ... | x 21 Concrete semantics • The semantic function µ becomes µ : Exp → Int → Int • And we may express it in terms of a family of functions, having expressions as indeces and a single parameter (the input value) µi ( j ) = i µx ( j ) = j µe= µe ( j ) ∗µe ( j ) ∗e ( j ) µe= µe ( j ) + µe ( j ) +e ( j ) 1 2 1 2 1 2 1 2 ... = ... 22 Abstract semantics • The same holds for the abstract semantic function σ σ : Exp → A → A • Also in this case we can express σ by a family of functions: σi ( j ) = i σx (j ) = j σ e= ∗e ( j ) σe (j ) ∗ σe (j ) σ e= +e ( j ) σe (j ) + σe (j ) ... i 1 2 1 2 = ... 1 2 1 2 = α ({i }) 23 Correctness • The following conditions are equivalent ∀i . µe (i ) ∈ γ (σ e (α ({i }))) µe ≤D γ σ e α A σe A α γ α α µe ≤A σ e α D 2 ≤ µe ∈ 2D 24 Local correctness • We can express the local correntess condition by: op (γ (σ e ( j )),..., γ (σ en ( j )) ) ⊆ γ (op (σ e ( j ),..., σ en ( j )) ) 1 1 25 Conditional statement e = ... | if e = e then e else e | ... • Concrete semantics µe3 (i ) if µe1 (i ) = µe2 (i ) µif e1 =e2 then e3 else e4 (i ) = µe4 (i ) if µe1 (i ) ≠ µe2 (i ) • Abstract semantics σ if e =e 1 2 then e3 else e4 (i ) = σ e3 (i ) σ e (i ) 4 • Notice the role of the lub in the abstract domain 26 Correctness of the conditional statm. • Assume that the condition is true (the other case is analogous) µe (i ) 3 ∈ γ (σ e3 (i )) ⊆ γ (σ e3 (i )) ( per induzione γ (σ e (i )) 4 ) σ e4 (i ) ⊆ γ σ e3 (i ) monotonia di γ = γ (σ (i )) if e1 =e2 then e3 else e4 27 Recursion • As a final step, we add recursive functions (on a single paramether) = program = def f (x ) e e = ... | f (e ) • Until now, the concrete semantics was defined as: µ : Exp → Int → Int⊥ 28 Concrete semantics (function calls) • In order to take into account the call of functions, the signature of µ becomes as folllows: µ ′ : Exp → (Int → Int⊥ ) → Int → Int⊥ µf′ (e ) ( g )( j ) = g ( µe′ ( g )( j )) µx′ ( g )( j ) =j µe′ += µe′ ( g )( j ) + µe′ ( g )( j ) e ( g )( j ) 1 2 1 2 29 Semantics of recursive functions µ ′ : Exp → (Int → Int⊥ ) → Int → Int⊥ Consideriamo una funzione def f = e Definiamo una catena ascendente f0 ,f1 ,... in Int → Int⊥ f0 λx . ⊥ = fi +1 = µe′ (fi ) Definiamo µf = fi i 30 def f = if x=0 then 1 else f(x - 1) f0(i) = ⊥ for every i f1(i) = µ’if x=0 then 1 else f(x - 1)(f0)(i) = µ’1(f0)(i) = 1 if µ’f(x - 1)(f0)(i) otherwise i=0 = f0 (µ’x - 1(f0)(i)) = f0 (µ’x (f0)(i)) = f0 (i) - f0 (µ’1 (f0)(i)) - f0 (1) =⊥ 31 def f = if x=0 then 1 else f(x - 1) f0(i) = ⊥ for each i f1(i) = 1 if i=0, ⊥ otherwise f2(i) = µ’if x=0 then 1 else f(x - 1)(f1)(i) = µ’1(f1)(i)=1 if µ’f(x - 1)(f1)(i) otherwise i=0 = f1 (µ’x - 1(f1)(i)) = f1 (µ’x (f1)(i)) - f1 (µ’1 (f1)(i)) =… 32 def f = if x=0 then 1 else f(x - 1) f0(i) = ⊥ for every i f1(i) = 1 if i=0, ⊥ otherwise f2(i) = 1 if i=0,1, ⊥ otherwise f3(i) = 1 if i=0,1,2, ⊥ otherwise f4(i) = 1 if i=0,1,2,3, ⊥ otherwise ... µ(f) = Ui>=0 fi 33 Abstract Semantics • Tn the same way, we need to extend the definition of the abstract semantics σ. • We require that all the operations are monotone. σ ′ : Exp → (A → A) → A → A σ f′ (e ) ( g )(i ) = g (σ e′ ( g )(i )) σ x′ ( g )(i ) =i σ e′ += σ e′ ( g )(i ) + σ e′ ( g )(i ) e ( g )(i ) 1 2 1 2 34 Abstract semantics of recursion σ ': Exp → (A → A) → A → A Consideriamo una funzione def f = e Definiamo una catena ascendente f 0 ,f 1 ,... in A → A f= λa . ⊥ 0 f i +1 = σ e′ (f i ) Definiamo σ f = f i i 35 Correctness f2 ( j ) ≤ f1 ( j ) ≤ f0 ( j ) γ γ γ f2 ( j ) ≤ f 1(j ) ≤ f 0(j ) • The abstract chain always covers the corresponding element in the concrete chain 36 Correctness (ctd.) ∀i . fi ( j ) ∈ γ (f i ( j )) ⇒ fi ( j ) ∈ γ (f i ( j )) i i ≥0 le catene convergono ≥0 ⇒ fi ( j ) ∈ γ f i ( j ) monotonia di γ i ≥0 i ≥0 ⇒ µf ( j ) ∈ γ (σ f ( j )) per definizione 37 Summary • Abstract Interpretation means – – – – – Define a concrete (collecting) smenatics Define an abstract semantics (domain and operations) Prove the local correctness of the operations Apply a fixpoint algorithm to compute the abstract semantics Use a widening operator to ensure convergence (if the ascending chain condition does not hold) 38