Lecture notes on Tight-Binding Molecular Dynamics, and Tight-Binding justification of classical potentials Furio Ercolessi Dipartimento di Fisica, Università di Udine email: [email protected] 7 February 2005 (minor revision 4 November 2010) Introduction The purpose of these notes is: 1. to illustrate the main aspects of the tight-binding (TB) method; 2. to sketch how TB molecular dynamics is done in practice; 3. to show how TB supplies ideas useful for the construction of empirical potentials. In TB approaches one works within the one-electron, self-consistent field approximation (Kohn-Sham equations in the density functional theory framework). There are several kinds of TB models, differing in the kind of approximations made to the full density functional total energy expression. The common denominator is the use of a minimal basis of local orbitals. I will not consider the more accurate approaches, which definitely fall in the class of first-principle methods, and focus the attention on the more approximate approaches—which are of course those that allow molecular dynamics calculations with large number of particles. It should also be noted that TB treatments in elementary solid state books are focused on perfect crystals, and usually show how the band structure arises within this scheme when applied to situations with translational symmetry (introducing the k vector, constructing Bloch functions, etc.). I will not follow this road here, as we are mainly interested in applying this formalism to molecular dynamics calculations for systems that do not present in general any particular symmetry. We will therefore focus on a real space description, and ignore the aspects related with band theory. So, no k vector appears in these notes. These notes are just an introduction to the subject. The reader interested in going deeper may consult the references given at the end of this document [1, 2, 3, 4]. The tight-binding formalism In TB approaches one writes the (unknown) total wavefunction ψ(r) as a linear combination of atomic orbitals (LCAO): ψ(r) = X iα ciα φiα (r) (1) Here i is the atomic index and α runs on the different orbitals (such as s, px , py , pz etc). The orbital φiα (r) will be centered on the position ri of atom i. In the vast majority of cases bonding is dominated by valence electrons, so we will not include all the orbitals but only those which are believed to form the bonding by overlapping with each other and “hybridizing”. These are often the s and p orbitals of the outer shell (sp-bonded materials), but d orbitals are also essential in transition metals (sd-bonded materials). In the general case orbitals will not be an orthonormal set1 and we define the overlap integrals as Siαjβ = hiα|jβi = Z φ∗iα φjβ dr (2) In orthogonal-TB schemes S reduces to the unit matrix. Of course, the true optimal one-electron wavefunction will not necessarily be of the form (1) as atomic orbitals, being a finite number of functions, can not constitute a complete set. However, in the spirit of the variational principle, the “best” wavefunction within this class will be the one that minimizes the functional Z Z G[ψ] = ψ ∗ Hψ dr − ψ ∗ ψ dr (3) It can be shown (the reader is referred for details to section 5.3 of the Meccanica Quantistica course notes) that minimizing this functional corresponds to solving the generalized eigenvalue problem Hc = Sc (4) where c is the vector of coefficients ciα , S is the overlap matrix defined above, and H is the Hamiltonian matrix whose elements are Hiαjβ = hiα|H|jβi = Z φ∗iα Hφjβ dr (5) Explicitly, we need to solve X (Hiαjβ − Siαjβ )cjβ = 0 (6) jβ for all values of i and α. This is a standard problem that can be easily solved with linear algebra methods. The crucial feature of the tight-binding method is that orbitals are centered around atoms and decay rapidly. Therefore, the S and H matrix elements will be non-zero only for on-site terms (i = j) and between atom pairs i and j whose relative distance is less than a certain cutoff distance. The solution will consist of a set of eigenvalues s giving the energy levels, (s) and the corresponding eigenvectors ciα giving the related wavefunctions. One will then populate the energy levels with all the electrons that have to be 1 We indicate later why non-orthogonal schemes are to be preferred. 2 accommodated, starting from the lowest level and going upwards, two electrons per level, implying a contribution to the total energy X Uband = 2 s = 2 occ X ns s (7) s where ns is 0 or 12 and represents the occupancy of the state s with eigenvalue s . This energy term is called the band energy, and is usually attractive and responsible for the cohesion. In fact, if atoms get closer their overlap increases, the range of the eigenvalues increases and, since only the lowest energy states are occupied, the energy decreases (bonding increases). However, the TB formalism shown above describes only bonding due to the outer electrons. If one brings two atoms close together, inner shells will start to overlap and bring additional energy (in the form of a strong repulsion) that is not included in the band energy term. The total energy will therefore be given as E = Urep + Uband (8) where Urep is an empirical repulsive energy term. In most cases this is modelled simply as a sum of two-body repulsive potentials between atoms: 1X vR (rij ) 2 ij Urep = (9) but many-body expressions such as Urep = X X g Φ(rij ) i (10) j (where g is a non-linear embedding function) have also been proposed. Finally, let us note that the band energy term can be formally rewritten as follows: Uband = 2 X ns s = 2 X s ns hψs |H|ψs i = 2 XX s (s)∗ (s) ns ciα cjβ Hiαjβ (11) s iαjβ since |ψs i = X (s) ciα |iαi (12) iα If we define ρiαjβ = X (s)∗ (s) ns ciα cjβ (13) s then Uband = 2 X s ns s = 2 X ρiαjβ Hiαjβ = 2 Tr(ρH) (14) iαjβ (which is of course invariant with respect to rotations of the reference system) ρ is called the density matrix and it basically contains the information contained 2 At high temperatures one could also use, more accurately, a Fermi distribution. 3 in the eigenvectors. The band energy can be seen as a weighted sum of the hamiltonian matrix elements, the weights being the density matrix elements. The part of Uband including only terms with i 6= j in the summation is also called the bond energy: Ubond = 2 X ρiαjβ Hiαjβ (15) iαjβ i6=j In some schemes the terms of Uband with i = j are modelled within Urep , so in these schemes one has E = Urep + Ubond (16) The bond order The quantity ρiαjβ defined by (13) in the context of eq. (15) (that is, when i 6= j) is also called the bond order. Bond orders are important indicators of the strength of the bonding. To be non-zero, the state has to be occupied and must involve orbitals of both atoms i and j; but these are only necessary conditions. Cancellation effects may in fact reduce the bond order for states that are not effective in binding atoms together. Let use consider for instance the hydrogen molecule, with two s orbitals centered on the two atoms, φ1 and φ2 . As well known, they give rise to a bonding and an antibonding state: ψb = ψa = 1 √ (φ1 + φ2 ) 2 1 √ (φ1 − φ2 ) 2 The definition (13) reduces in this case to (b)∗ (b) (a)∗ (a) c2 ρ12 = nb c1 c2 + na c1 1 = (nb − na ) 2 (17) √ (b) (b) (a) (a) having used c1 = c2 = c1 = −c2 = 1/ 2. We see that if we place both electrons in the bonding state, nb = 1, na = 0 (we are using a convention where n = 1 means that the state is occupied by two electrons with opposite spin) and ρ12 = 1/2. If we place one electron in the bonding state and one in the antibonding state, nb = na = 1/2, then ρ12 = 0, in agreement with the fact that there is no bonding in this case. The bond order becomes negative if we place both electrons in the antibonding state. In this case, coefficients with the same (opposite) sign between two atoms indicate that the state tends to be bonding (antibonding), but this is true only for s orbitals. For instance, for a diatomic molecule pz orbitals (where z is the molecule axis) are bonding when the coefficients have opposite sign. 4 Empirical tight-binding The elements of the overlap and hamiltonian matrices are of course a function of the atomic positions. One can evaluate them by computing the integrals, but this is of course a compute-intensive approach. In empirical tight-binding such computations are not made, and there is not even the need to know the atomic orbitals: in these schemes one uses parametrized analytical expressions for the matrix elements as a function of the atomic coordinates. Let us consider the matrix element Hiαjβ = hiα|H|jβi = hiα|T |jβi + hiα|Veff |jβi (18) where the kinetic and potential terms in the Kohn-Sham hamiltonian are considered separately. It can be immediately seen that hiα|T |jβi = Z 1 φ∗iα (r) − ∇2 φjβ (r) 2 (19) depends only on the relative distance rij between the two atoms, and not on their environment: it is a two-body term (unless i = j, in that case it is an on-site term). This is in general not true for the potential term. If one makes the additional assumption that Veff can be expressed as a sum of atom-centered contributions, X (i) veff (r − ri ) (20) Veff (r) = i then it can be seen that, if i 6= j, we only get two- and three-centre integrals: (i) (j) hiα|Veff |jβi = hiα|veff |jβi + hiα|veff |jβi + X (k) hiα|veff |jβi (21) k6=i,j and if i = j we only get one- and two-centre integrals: (i) hiα|Veff |iβi = hiα|veff |iβi + X (j) hiα|veff |iβi (22) j6=i In the most commonly used approximate TB schemes, it is assumed that (20) is valid and three-centre integrals are neglected. Moreover, one assumes that one- and two-centre integrals are not calculated explicitly, but directly modelled through an explicit analytical form, which in the case of two-centre integrals is a function of the distance rij between the two atoms. Sometimes these schemes are also based on an orthonormal basis assumption (Siαjβ = δij δαβ ), so only the Hiαjβ (r) matrix elements are given. Clearly this approximation will lead to some errors. To some extent the lack of three-centre integrals can be compensated when choosing the parameters to fit one- and two-centre integrals, but this is certainly a serious approximation that may be difficult to circumvent and may lead to transferability problems. The number of required parameters seems to be rather large. For instance, in a full spd model we would have 1 + 3 + 5 = 9 orbitals on each site, and they would give rise to 9 × 9 = 81 hopping integrals between each pair of atoms. However, in the two-centre approximation symmetry greatly reduces 5 the number of independent hopping integrals, as worked out by Slater and Koster in 1954. They showed that in the generic case (different atoms) only 19 of those 81 integrals can be non-zero, 15 of them independent. The number of independent integrals goes down to 11 if the two atoms involved are identical. These symmetry considerations apply to both the bond integrals and the overlap integrals (when present). Force computation Forces on atoms, which are the main ingredient for a MD calculation, are defined as Fi = −∇i E (23) where E is the energy given by eq. (8). While the repulsive term, expressed as a classical potential, poses no problem, derivatives of the band energy term require some attention. As indicated by linear algebra methods, multiplying eq. (4) on the left by cT ∗ (where the T index means transposition) will give, using properly normalized eigenvectors, = cT ∗ Hc (24) or s = X (s)∗ (s) ciα Hiαjβ cjβ (25) iαjβ All the terms depend on the atomic positions, therefore ds = dR iαjβ X (s)∗ dc iα dR (s)∗ dHiαjβ (s) cjβ (s) Hiαjβ cjβ + ciα dR dcjβ dR (s) (s)∗ + ciα Hiαjβ (26) (here R means any component of the vector position of any atom). However, the eigenstates are orthonormal: (s0 ) X (s)∗ ciα Siαjβ cjβ = δss0 (27) iαjβ hence the derivative of this quantity has to be zero: 0= X iαjβ (s)∗ dc iα dR (s)∗ dSiαjβ (s) cjβ (s) Siαjβ cjβ + ciα dR dcjβ dR (s) (s)∗ + ciα Siαjβ (28) These two equations for the derivatives can be combined by subtracting (28) multiplied by s from (26) and obtaining X (s)∗ dHiαjβ dSiαjβ ds = ciα − s dR iαjβ dR dR The first and third terms have vanished because of (6). 6 (s) cjβ (29) We can now calculate the force: X ds X dHiαjβ dSiαjβ dE = −2 ns = −2 ρiαjβ − wiαjβ dR dR dR dR s iαjβ − (30) where ρiαjβ is the density matrix defined by (13), and wiαjβ = X (s)∗ (s) ns s ciα cjβ (31) s is an energy-weighted density matrix. Usually Hiαjβ = hαβ (rij ) and Siαjβ = sαβ (rij ) are known functions of the distance between pairs of atoms, so one can use (30) to calculate the force on each atom, provided that the vectors c are exact eigenvectors of H. The actual force calculations will of course have to deal with the angular dependences of the hopping and overlap integrals. There are efficient library routines available that cope with this problem, so that it is not of practical concern. To summarize, this is how a TB MD calculation works in practice for each time step: 1. obtain the bond integrals and, when present in the model, the overlap integrals as parametrized analytical functions of the interatomic distances; 2. solve the eigenvalues problem to obtain the density matrix ρ and the energy-weighted density matrix w (this is the compute-intensive part); 3. compute the forces using (30); 4. move the atoms to the new positions; 5. go to step 1 and repeat. The transferability problem As always the case with approximate interaction schemes, transferability can be a major problem with TB. In other words, a parametrization appropriate for a certain environment, such as for instance bulk atoms in a semiconductor with a diamond-like structure, may not be appropriate for a different environment where coordination is different. This issue is particularly important for TBMD calculations, where atomic environments can change in unpredictable ways. The transferability problem is less strong when the orbitals are more localized, because there is less overlap with neighboring atoms and therefore the parameters are less dependent on the environment. Orbitals of isolated atoms are not really good for TB, because they have a fairly long exponential tail. Generally, the best transferability is achieved with the most localized basis sets. In this sense, non-orthogonal parametrizations (that is, where S is not the unit matrix), even if computationally heavier, are preferable. The reason for this is that an orthogonal basis set that is equivalent to a non-orthogonal one 7 (in the sense that it reproduces the same eigenvalues for a given structure) must necessarily have orbital wavefunctions with a longer range. Such wavefunctions usually have oscillatory tails, which make it possible to have all overlaps equal to zero. However this feature “binds” the model to a certain crystal structure: orthogonality can not be easily achieved by the same model in the context of a different environment. As a consequence, transferability will be lost. For this reason, the tendency of recent parametrizations is to use nonorthogonal schemes (and this is why I presented the slightly more complex non-orthogonal theory in these notes). O(N) Methods Diagonalizing a N × N matrix is usually a O(N 3 ) task; N would be in this case the number of valence electrons involved. Therefore, this is the most computeintensive part of a standard empirical TB calculation. Recently, approaches that scale linearly with N have been proposed and used. These approaches are based on the use of the density matrix, and assume that this matrix is short-ranged, namely that only matrix elements between atoms within a certain cutoff radius are nonzero. The physical justification behind this assumption is electronic screening. Thanks to screening, a perturbation made in a point of the system is not seen far from that point. The reader can consult [1] for more detail. Tight-binding and classical potentials The tight-binding formalism allows to establish a “bridge” between the firstprinciples interactions as obtained from electronic structure methods, and empirical classical potentials (where the electronic degrees of freedom have been formally eliminated). This connection is useful to get hints about the analytical forms best suited to model materials realistically. Local density of states We first need to define some concepts, putting emphasis on locality as our classical potentials must be able to change as the local environment changes—this is the key to accurate modelling and good transferability. For sake of simplicity, we will assume in this section that the basis of orbitals is orthonormal. This is not a fundamental assumption; it is done just to simplify the treatment, but the final conclusions remain valid in the general case. Let us start by rewriting (11) introducing the Fermi distribution n() and a density of states X D() = δ( − s ) (32) s so that Uband = 2 X Z +∞ ns s = 2 s 8 n()D() d −∞ (33) In turn, we can consider D() to be the sum of local density of states, one for each orbital in the system: D() = X Diα () (34) |hψs |iαi|2 δ( − s ) (35) iα This equation is satisfied if we choose Diα = X s as can be readily verified by summing on iα making use of iα |iαihiα| = 1. The local density of states is an important quantity in the context of atomic bonding. When atoms are brought together to form a condensed system, their energy levels become “bands” and we can consider the band seen by each atom as described by the local density of states. We can now say that P Uband = 2 X Z +∞ −∞ iα n()Diα () d (36) and this is useful because when using classical potentials the total energy is P almost always written as a sum over atoms, E = i Ei . Therefore, the relation above is our key to establish the bridge between the electronic and classical worlds. The moments Diα () can be a rather complex function, but we can attempt a simplified description in terms of its moments, as proposed by the group of Cyrot-Lackmann. The p-th moment of the distribution is defined as µiα p = Z +∞ −∞ p Diα () d (37) that we can also write, using (35), µiα p = X |hψs |iαi|2 ps (38) hψs |iαihiα|H p |ψs i = hiα|H p |iαi (39) s This can be expanded as µiα p = X s This expression is particularly nice, because the orbitals are fixed and known, so this is a real space function of the positions of the atoms i and j that can be evaluated without solving any Schrœdinger equation. iα Note that µiα 0 = 1, and µ1 represents an average energy for the atomic orbital, related with the choice of the zero for the energy scale. The really relevant informations on the shape of Diα () are therefore contained in µiα 2 and higher moments. Let us then concentrate on the second moment, which is the most important one. If we assume that the average energy is zero, the second 9 moment is a variance and therefore represents the square of the energy width, or “bandwidth” W of the local density. For instance, if one assumes that Diα () is a rectangular distribution centered on 0, of width W and height 1/W , then the 2 second moment is W 2 /12. More generally, we can always consider µiα 2 ∝W . We can write the second moment as 2 µiα 2 = hiα|H |iαi = X hiα|H|jβihjβ|H|iαi (40) jβ This can be seen as the sum of all the possible two-hops paths that start from iα and come back to iα. It is easy to see by the same argument that µiα p can be written as the sum of all the possible p-hops paths that start from iα and come back to iα. So we have µiα 3 = X hiα|H|jβihjβ|H|kγihkγ|H|iαi (41) jβkγ and so on. This result is known as the moment theorem. The very remarkable result is that all moments can be constructed out of the same building blocks that are usually just a function of the distance between pairs of atoms. It is precisely this result that allows us to connect the tight binding theory with empirical potentials. Glue and embedded-atom potentials [For background information on these potentials please see ref. [5]]. Let us assume that we can neglect the third and higher order moments and can describe a system just in terms of the second moments. Also, let us drop the orbital indexes in this section of sake of simplicity, and also because we are now moving to a very approximate, almost qualitative approach. Consider eq. (36) for the tight-binding energy. Each term in the summation will then depend only on µi2 , as this is by hypothesis the only quantity that characterizes Di (). But µi2 is a quantity that has the dimensions of an energy squared, so it must be necessarily Uband ' −A Xq µi2 (42) i where A is a positive proportionality constant. This amounts to say that the energy decrease is proportional to the bandwidth, which is understandable. Think for instance to a situation where the band is half-populated: the average energy will decrease proportionally to W when W is increased. The moment theorem (40) tells us that µi2 = X [h(rij )]2 (43) j where we have assumed that h(rij ) = |hi|H|ji| is a function of the distance between i and j. Putting together the two results we have Uband ' −A X sX i 10 j [h(rij )]2 (44) P P which is a glue potential term i U [ j ρ(rij )] with a glue function U (n) = √ −A n and an “atomic density” function ρ(r) = [h(rij )]2 . n is here the “generalized coordination” of the glue scheme. The tight-binding approach suggests that the energy should depend on the square root of the coordination (this may be contrasted with the linear dependence upon coordination implied by two-body interaction models). In practice, in real parametrizations the functions U (n) and ρ(r), together with the pair potential that enters the Urep part, are fitted to experimental and simulation data. In this sense, the approach presented in this subsection does not really have a quantitative validity. Its importance lies in the fact that it shows how this kind of Hamiltonians arises “naturally” out of the tight-binding framework, with a glue function U exhibiting a positive curvature. Such potentials are also called pair functionals, as they not pairwise potentials but still constructed using only the distance between pairs of atoms. They work fairly well for certain classes of metals such as noble metals, where angular forces between atoms are weak. However, for transition models and covalent materials such as semiconductors, higher order moments play a crucial role and cannot be neglected. In particular, it has been found that second moment schemes cannot reproduce accurately energy differences between crystal structures. Pair functionals from higher order moments If higher order moments have to be considered for a realistic modelling, we move from pair functionals to cluster functionals. If we limit ourselves to fourth moments, our theory suggests that the functional form for the energy of the system will be of the type Uband = X V (µi2 , µi3 , µi4 ) (45) i This can be computed once one has an appropriate parametrization of the local density of states in terms of these three moments. The moment theorem tells us that µi3 consists of three-body terms, and µi4 consists of four-body terms. The moments will be combined to give an energy by means of a non-linear expression. Several models of this kind have been studied in the past years. Unfortunately, it was found that even fourth moment descriptions are not able to explain subtle energy differences even in relatively simple sp-bonded materials such as semiconductors. This has led researchers to explore alternative models. Bond order potentials In covalent materials there is an accumulation of electronic charge in the regions between pair of atoms. In such cases, it may be more convenient to focus on the concept of bond rather than on the concept of atom (as implied by (45)) when writing the analytical form for the classical potential. 11 To this end, let us go back to the bond order concept and consider in particular eq. (14), which we rewrite again dropping the orbital indexes: Uband = 2 X ρij Hij (46) ij ρij is what we called the bond order, and is a quantity that depends on the environment. Therefore, it can not be modeled as a function of the distance between i and j, and it will be influenced by the nature and position of all the atoms in the surrounding of the i-j bond. In this sense we are still dealing with a cluster functional, but of a slightly different nature with respect to those described above. To model ρij one typically considers bonding and antibonding orbitals, that—as discussed for the H2 molecule case—bring contributions of different signs to the bond order. The details of the models and fitting procedures can be rather complex and will not be discussed here. One of the most successful empirical potentials for semiconductors based on the bond order concept is that proposed by Tersoff [6] for silicon, germanium and carbon. The Tersoff scheme is based on an analytical form E= 1X [ΦR (rij ) + ρ(zij )ΦA (rij )] 2 i6=j (47) where ΦR (r) is a repulsive pair potential, ΦA (r) an attractive pair potential, and ρ the bond order modelled as a decreasing function of a “bond coordination” variable zij , in turn defined in terms of three-body terms dependent on distances and angles. The choices made by Tersoff in his original implementation are zij = ΦR (r) = Ae−λ1 r (48) ΦA (r) = −Be−λ2 r (49) ρ(z) = (1 + β n z n )−1/2n (50) X fc (rik )g(θijk )eλ 3 (r −r )3 ij ik (51) k6=i,j fc (r) = 1 1 2 1 2 − sin π r−R 2 D if r < R − D 0 g(θ) = 1 + if R − D < r < R + D if r > R + D c2 c2 − d2 d2 + (h − cos θ)2 (52) (53) where A, B, λ1 , λ2 , β, n, λ, R, D, c, d, h are 12 fitting parameters. These potentials are qualitatively correct, and definitely better than potentials from the previous generation of purely “geometrical” potentials with three-body terms (Stillinger-Weber), in particular from the transferability point of view. Some problems remain, which may be related to the parametrization and fitting rather than to the general scheme. 12 References [1] L. Colombo, Tight-binding molecular dynamics: A primer, Rivista del Nuovo Cimento 10, 2005 (2006). [2] A. E. Carlsson, Solid State Phys. 43, 1 (1990). [3] M. Finnis, Interatomic forces in condensed matter, Oxford University Press, Oxford (2003). [4] K. Ohno, K. Esfarjani and Y. Kawazoe, Computational Materials Science, Springer-Verlag, Berlin (1999) (chapter 3). [5] F. Ercolessi, A molecular dynamics primer, http://www.fisica.uniud. it/~ercolessi/md/, and references therein. [6] J. Tersoff, Phys. Rev. B 37, 6991 (1988). 13