Finding the ground state

Finding the electronic ground state is the heart of any DFT code. In CONQUEST, we need to consider several linked stages: the density matrix (found using diagonalisation or linear scaling); self-consistency between charge and potential; and the support functions (though these are not always optimised).

The basis functions in CONQUEST are support functions (localised functions centred on the atoms), written as $\phi_{i\alpha}(\textbf{r})$ where $i$ indexes an atom and $\alpha$ a support function on the atom. The support functions are used as basis functions for the density matrix and the Kohn-Sham eigenstates:

\[\begin{split}\psi_{n\mathbf{k}}(\mathbf{r}) = \sum_{i\alpha} c^{n\mathbf{k}}_{i\alpha} \phi_{i\alpha}(\mathbf{r})\\ \rho(\mathbf{r}, \mathbf{r}^\prime) = \sum_{i\alpha j\beta} \phi_{i\alpha}(\mathbf{r}) K_{i\alpha, j\beta} \phi_{j\beta}(\mathbf{r}^\prime)\end{split}\]

where $n$ is an eigenstate index and $\mathbf{k}$ is a point in the Brillouin zone (see here for more on this). The total energy can be written in terms of the density matrix, as:

\[E_{KS} = \mathrm{Tr}[HK] + \Delta E_{Har} + \Delta E_{XC}\]

for the Hamiltonian matrix $H$ in the basis of support functions, with the last two terms the standard Harris-Foulkes [G1, G2] correction terms.

For diagonalisation, the density matrix is made from the coefficients of the Kohn-Sham eigenstates, $c^{n\mathbf{k}}_{i\alpha}$, while for linear scaling it is found directly during the variational optimisation of the energy.

The question of whether to find the density matrix via diagonalisation or linear scaling is a complex one, depending on the system size, the accuracy required and the computational resources available. The simplest approach is to test diagonalisation before linear scaling.

Diagonalisation

Exact diagonalisation in CONQUEST uses the ScaLAPACK library which scales reasonably well in parallel, but becomes less efficient with large numbers of processes. The computational time will scale as $N^3$ with the number of atoms $N$, but will probably be more efficient than linear scaling for systems up to a few thousand atoms. (Going beyond a thousand atoms with diagonalisation is likely to require the multi-site support function technique.)

To choose diagonalisation, the following flag should be set:

DM.SolutionMethod diagon

It is also essential to test relevant parameters, as described below: the k-point grid in reciprocal space (to sample the Brillouin zone efficiently); the occupation smearing approach; and the parallelisation of k-points.

Go to top

Brillouin zone sampling

We need to specify a set of discrete points in reciprocal space to approximate integrals over the Brillouin zone. The simplest approach is to use the Monkhorst-Pack approach [G3], where a grid of points is specified in all directions:

Diag.MPMesh T
Diag.MPMeshX 2
Diag.MPMeshY 2
Diag.MPMeshZ 2

This grid can be forced to be centred on the gamma point (often an important point) using the parameter Diag.GammaCentred T. The origin of the Monkhorst-Pack grid may also be offset by an arbitrary vector from the origin of the Brillouin zone, by specifying:

Diag.MPShiftX 0.0
Diag.MPShiftY 0.0
Diag.MPShiftZ 0.0

For the Monkhorst-Pack approach, instead of specifying the number of points in x, y and z explicitly, they can be set automatically by giving a spacing in reciprocal space: Diag.dk where units are inverse Bohr radii. The number of points will be chosen so that $2\pi/(a\times dk)$ is less than the value specified.

Alternatively, the points in reciprocal space can be specified explicitly by giving a number of points and their locations and weights:

Diag.NumKpts 1

%block Diag.Kpoints
0.00 0.00 0.00 1.00
%endblock Diag.Kpoints

where there must be as many lines in the block as there are k-points. It is important to note that CONQUEST does not consider space group symmetry when integrating over the Brillouin zone.

Go to top.

K-point parallelization

It is possible to parallelise over k-points: to split the processes into sub-groups, each of which is responsible for a sub-set of the k-points. This can be very efficient, and is specified by the parameter Diag.KProcGroups N, where it is important that the number of processes is an integer multiple of the number of groups N. It will be most efficient when the number of k-points is an integer multiple of the number of groups.

Go to top.

Electronic occupation smearing

The occupation numbers of the eigenstates are slightly smeared near the Fermi level, following common practice. The default smearing type is Fermi-Dirac smearing with a temperature (in Hartrees) set with the flag Diag.kT which defaults to 0.001Ha.

The Methfessel-Paxton approach [G4] to occupations allows much higher smearing temperatures with minimal effect on the free energy (and hence accuracy) of the energy. This generally gives a similar accuracy with fewer k-points, and is selected as:

Diag.SmearingType 1
Diag.MPOrder 0

where Diag.MPOrder specifies the order of the Methfessel-Paxton expansion. It is recommended to start with the lowest order and increase gradually, testing the effects.

Go to top.

Padding Hamiltonian matrix by setting block size

With the default setting, the size of Hamiltonian and overlap matrices is determined by the total number of support functions. It can be a prime number and timing of diagonalisation can be very slow in such cases, since the division of the matrix into small pieces is difficult.

By padding, we can change the size of Hamiltonian matrix to improve the efficiency of the diagonalisation. To set an appropriate value for the block size of the matrix, specify the following two variables.

Diag.BlockSizeR       20
Diag.BlockSizeC       20

Note that these two numbers should be the same when padding (and when using ELPA as described below). We suggest that an appropriate value is between 20 and 200, but this should be tested.

The option for padding was introduced after v1.2, and if you would like to remove it, set the following variable.

Diag.PaddingHmatrix              F

Go to top

Using ELPA

ELPA is an alternative to ScaLAPACK for diagonalisation which can show better parallelisation, and allows use of GPU cards for acceleration (though the GPU implementation in CONQUEST is at a very early stage). To enable ELPA and GPUs, set the following tags:

Diag.UseELPA T
Diag.ELPA_GPU T

Some care is needed in the balance between MPI processes and number of GPUs on the compute nodes: careful testing is important.

Go to top.

Linear Scaling

A linear scaling calculation is selected by setting DM.SolutionMethod ordern. There are two essential parameters that must be set: the range of the density matrix, and the tolerance on the optimisation.

DM.L_range 16.0
minE.Ltolerance 1.0e-6

The tolerance is applied to the residual (the RMS value of the gradient of the energy with respect to the density matrix). The maximum number of iterations in the density matrix optimisation can be set with DM.LVariations (default 50).

At present, CONQUEST can only operate efficiently in linear scaling mode with a restricted number of support functions (though this is an area of active development). PAO basis sets of SZ and SZP size (minimal and small in the ion file generator) will run without restrictions. For larger PAO basis sets, the OSSF approach must be used, and is effective. With a blip basis there are no restrictions, though efficient optimisation is still under active development.

It is almost always more efficient to update the charge density while optimising the density matrix, avoiding the need for a separate self-consistency loop. This is set by choosing minE.MixedLSelfConsistent T.

An essential part of a linear scaling calculation is finding the approximate, sparse inverse of the overlap matrix. Normally this will happen automatically, but it may require some tests. The key parameters are the range for the inverse (see the Atomic Specification block, and specifically the Atomic Specification block) and the tolerance applied to the inversion.

Atom.InvSRange R
DM.InvSTolerance R

A tolerance of up to 0.2 can give convergence without significantly affecting the accuracy. The range should be similar to the radius of the support functions, though increasing it by one or two bohr can improve the inversion in most cases.

The input tags are mainly found in the Density Matrix section of the Input tags page.

Go to top.

Self-consistency

The normal mode of operation for CONQUEST involves an iterative search for self-consistency between the potential and the charge density. However, it is also possible to run in a non-self-consistent manner, either with a converged charge density for electronic structure analysis, or for dynamics, which will be considerably more efficient than a self-consistent calculation, but less accurate.

Self consistency is set via the following parameters:

minE.SelfConsistent T
minE.SCTolerance    1E-7
SC.MaxIters         50

The tolerance is applied to the RMS value of the residual, $R(\mathbf{r}) = \rho^{out}(\mathbf{r}) - \rho^{in}(\mathbf{r})$, integrated over all space:

\[R_{RMS} = \sqrt{\Omega \sum_l \left(R(\mathbf{r}_l)\right)^2 }\]

where $\mathbf{r}_l$ is a grid point and $\Omega$ is the grid point volume (integrals are performed on a grid explained in Integration Grid). The maximum number of self-consistency cycles is set with SC.MaxIters, defaulting to 50.

For non-self-consistent calculations, the main flag should be set as minE.SelfConsistent F. The charge density at each step will either be read from a file (if the flag General.LoadRho T is set), or constructed from a superposition of atomic densities. The Harris-Foulkes functional will be used to find the energy.

Go to top.

Restarting SCF

The SCF cycle can be restarted from a previous density matrix or charge density, which may significantly speed up convergence. The density matrix is automatically written out in the files Kmatrix2.* or Lmatrix2.* (depending on whether diagonalisation or linear scaling is being used). These files are read in, and the initial charge density made from them by setting the flags:

General.LoadDM T
SC.MakeInitialChargeFromK T

The charge density is not written out by default; this can be changed by setting IO.DumpChargeDensity T which results in the files chden.nnn being created. To read these in as the initial charge density, the flag General.LoadRho T should be set.

Go to top.

Advanced options

Instabilities during self-consistency are a well-known issue in electronic structure calculations. CONQUEST performs charge mixing using the Pulay approach, where the new charge density is prepared by combining the charge densities from a number of previous iterations. In general, we write:

\[\rho_{n+1}^{in} = \sum_{i} \alpha_i \left[ \rho_{i}^{in} + A R_{i} \right]\]

where $R_{i}$ is the residual at iteration $i$, defined above. The fraction of the output charge density that is included is governed by the variable $A$, which is set by the parameter SC.LinearMixingFactor (default 0.5). If there is instability during the self consistency, reducing $A$ can help (though will likely make convergence a little slower).

It is also advisable to apply Kerker preconditioning to the residual when the system is large in any dimension. This removes long wavelength components of the residual, reducing charge sloshing. This is controlled with the following parameters:

SC.KerkerPreCondition T
SC.KerkerFactor       0.1

where the Kerker factor gives the wavevector at which preconditioning starts to reduce. The Kerker preconditioning is applied to the Fourier transform of the residual, $\tilde{R}$ as:

\[\tilde{R} \frac{q^2}{q^2 + q^2_0}\]

where $q^2_0$ is the square of the Kerker factor and $q$ is a wavevector. You should test values of $q_0$ around $\pi/a$ where $a$ is the longest dimension of the simulation cell (or some important length scale in your system).

Go to top.

Hubbard U and XC Functionals

CONQUEST now supports the DFT+U formalism, as described by [G5]. The relevant input tags are:

DM.DFTplusU T
DM.WriteOccMat T (default)

# Species, n, l, zeta, U (Ha)
%block DFTplusU
1 3 2 1 0.0772
%endblock

The DFTplusU block specifies the projectors onto the correlated subspace (at present these must be PAOs) and requires one line per species. The block specifies n, l, zeta and the value of U (in Hartrees). We strongly recommend that PAOs created with equal energies are used (in the PAO generation, add Atom.Cutoffs energy to the atom block), and that the largest radius zeta function (i.e. zeta=1) is used. Careful testing is needed for the projectors and an appropriate value of U (we note that when using the same pseudopotentials, CONQUEST and QuantumEspresso calculations agree very well).

The exchange-correlation (XC) functional is (almost always) set by the ion files used to specify the elements (and generated by the PAO generation tool). It is possible to run a calculation using a different XC functional to the ion files by setting General.DifferentFunctional T and General.FunctionalType <integer> which is documented in the General section of the input tags documentation.

Support functions

Support functions in CONQUEST represent the density matrix, and can be simple (pseudo-atomic orbitals, or PAOs) or compound, made from simple functions (either PAOs or blips). If they are compound, made from other functions, then the search for the ground state involves the construction of this representation. Full details of how the support functions are built and represented can be found in the manual section on basis sets.

Go to top

Charged systems

CONQUEST uses periodic boundary conditions, which require overall charge neutrality. However, charged systems can be modelled: if an excess of electrons is specified by the user, a uniform positive background charge is added automatically to restore overall neutrality. At present, there are no correction schemes implemented, so it is important to test the convergence of the energy with unit cell size and shape. Electrons are added by setting the parameter General.NetCharge.

General.NetCharge 1.0

This gives the number of extra electrons to be added to the unit cell, beyond the valence electrons.

Go to top.

Surface dipole correction

If your simulation involves a slab calculation, then unless the slab is perfectly symmetrical there can be a dipole moment which gives an unphysical field across the periodic cell even for a neutral system. A correction can be applied, following the method outlined in [G6, G7] (where we use the energy from the Bengtsson paper, which is correct). This method only works for slab calculations, and calculates a dipole correction potential, placing the necessary discontinuity in the vacuum. The user can specify the location of the discontinuity (in fractional coordinates), or the code will place it at the point where the planar averaged density is a minimum (which is a good way to find the centre of the vacuum).

SC.SurfaceDipoleCorrection T/F
SC.SurfaceNormal x, y, or z
SC.DiscontinuityLocation (*real*)

It is also possible to write out the planar-averaged potential and charge density (averaged in the plane perpendicular to the surface normal) using the parameter SC.OutputAveragePotential T/F. This generates the file AveragedPotential.dat.

Go to top.

Spin polarisation

CONQUEST performs collinear spin calculations only. A spin-polarised calculation is performed by setting the parameter Spin.SpinPolarised to T.

Users need to specify either the total initial number of spin-up and spin-down electrons in the simulation cell (using the parameters Spin.NeUP and Spin.NeDN), or the difference between the number of spin-up and spin-down electrons (using the parameter Spin.Magn).

The number of electrons for each spin channel can be fixed during SCF calculations by setting the parameter Spin.FixSpin to T (default is F).

It is possible to specify the spin occupation in the atomic charge densities (i.e. the number of spin-up and spin-down electrons used to build the density). This is done in the Atomic Specification part of the Conquest_input file. Within the atom block for each species, the numbers of electrons should be set with Atom.SpinNeUp and Atom.SpinNeDn. Note that these numbers must sum to the number of valence electrons for the atom.

Go to top.

Examples: FM and AFM iron

A two atom ferromagnetic iron simulation might be set up using the parameters below. Note that the net spin here is S=1 $\mu_B$ (i.e. two more electrons in the up channel than in the down), and that the net spin is not constrained.

# example of ferro bcc Fe
Spin.SpinPolarised T
Spin.FixSpin  F
Spin.NeUP  9.0     # initial numbers of up- and down-spin electrons,
Spin.NeDN  7.0     # which will be optimised by a SCF calculation when Spin.FixSpin=F

%block ChemicalSpeciesLabel
1   55.845   Fe
%endblock ChemicalSpeciesLabel

An equivalent anti-ferromagnetic calculation could be set up as follows (though note that the initial specification of spin for the atoms does not guarantee convergence to an AFM ground state). By defining two species we can create spin-up and spin-down atoms (note that both species will require their own, appropriately labelled, ion file).

# example of anti-ferro bcc Fe
Spin.SpinPolarised T
Spin.FixSpin  F
Spin.NeUP  8.0     # initial numbers of up- and down-spin electrons in an unit cell
Spin.NeDN  8.0     # are set to be the same

%block ChemicalSpeciesLabel
1   55.845   Fe1
2   55.845   Fe2
%endblock ChemicalSpeciesLabel

%block Fe1           # up-spin Fe
Atom.SpinNeUp 5.00
Atom.SpinNeDn 3.00
%endblock Fe1
%block Fe2           # down-spin Fe
Atom.SpinNeUp 3.00
Atom.SpinNeDn 5.00
%endblock Fe2

When using multi-site or on-site support functions in spin-polarised calculations, the support functions can be made spin-dependent (different coefficients for each spin channel) or not by setting Basis.SpinDependentSF (T/F, default is T).

Go to top.

[G1]

J. Harris. Simplified method for calculating the energy of weakly interacting fragments. Phys. Rev. B, 31:1770, 1985. doi:10.1103/PhysRevB.31.1770.

[G2]

W. Foulkes and R. Haydock. Tight-binding models and density-functional theory. Phys. Rev. B, 39:12520, 1989. doi:10.1103/PhysRevB.39.12520.

[G3]

H. J. Monkhorst and J. D. Pack. Special points for brillouin-zone integrations. Phys. Rev. B, 13:5188, 1976. doi:10.1103/PhysRevB.13.5188.

[G4]

M. Methfessel and A. T. Paxton. High-precision sampling for brillouin-zone integration in metals. Phys. Rev. B, 40:3616–3621, Aug 1989. URL: https://link.aps.org/doi/10.1103/PhysRevB.40.3616, doi:10.1103/PhysRevB.40.3616.

[G5]

Matteo Cococcioni and Stefano de Gironcoli. Linear response approach to the calculation of the effective interaction parameters in the $\mathrm LDA+\mathrm U$ method. Phys. Rev. B, 71:035105, Jan 2005. URL: https://link.aps.org/doi/10.1103/PhysRevB.71.035105, doi:10.1103/PhysRevB.71.035105.

[G6]

Jörg Neugebauer and Matthias Scheffler. Adsorbate-substrate and adsorbate-adsorbate interactions of na and k adlayers on al(111). Phys. Rev. B, 46:16067–16080, 1992. doi:10.1103/PhysRevB.46.16067.

[G7]

Lennart Bengtsson. Dipole correction for surface supercell calculations. Phys. Rev. B, 59:12301–12304, 1999. doi:10.1103/PhysRevB.59.12301.

Go to top.