HPC/Applications/cp2k

From CNM Wiki
Jump to navigation Jump to search

Note: This document is based on a .doc file of unknown provenance. I converted it and made minor updates to links. ---stern.

The former main CP2K site http://cp2k.berlios.de/ is now defunct. Content has not yet been migrated to the new site http://www.cp2k.org/ .

CP2K User Self Support

The following pages are intended to be a starting point for people who want to use CP2K in their research. They are by far not complete and by no means a replacement for reading the related literature, running tests etc., but an attempt to get users started at all and explain commonly used features, quirks and problems of the CP2K code. Part of it is based on the doc/tutorialCp2k.html file from the CP2K cvs distribution. All CP2K users are encouraged to contribute to these pages by making corrections and/or adding new text to make it as useful as possible.

Further information:

Input and Output

Input Reference Manual

The input reference documentation (i.e. what keywords exist, what flags they accept and a short description of the purpose) of cp2k is embedded into the executable. The version matching the current cvs main trunk code can be found on the website http://cp2k.berlios.de/manual/ . Alternatively, you can generate it from your cp2k executable by using the command line flags --html-manual or --xml ( generate_manual_howto). The documentation generated by the executable should be consistent with its version, whereas the one on the website (usually) refers to the latest version.

General Input Syntax

  • The input is divided into a tree of sections and subsections that start with &<KEYWORD> and end with &END.
  • Each section can contain plain keywords (followed by value(s)) or further (sub-)sections.
  • Both the section names and the keyword names are case insensitive.
  • The special characters "#" and "!" denote the beginning of a comment that goes until the end of the line.
  • If the value of a keyword is expressed in units, the unit can be changed (where supported) by preceding the value with the unit in square brackets. In the square brackets there must be no spaces. Example: "[m]" switches lengths to meters.
  • A long line of input can be split using the "\" character at the end of the line
  • Keywords with a logical value (flags), often default to 'false' when the keyword is missing and to true if the value is omitted; providing an explicit value after the keyword as is recommended (for example KEYWORD T).
  • Sections can have a default parameter that comes right after the section value. For example the OT section has a logical default parameter that controls the activation of the OT method. So, simply writing
 &OT
 ...
 &END OT
- even without any keywords in the section - activates the OT method and
 &OT FALSE
 ...
 &END OT
or the absence of the OT section deactivates it.
  • Whitespace is (mostly) ignored, so using indentation in the input file helps making sure that the structure is correct. If you are an (x-)emacs user, you can try the cp2k emacs mode from http://cp2k.googlegroups.com/web/cp2k-mode.el to assist in editing. This mode recognized known cp2k keywords for syntax highlighting and indentation.

Basic Input File Structure

On the topmost level a minimal cp2k input file must have a &GLOBAL section which allows to set basic and global job parameters (e.g. project name and run type) and usually also the &FORCE_EVAL section is provided to select the method by which forces and/or energies are computed. Depending on the run type you may need addional sections, most frequently a &MOTION section.


The most important flag in the &FORCE_EVAL section is the value you provide to the METHOD keyword, as this selects how energies and forces a evaluated (ab initio DFT, empirical force-field, QM/MM and so on). This way you can easily switch from one method to the another (provided there are meaningful parameters to both methods in the input). If you want to use the ab-initio DFT part of cp2k then the &DFT subsection and its &QS subsection will probabily be of interest for you, in the case of a force-field calculation you may want to have a closer look at the &MM subsection. The &SUBSYS subsection allows you to define your (sub-)system (=atoms, elements, positions, supercell and so on). Specifically coordinates and molecular mechanics topologies can be provided "inline" or read in from external files.

Whenever atoms move during a calculation (e.g. geometry optimization or molecular dynamics), you can set the corresponding parameters in the &MOTION section.

More details on frequently used keywords will follow in the description of individual examples. In general, one has to keep in mind that the structure of the input file tries to mimic the structure of the code (so it may happen, that seemingly related flags are defined in different sections). Also, when discussing individual flags, it is recommended to name a flags by its whole "trace", e.g. use %FORCE_EVAL%SUBSYS%COORD to refer to the section where you provide coordinates explicitely. This naming conventions is also used by the input reference manual, so if you search it for functionality, you can see immediately where the keyword belongs in the input "tree".

Output Generation and "Print Keys"

Output handling in cp2k is, like many other things, very flexible and modular, and thus for the beginner occasionally a bit confusing. One generally controls the amount and frequency and destination of output generated from &PRINT subsections in input file sections that control a certain functionality. Those &PRINT subsections are generally referred to as print keys. Whether a print key is "triggered" or not is primarily and globally controlled by the %GLOBAL%PRINT_LEVEL keyword. Possible values are:

SILENT
Almost no output
LOW
Little output
MEDIUM
Quite some output
HIGH
Lots of output
DEBUG
Everything is written out, useful for debugging purposes only

with MEDIUM being the default (this is quite verbose, but a good choice when starting a new project). For output that is written to a file, that file name is usually derived from the value of the %GLOBAL&PROJECT_NAME keyword.


A more fine grained control about what is printed where can be obtained via modifying print keys explicitly and changing the contained parameters.To understand how to control the output, it is necessary to first understand the iteration level. During the calculations cp2k keeps a global iteration level count, which is a set of integers that keep track how many iterations of what program module one has done so far. The exact meaning of each iteration level depends on the run type, for example doing an MD with the QS GPW method an iteration level count of "1 4 5" means that the program is doing the 5th SCF step of the 4th MD step, whereas "1 9" means the we are at the 9th md step (outside the scf loop). The first 1 is a root iteration level that stays always 1 during a normal MD run.

Section parameter (= argument to &SECTION)

Level starting at which this property is printed, valid values are the available print levels SILENT, LOW, MEDIUM, HIGH, DEBUG and additionally ON and OFF. Not providing a &PRINT section means to take its default value (predefined based on the general usefulness of the corresponding output). The presence of a &PRINT section implies a value of ON (= always printed). Use OFF to selectively turn a printkey off.

EACH
Controls how often a proprety is printed. This is matched with the actual iteration level from the right, replacing non present levels with 1. For example "2 1" for an SCF property during an SCF means to write each SCF step every 2 MD steps. How to handle the last iteration is treated separately in ADD_LAST (this means that even with setting EACH 0 there may still be output for a print key).
ADD_LAST
Determines if the last iteration should be added to the output, and if it should be marked symbolically (with l) or with the iteration number. Not every process is able to identify the last iteration early enough to be able to output. Valid keywords: NO, NUMERIC, SYMBOLIC
COMMON_ITERATION_LEVELS
How many iterations levels should be written in the same file (no extra information about the actual iteration level is written to the file). If the value is 0 then all iterations go to different files, if it is 1 then (if the current property is an SCF property and one is doing an MD) then each MD step goes to a separate file (but different scf steps are in the same file)
FILENAME
Controls what filename is used for output.
  • Use __STD_OUT__ (exactly as written here) for the screen or a standard logger file.
  • Use filename to get "projectname-filename[-middlename]-iteration.extension"
  • Use ./filename (or any path that contains at least one "/") to get "./filename[-middlename]-iteration.extension"
  • Use =filename to get exactly filename as output file (please note that this can be dangerous because different output/iterations might write on the same file

Restarting Calculations

For large or long calculations one needs a restart mechanism to continue a calculation from where it was left of and sometimes also change a few parameters (e.g. continue an NVE trajectory in NVT ensemble). Regular cp2k restart files are just (normal) input files with all the defaults made explicit. So it should not be too difficult to understand them. However editing those restart file is not so practical, and cannot be prepared before the previous run has finished. To make this restarting easier and more flexible one can tell cp2k to read (additional) input from an external file and which parts of that file should override the current input. This is done with the section &EXT_RESTART giving the name of the external input file with the %EXT_RESTART%RESTART_FILE_NAME [1] keyword. In this section there are several keywords control exactly what is taken from the external file, for example if one wants to restart the positions and not the velocities, one needs to set RESTART_VEL F and RESTART_POS T. By default all (supported) parameters read in from the restart file override the current input.

In DFT calculation there is a second restart file containing the wavefunctions. This restart is stored in a binary file that is architecture specific (see cp2k/tools/RestartTools if you need to transfer it). The name. location and frequency of this file are controlled by the %FORCE_EVAL%DFT%SCF%PRINT%RESTART print key. Reading in the wavefunction restart is done by setting the file name with %FORCE_EVAL%DFT%WFN_RESTART_FILE_NAME and then setting the value of the keyword %FORCE_EVAL%DFT%SCF%SCF_GUESS to RESTART (default is an atomic guess).


Parameter choices

GPW

The Gaussian Plane Waves method Lippert1997 to efficiently solve the DFT Kohn-Sham equations. It uses gaussians as basisset, and planewaves as auxiliary basis. This is similar at the Resolution of Identity (RI) methods but with a different basisset.

In GPW the whole density is transferred to plane waves, and one has the density n(r)=sumi jPi j phii(r)phij(r) in the gaussian basis set and the density ñ taken on a uniform grid.

Then ñ is used to caluclate the hartree potential VH (via an FFT based poisson solver) and the exchange and correlation potential Vxc. These potential are then transferred back to gaussian basis set by integrating them with phii(r)phij(r). To make the collocation and integration perfectly consistent with each other one can set FORCE_EVAL%DFT%QS%MAP_CONSISTENT this normally adds only a very small overhead to the calculation and is

Cutoff

n and ñ are not equal, and this introduces an error in the calculation. ñ converges toward n when the cutoff (that controls the grid spacing) goes to infinity (and gridspacing to 0). Which cutoff is sufficient to represent a density depends on how sharp is the gaussian basis set (or that of the potential, but it is always broader).

For hystorical reasons the density of the grid is given as the energy (in Ry) of the highest reciprocal vector that can be represented on the on the grid. This can be roughly given as 0.5(pi/dr)**2 where dr is the gridspacing in Bohr. The characteristic length of a gaussian with exponent A is given by 1/sqrt(a) (up to a factor 2 depending on the convention used). This means that the cutoff to represent in the same way a gaussian depends linearly on the exponent. Thus one can get a first guess for an acceptable guess can be take from the knowledge that for water with alphaH=47.7 a good cutoff for doing MD is 280 Ry.

It turns out that if one wants to put the whole density on the grid, the core electrons of even the simplest atoms cannot be represented, thus one has to remove to core electrons and use pseudopotentials for the atoms. In Cp2k we use the Goedecker-Teter-Hutter pseudopotentials Goedecker1996, these are harder than other pseudopotentials, but also more accurate.

Smoothing

ñ is optimized for the electrostatic part, but is used also to calculate the echange and correlation potential. Because of this, and because the GTH pseudopotential goes almost to 0 close to the core of the atom, the xc potentisl, especially for gradient corrected functionals, converges badly. Instead of using very high cutoffs one can perform a smoothing of the density, and calculate the derivatives on the grid with other methods than the G-space based derivatives. For MD of water using a cutoff of 280 Ry XC_SMOOTH_RHO NN10 and XC_DERIV SPLINE2_SMOOTH (in the FORCE_EVAL%DFT%SC%XC_GRID section) give good results, please note that these options renormalize the total energy, and the amount of renormalization is dependent on the cutoff. Thus energies with different cutoffs cannot be easily compared, only interaction energies or forces can be calculated.

Methods that do not redefine the total energy are XC_SMOOTH_RHO NONE and XC_DERIV equal to either PW, SPLINE3 or SPLINE2. Thes are listed from the one that assumes more regularity (PW the the one that assumes less regularity SPLINE2. Normally SPLINE2 is a good choice, but for high cutoffs (600 Ry for water) SPLINE3 is better. The default (PW) is not bad, but generally inferior to the others.

Basisset

The QS part of Cp2k uses basis sets that are a linear combination gaussian functions. As the GPW method uses pseudopotentials one cannot use basis set of other programs, at least the core part should be optimized for cp2k. The polarization and augmentation functions can be taken from dunning type basis sets.

Cp2k basis normally are build with an increasing accuracy level starting from SZV (single Z valence, normally only for really crude results), DZVP (double Z valence and one polarization function, already suitable for MD, and give reasonable results), TZV2P (triple Z valence and two polarization functions, fairly good results), QZV3P (quadruple Z valence and three polarization functions, good results). Note that the discussion about the quality of the result is very indicative, and valid for condensed phase MD, for gas phase to reduce the BSSE an augmented basis (aug-) with diffuse basis functions is needed, if you calculate properties that depend on the polarization probably also (aug-) and polarization functions will be important. In any case you should perform a convergence check for your properties with respect to the basis set.

A good set of basis set are available in the BASIS_SET and GTH_BASIS_SETS files. Other can be created with a program, and you can ask on the list or try to optimize them with the program that is part of cp2k.

The basis set to use for an element is set in the FORCE_EVAL%SUBSYS%KIND section with its name with the BASIS_SET keyword.

Pseudo Potential

The GTH pseudopotential that one has to use has a large library of elements available in the POTENTIAL file. The pseudopotential has to be optimized for the exchange correlation functional that one is using. Normally changing from one functional to the other is not too difficult with the optimization program that is part of cp2k. If your element-functional combination is missing ask on the forum.

The pseudopotential to use for an element is set in the FORCE_EVAL%SUBSYS%KIND section with its name.

XC selection

In DFT the choice exchange correlation functional is an important issue, because unfortunately results can depend on it. Here we don't want to discuss how to select it, normally it is a good idea to use the same one as what is used in your field, so that you can more easily compare your results with the literature. An important choice though is whether to use exact exchange or an hybrid functional or not. Hybrid functionals normally improve the description of barriers and radicals, but are much more costly. In general one should start with a GGA and meta GGA and switch to hybrid functionals to check the results or if needed.

In cp2k the xc functional is selected in the FORCE_EVAL%DFT%TDDFPT%XC section. Common functional combinations can be selected directly as section parameters of the XC_FUNCTIONAL for example with the POTENTIAL keyword.

&XC_FUNCTIONAL BLIP
&END XC_FUNCTIONAL

but more complex combination have to be choosen by directly setting the underlying subsections.

WF Convergence

Apart from cutoff and basis set with the GPW method there are two other importan switches that control the convergence of the method. One is FORCE_EVAL%DFT%QS%EPS_DEFAULT that controls the convergence of integral calculation, overlap cutoff, neighboring lists... and sets a slew of other EPS_* keywords so that the KS matrix is calculated with the requested precision. The other is EPS_SCF which controls the convergence of the scf procedure. In general one should have an error ef on the forces one should have sqrt(eps_scf) < ef and sqrt(eps_default) < eps_scf, and in general around 1.e-3 is the error that you need to have reasonable MD.

ot / diag

To perform the SCF and find the groundstate there are two main methods: the traditional disgonalization method accellerated by the DIIS procedure, and the orbital transform method.

FORCE_EVAL%DFT%SCF%SCF_GUESS RESTART is a parameter that controls how the initial density is built, normally one uses either ATOMIC or RESTART. During an MD this setting normally influences only the initial startup.

Traditional diagonalization is like most other DFT codes, it diagonalizes the KS matrix to get a new density from the n lowest eigenvectors. The density for the next scf cycle is built at the beginning just mixing the new and the old density with the FORCE_EVAL%DFT%SCF%MIXING factor. If the new density is close enough to the old density FORCE_EVAL%DFT%SCF%EPS_DIIS then the DIIS procedure is activated.

Diagonalization works well, but for difficult or big systems the OT method [VH03] is better (and in general there is no reason not to use it as default). The OT method directly minimizes the the electronic energy with respect to the wavefunctions. It uses a clever parametrizations of the wavefunctions so that the orthogonality constraint becomes a linear constraint. To activate OT adding the section FORCE_EVAL%DFT%SCF%OT is enough.

The advantage of the OT method, are that being a direct method it always converges to something, that it needs less memory than the diagonalization (important for big systems), and each SCF iteration is faster (but it might need a little more iterations). Anyway normally it is faster than diagonalization.

The speed of convergence of the OT method the preconditioner choice (FORCE_EVAL%DFT%SCF%PRECONDITIONER) is important. FULL_KINETIC is a good choice for very large systems, but for smaller or difficult systems FULL_ALL is normally better. The quality of the preconditioner is dependent on how close one is to the solution. Especially with expensive preconditioners like FULL_ALL being closer might improve much the preconditioner and the convergence. For this reason it might be worthwhile to rebuild the preconditioner after some iterations of the SCF. This might be reached using setting FORCE_EVAL%DFT%SCF%MAX_SCF to a small value, and activating the outer scf loop addint the FORCE_EVAL%DFT%SCF%OUTER_SCF section, and setting its EPS_SCF equal to the one of the scf loop, and chooing a outer MAX_SCF big enough (the total number of iteration is the product of internal and external MAX_SCF.

Also the minimizer of the OT method can be changed withFORCE_EVAL%DFT%SCF%OT%MINIMIZER, the default is CG that uses a very robust conjugated gradient method. Less roubust but often faste is the DIIS method.

MD

Molecular dynamics a good method to perform themodynamical averages, and to look at dynamical properties. It also a good starting point for many other more advanced techniques.

During an MD one expects the density to change more or less continously, thus it is possible to use the solutions of the previous steps to create the new initial guess for the density (and wavefunctions for OT). Indeed cp2k can use different extrapolations techniques (FORCE_EVAL%DFT%QS).

For MD a good guess is PS with an EXTRAPOLATION_ORDER of 3. If you are doing something where from one step to the other there is little continuity, (geometry optimization, montecarlo), then something with little continuity like LINEAR_PS or just the previous density (USE_PREV_P) might be better. If you change the particle number then a USE_GUESS that restarts from scratch with the restart method choosen in FORCE_EVAL%DFT%SCF%SCF_GUESS. The ASPC extrapolation is used in connection with langevin and a reduced SCF convergence (just on SCF step) and hystory restart to simulate big systems.

Trajectory

With the printkey MOTION%PRINT%TRAJECTORY you can control the output of the trajectory. The name of the file is by default projectname-pos-1.xyz and projectname is whatever you have written in GLOBAL%PROJECT. In the prinkey you can also chenge the format from xyz to something else.

An interesting file to check is the projectname-1.ener file, in it you can find the following columns: md_step, time[fs], e_kin [hartree], temp[K], e_pot [hartree], e_tot [hartree], cpu_time_per_step [s]

You should always check it and look at how the system equilibrates.

The .ener file and other interesting trajectory files are all controlled with subsections of MOTION%PRINT.

MD Convergence

If the MD has to be trusted then one should be sure that the trajectory can be trusted. Actually a simple, and very sensitive test that there are no big technical errors is to perform an NVE trajectory and look at the energy conservation. The enrgy conservation has normally two features, short time oscillations (that are larger when the system is not yet equilibrated) and a long range drift. If you express these in K/atom, then you can compare them with the temperature that you are simulating at. Another (equivalent) possibility is to express them as fraction of the kinetic energy. The oscillation and drift (normally per ps, but it also depends on how many picoseconds you want to simulate, and if you want an NVE trajectory) should be small with respect to the kinetic energy (1% or less is a good value, but obviously it depends on the accuracy that you want to achieve, more might be acceptable, or less needed).

To improve the energy conservation one can either improve the forces with EPS_SCF and EPS_DEFAULT, improve ñ increasing the cutoff, and reduce the timestep. For the timestep a good value is normally around 1 fs, but if you are simulating high temperature or special system you might need to reduce it. Normally having good forces and larger timestep is advantageous. Obviously the timestep cannot be smaller than the frequency of the highest oscillations of the system (typically H atoms, that is the reason sometime D atoms are used).

Conserving the total energy well is not enough if the system is very heterogeneous and the interesting part is small. In that case there is the risk that even a large error on it might pass unnoticed. If this is to be excepted (for example the atom with the sharpest gaussian basis set is in the interesting part) then checking that part of the system (oscillations, kinetic energy,...) is advisable.

Equilibration

For the result to be meaningful the system should be equilibrated, and not in a very unlikely state. If one is doing shooting or transition path sampling then this is less true, but still the system should not be in a very high-energy state.

So at the beginning you should build your system in some meaningful way, or from classical simulations. To have a better startingpoint you can optimize the energy (if your system is small), you anneal it, but it is not always needed. Then you can equilibrate it at a given temperature.

To equilibrate a system one can use a crude velocity rescale when he is too far away from the goal temperature (as established by MOTION%MD%TEMP_TOL). Doing an NVE simulation with TEMP_TOL is better way to equilibrate than using the NVT ensamble that uses the Nose-Hoover chain thermostats and might give spurios effects if you are far from equilibrium, as it tryes to conserve some extended quantity.the thermostat can store energy that he will give back at some later point. It should be taken into account that the temperature is not constant, but does oscillate in the NVE ensamble, these oscillations are connected with the specific heat and are inversely proportional with sqrt(N) where N is the system size.

A faster way to equilibrate the system is to use LANGEVIN as MOTION%MD%ENSEMBLE, and use a GAMMA of 0.001 [1/fs] (or larger if you want to equilibrate faster). Langevin introduces a viscous dissipative force and a random forces that are in equilibrioum at the given temperature. For equilibration purposes (and not to simulate removed degrees of freedom, like a solvent), the smaller gamma the longer it takes to equilibrate, and the closer the trajectory is to an NVE trajectory, the larger gamma the more the "environment thermostat" influences the system. Also With langevin if your system is in a bad initial configuration it is a good idea to set a TEMP_TOL, or first anneal a little the system.

gapw

Periodicity

The Cp2k program is optimized for periodic calculations, and those are the default, still it is possible to perform also other kinds of calculations. First an important thing to note is that by convention the cell goes from 0 to h, not from -h/2 to h/2, thus if ypou want to put a cluster at the center of the cell you need to put it in h/2 not in 0.

The periodicity of the neighboring list, and thus of the gaussian collocation/integration is controlled by FORCE_EVAL%SUBSYS%CELL%PERIODIC.

The poisson solver (electrostatics) is controlled in the section FORCE_EVAL%DFT%POISSON. The possible poisson solver are choosen with the POISSON_SOLVER keyword, that can have the values

  • PERIODIC, which can do only fully periodic solvers,
  • ANALYTIC which can do 0, 1d and 2d periodic solutions using analytical green functions in the g space (slow convergence),
  • MT Martyna Tuckermann decoupling, that interacts only with the nearest neighbohr, beware these are completely wrong in the cell is smaller than twice the cluster size (with electronic density),
  • MULTIPOLE, uses a schema that fits the total charge with one gaussian per atom [Blo95].

The periodicity of the poisson solver (electrostatics) is set in FORCE_EVAL%DFT%POISSON%PERIODIC, and the poisson solver choosen should support [document ends abruptly --stern]