Quantum Chemistry Interface
Quantics is interfaced to a number of quantum chemistry (QC) programs to provide the potential energy surfaces (PES) during direct dynamics simulations. The general scheme is centred on the database (DB) of points, energies and gradients used to provide the PES at any geometry via interpolation. The scheme to provide the potential expansion up to second order at a point Q (e.g. the centre of a Gaussian wavepacket) is as follows- Quantics inquires whether there are suitable points in the DB, i.e. at least 1 point is within a specified distance from Q .
- If a suitable point found use DB data
- Interpolation carried out and potential expansion returned
- If NO suitable point found call QC to calculate new point
- Pass geometry and set up input file
- Run Quantum Chemistry program
- Read output files and add new data to DB
- Interpolation using DB carried out and potential expansion returned
defqchem | Define the code to be used along with type of calculation and any options |
vqchem | calculate the potential energy at a point using the DB |
lhaqchem | call a QC program and calculate the LHA (local harmonic approximate), i.e. the energy, gradient and Hessian. Calls the relevant ddXXX routine. |
ddXXX | there are a set of routines that control the QC calculation. This is done by calls from ddXXX to wrYYY and rdYYY routines that write the input files and read the output files for a particular code and method. For example to run a Gaussian calculation ddgaussian calls wrgaussian followed by rdgaussian (for HF and DFT) or rdgaussiancas (for CAS) |
Adding a New Code
The following changes need to be made in funcqchemmod.f90- In defqchem add a new entry to define ifunc for the new program
if (qcprog(1:ilbl) .eq. 'gaussian') then ifunc = 1 else if (qcprog(1:ilbl) .eq. 'qommma') then ifunc = 2 ..... else if (qcprog(1:ilbl) .eq. 'newprog') then ifunc = X endif
where newprog is the keyword to be given in the dirdyn-section of the input file as an arguent to qcprogram=newprog and X is the next available number. - In lhaqchem add a call to the routines for the new interface
select case ( hopilab1) case default routine = 'lhaqchem' write(message,'(a,i5)') & 'Quantum chemistry program not found :',hopilab1 call errormsg case (1) call ddgaussian(time,xgpoint,v,deriv1,deriv2,dipole,mocoeff,nact,& hopipar1,lfail,outfile,nthreads) case (2) call ddqommma(xgpoint,v,deriv1,deriv2,hopipar1,nddstate,lfail) ... case (X) call ddnewprog(xgpoint,v,deriv1,deriv2,hopipar1,nddstate,lfail) end select
- copy ddgaussian, wrgaussian and rdgaussian to new routines appropriate to writing the input, running the program and reading the output.
The SQL Database
Motivation
The database holds the data for a direct dynamics run. Each 'row' in the database holds one geometry and one of each of the present quantities, such as gradient, energy, molecular orbital coefficients. Previously the data was stored in a variety of binary Fortran files, such as geo.db, pes.db, and so on. This presented two main problems. If multiple invocations of the code were using the database and one was writing to it then corruption would eventually occur. The other issue was that to find a particular record in the database, each preceding record had to be read and discarded. Quite a bit of the database code (the in-memory databases, etc) exists to try to mitigate the second problem.
The concurrency problem can only be solved with some form of locking. The lookup problem can be solved with indexes. Rather than trying to write safe locking or indexing in Fortran, it is much better to use an existing database solution. The simplest that meets these needs is SQLite. SQLite is a library that stores data in a database file and provides the functionality of an SQL engine to all processes accessing that file. We don't (yet?) need the full functionality of SQL just the implicit locking and indexing. It runs on Mac, Windows, and Linux. It is public domain licensed, and is usually installed by default on all Linux distributions.
Implementation
As SQLite is a C code, a wrapper library is needed. This was taken from
"flibs", which
provides a Fortran interface to the C code. It also requires the sqlite3.h file to be installed. This is available from the
"sqlite webpage", or can be be installed via a package manager,
for example apt-get install libsqlite3-dev
on debian, yum install libsqlite3x-devel
on Redhat
or zypper install sqlite3-devel
on Suse.
In quantics, where possible, the non-DB code has been left unaltered.
The initial goal was to get the SQL DB working, before moving to
refactoring the rest of the code to use the DB more efficiently. There's a
new library in lib/db_sql
which contains the C and Fortran
code interfacing to libsqlite, and an assortment of routines which
are intended to be the only means of accessing the DB.
The routines in analyse/
that use the DB have been
altered to use the SQL format. The assumption is that the "at-rest"
state of the database should probably be the SQL format. Its file
format is platform independent (32 vs 64 bit, big-endian vs little-endian);
also, if more than one instance of quantics is reading the same database,
one of them finishing and converting the DB to ascii will interrupt the
others.
Reading and Writing DB Quantities
The DB quantities are written and read with dbrdXXX
and
dbwrXXX
subroutines (XXX
might be
geo
, pes
, etc). The call looks like: call
dbgetXXX( db, id, ...)
where db
is the (already opened) database
structure (use sqlite_types
) and id
is the row
number of the entry to be retrieved. All retrievals by id
are
constant time (independent of the number of rows in the database).
Writing a set of quantities into the DB is a little more involved.
First, a transaction must be started. Transactions are processed by
databases atomically—that is they either happen or they don't.
Failure during a transaction causes the transaction to be rolled back. If a
new row is being created then a call to dbwrentry( db, id,
...)
must first be done, with id
set to zero. This will
return the new row number in id
. Then write the necessary data
with dbwrXXX
routines. Finally, commit the transaction (or
abort it, rolling the database back to how it was before the transaction
started). The transaction should be open for as short as time as possible,
as the database will be locked to other processes until it
finishes. Writing a quantity takes (pretty much) constant time.
Analysis Routines
In analyse/
two new conversion programs,
convdbtosql
and convdbfromsql
are provided to
convert the SQL DB into and from ASCII format. The existing
convdb
program remains in case users have old binary format
DBs they wish to convert. It should be noted that converting the DB to
ASCII at the end of a quantics run could interfere with any other quantics
invocations that are using that DB. Given the nature and ease of use of SQL, keeping the DB in the SQL format and
only converting to ASCII when needed is a good plan.
SQL schema versions
Currently there are two SQL schema versions. The original (1) has one SQL schema for Quantics as a whole.
This has been updated to version 2.
The main difference is that in version 2 the schema changes based on the number of atoms and electronic states in the caculation.
This has the advantage of having each row in the data tables been a single entry, simplifying access.
The change has been made so that the differences between versions are contained in the dbrd
and dbwr
routines.
Therefore old and new databases can all be accessed with the same routines.
Database Tables
The following tables could be found in the SQL database. Note there are some differences in layout between database versions, however, the murky details of this are abstracted within the functions used to read and write to the tables.
-
sqlite_master
This table is provided by default by sqlite. It contains the full schema of the database. A SQL query such as:
SELECT count(distinct(name)) FROM sqlite_master WHERE type='table'
will return a full list of tables present in the database.
- Always present
- Not version dependent
-
refdb
This table contains reference information for the calculation. It lists the number of atoms (Natm), the number of electronic states (Nroot), the (Nddstate), the number of basis sets for a CAS calculation (Nbasis), the ground state energy (E0), the (Dimref) the number of atoms (Nfam), the size of the derivative coupling matrix (Dercpdim), the point group of the database (DBpntgp), the number of symmetry operations of the point group (Nsymop) A SQL query such as:
SELECT * FROM refdb
will return all these values.
- Always present
- Not version dependent
-
refdbgeom
This table contains reference geometry for the calculation. It lists, for all atoms, the atom number (atno), the atom mass (mass), the atom name/type (name) and the cartesian coordinates (x, y, z) A SQL query such as:
SELECT * FROM refdbgeom WHERE rowid=1
will return all these values for the first row (there is one row per atom)
- Always present
- Not version dependent
-
refdbsymmaps
This table contains the mapping of the atoms due to molecular symmetries. It has a row for each symmetry operation (Nsymop), each row has a column per atom (atm{i}, where i is 1 - natm) with the new atom index following the symmetry operation. A SQL query such as:
SELECT * FROM refdbsymmaps WHERE symopno = 1
will return all these values for the first symmetry operation
- Always present
- Version dependent, dbentry column not in dbversion 1
-
info
This table contains infomation on the ab initio calculation, the method used (method), the basis set (basis), the programme (programme). A SQL query such as:
SELECT * FROM info
will return all these strings
- Always present
- Version dependent, whole table not present in dbversion 1
-
versions
This table contains version infomation, the quantics version, multiplied by 1000 (quanticsversion), the SQL schema version (dbversion), the number of last entry to DB (dbentry), currently unused. A SQL query such as:
SELECT * FROM versions
will return all these values
- Always present
- Version dependent, dbentry column not in dbversion 1
-
dbentry
This table contains infomation on each entry into the db, the status of the entry (status), whether it was calculated or estimated, the time created (created), A SQL query such as:
SELECT * FROM dbentry
will return all these values
- Always present
- Version dependent, created column not present in dbversion 1
-
agrad
This table contains the adiabatic gradient values in cartesian coordinates. Contains a row for each DB entry. The number of columns changes based on number of electronic states and atoms. There is a column for each state (i), for each atom (k) and each coordinate (x,y,z) (x_i_k, y_i_k, z_i_k). A SQL query such as:
SELECT * FROM agrad WHERE rowid = 1
will return all these values for the first entry
- Not always present
- Version dependent
-
ahes_{i}
Actually more than one table. Creates one table per electronic state (i). Each table contains the adiabatic hessian values in cartesian coordinates. Contains a row for each DB entry. The number of columns changes based on number of electronic states and atoms There is a column for each state (i),for each atom (k) and each coordinate (x,y,z) (x1_k_l, y1_k_l, z1_k_l - first hessian entries for atoms k & l, x2_k_l, y2_k_l, z2_k_l - second hessian entries for atoms k & l, x3_k_l, y3_k_l, z3_k_l - third hessian entries for atoms k & l). A SQL query such as:
SELECT * FROM ahes_1 WHERE rowid = 1
will return all these hessian values for the first entry for the first electronic state.
- Not always present
- Version dependent
-
apes
Table contains the adiabatic potential energy values. Contains a row for each DB entry. The number of columns changes based on the number of electronic states There is a column for each state (i) (eng_i - the energy for state i). A SQL query such as:
SELECT * FROM apes WHERE rowid = 1
will return all these potential energy values for the first entry.
- Not always present
- Version dependent
-
geo
Table contains the molecular geometries in cartesian coordinates. Contains a row for each DB entry. The number of columns changes based on the number of atoms, 3 columns (x,y,z) for each atom (i): (x_i,y_i,z_i). A SQL query such as:
SELECT * FROM geo WHERE rowid = 1
will return all the coordiantes for the first entry.
- Always present
- Version dependent
-
hes_{i}
Actually more than one table. Creates one table per electronic state (i). Each table contains the adiabatic hessian values in cartesian coordinates. Contains a row for each DB entry. The number of columns changes based on number of electronic states and atoms There is a column for each state (i),for twice each atom (k,l) and each coordinate (x,y,z) (x1_i_k_l, y1_i_k_l, z1_i_k_l - first hessian entries for states i with atoms k & l x2_i_k_l, y2_i_k_l, z2_i_k_l - second hessian entries for states i with atoms k & l x3_i_k_l, y3_i_k_l, z3_i_k_l - third hessian entries for states i with atoms k & l). A SQL query such as:
SELECT * FROM hes_1 WHERE rowid = 1
will return all the hessian values for the first entry in the table for the first electronic state.
- Not always present
- Version dependent
-
moco_{i}
Actually more than one table. Creates one table per basis function (i). Each table contains the MO coefficents. Contains a row for each DB entry. The number of columns changes based on number of basis functions There is a column for each basis function (i) (mo_i). A SQL query such as:
SELECT * FROM moco_1 WHERE rowid = 1
will return all the MO coefficients for the first entry in the table for the first basis function.
- Not always present
- Version dependent
-
nact
Table contains the nonadiabatic couplings in cartesian coordinates. Contains a row for each DB entry. The number of columns changes based on the number of atoms, 3 columns (x,y,z) for each atom (i): (x_i,y_i,z_i). A SQL query such as:
SELECT * FROM nact WHERE rowid = 1
will return all the nonadiabatic couplings for the first entry.
- Not always present
- Version dependent
-
nmodes
Table contains the geometries in normal mode coordinates. Contains a row for each DB entry. The number of columns changes based on the number of atoms, 3N-6 (for non-linear molecules) columns for each mode(i): (nmdof_i). A SQL query such as:
SELECT * FROM nmodes WHERE rowid = 1
will return the geometry in normal modes for the first entry.
- Not always present
- Version dependent
-
pes
Table contains the potential energy values. Contains a row for each DB entry. The number of columns changes based on the number of electronic states There is a column for each state (i) and the coupling of states (k) (eng_i_k - the energy for state i). Can also be complex valued, in which case an extra column will be present. A SQL query such as:
SELECT * FROM pes WHERE rowid = 1
will return all these potential energy values for the first entry.
- Not always present
- Version dependent
The following tables contain infomation on the calculations:
The following tables contain the results of the calculations:
The schema below described for dbversions > 1
symrep
Table contains the symmetry representations. Contains a row for each DB entry. The number of columns changes based on the number of symmetry operations There is a column for each symmetry operation (i) (op_i). A SQL query such as:
SELECT * FROM symrep WHERE rowid = 1
will return all these operations for the first entry.
- Not always present
- Version dependent
trans
Table contains the transformation matrix. Contains a row for each DB entry. The number of columns changes based on the number of electronic states There is a column for each electronic state (i) and the coupling of states (k) (tran_i_j). Can also be complex valued, in which case an extra column will be present. A SQL query such as:
SELECT * FROM trans WHERE rowid = 1
will return all these values for the first entry.
- Not always present
- Version dependent
The Routines
The following routines are used to setup and examine the database. They might work differently depending on SQL schema version but will return the same results
-
subroutine dbgetentries(db, iunit,lpesdb,lgradb,lhesdb,lapesdb,lagradb,& lahesdb,ldipdb,ladipdb,lmocodb,lnactdb,& ltransdb,lnmdb,lsocdb,lasocdb,lddcsoc,& lsocbas,lcoorddb,lsymdb,lcsocdb,lcasocdb)
This routine examines the open database in
db
and returns logicals showing what data is available in the DB. For SQL schema version 2 it reads thesql_master
table to see which tables are present. In SQL schema version 1 it's assumed thatgeo
is always present, it then checks which tables are populated by looking at the first row. A list is printed to required unit of whether a table is present or not -
subroutine dbopen(fname, db, sql_vers,infoarray,flagarray, mode)
This routine opens the database specified in
fname
and returns the DB structure indb
.sql_vers
is which sql version to use (only required if opening new db),infoarray
is an array containing values needed to create a new db (such as number of atoms),flagarray
is an array containg the logical flags controlling which tables are required. Both arrays can be left empty if only reading a database.mode
may be 0, the DB is opened read-only; 1, the DB is opened read-write; 2, the DB is created and opened read-write. -
subroutine dbgetnrec( db, nrec )
Returns the number of rows in the database into
nrec
. This routine takes constant time—it's independent of the number of rows in the DB. -
subroutine dbgetversion(db, dbvers)
Returns the version number of the database into
dbvers
and as a variable contained in thedb
object. Once set, as part ofdbopen
the version can be accessed fromdb%db_vers
. The DB version should indicate to the code what tables will be present. Ideally it should always be backwards compatible, so no changes in column names or order in the table. -
subroutine dbinit(db, sql_vers, quantics_vers, infoarray,flagarray)
Writes the schema into an already opened
db
.sql_vers
is which sql version to use,quantics_vers
is what is the current quantics version (this is saved into theversions
table)infoarray
is an array containing values needed to create a new db (such as number of atoms),flagarray
is an array containg the logical flags controlling which tables are required. Both arrays can be left empty if only reading a database. You can't write into a newly created database until the schema is written. In SQL schema 1 all tables are created, in schema 2 only those required are created -
subroutine dbrdrefdb(db,natm, nroot, nddstate, nbasis, dde0, dimref, & nfam, dercpdim,dbpntgp,nsymop)
Reads the refdb information.
-
subroutine dbrdentry( db, id, stat )
This is a catch-all table for values associated with a row that don't belong anywhere else. Currently that's only
stat
, which is something to do with interpolation status. Future code might add the date the row was inserted / updated, information about how the values in the row were computed, etc. This is the key table: all otherid
values reference the ones here. -
subroutine dbrdgeo( db, id, geom )
This reads the geometry associated with row
id
. Other routines read other values, e.g.:dbrdpes
,dbrdgra
, etc. Note that the routine does not need to know the sizes of the arrays (it's implicit in the data in the DB), but does need the correct shape. Due to this there are some subroutines that take REAL or COMPLEX arguments behind a common interface, such asdbrdpes_r
anddbrdpes_c
. -
function dbhasrefdb(db)
Returns true if the refdb part of the database is populated.
-
subroutine dbstarttrans(db)
Starts a transaction. In DB terms, a transaction is a set of changes to the DB that happen atomically: the DB is either fully changed or not changed at all, and cannot be read or written in any in-between state. This avoids the problem of a separate reader seeing only part of a row that's being written. This routine should be called before any data other than the refdb data is written to the DB, then the transaction finished with
dbcommittrans
. Note that the DB will be locked during the transaction (see below). -
subroutine dbcommittrans(db)
Commits (i.e. writes) an in progress transaction. All writes with
dbwr*
routines are written into the DB. Any readers will see the DB only as it was or after all the writes, never anything in-between. The DB will be locked to writers during the transaction, and will probably (depending on the DB implementation) be locked to readers too. Therefore best practice is to start the transaction, make all the writes from already computed data, and then commit as quickly as possible. -
subroutine dbaborttrans(db)
Aborts an in progress transaction. I can't see any reason that this will be useful but it's here for completeness. Any writes made during the transaction will be rolled back—the DB will end up as it was before the transaction started. Note that if the writing process is killed during a transaction, the transaction will effectively be rolled back.
-
subroutine dbwrentry(db, irecno, interp)
For the "ordinary" case of appending a new row to the database, this routine must be called before any other writes, with the
irecno
value set to0
. On returnirecno
will be set to the row number of the new row. Subsequent calls to thedbwr*
routines should use this new row number.If
irecno
is set to a non-zero number the corresponding row will be updated if theinterp
value is present. If it doesn't exist an error will be raised.Note that all writes should take place in a transaction.
-
subroutine dbwrgeo(db, irecno, ndof, geom)
Writes the geometry in
geom
into the row specified byirecno
. If the geom data for the row already exists it is removed before the new data is written. All thedbwr*
routines work in a similar way.Note that all writes should take place in a transaction.
-
subroutine dbdelallentries( db )
Deletes all rows from the DB.
Note that all writes should take place in a transaction.
-
subroutine dbwrrefdb(db, natm, nroot, nddstate, nbasis, dde0, dimref, & nfam, mref, atnam, xref, dercpdim)
Writes the refdb part of the database. Note that this routine should not be inside a transaction, as it creates its own.
-
subroutine dbgetclosest( db, inpoint, ndof, outpoint, outnum, outdist )
An implementation of a routine that finds the closest geometry to
inpoint
. This is no faster than before, taking linear time to search through the DB. My thought was that other strategies (caching, other spatial search algorithms, etc) could be implemented here, unknown to the caller. -
subroutine dbdeduplicate( db, dupes, checkonly, dist )
Removes entries that have duplicate (as defined by
dist
) geometries. Ifcheckonly
is true then will only populate thedupe
array with logicals indicating duplicates. Ifcheckonly
is false and there are duplicates the routine will start a transaction and remove the duplicates. It re-numbers the remaining entries appropriately.
Adding a new Table
If a new set of data is to be added to the DB, a new table is required. To add this, the following steps must be done. We will add a table newtable which contains the real arrays data(dim1,dim2). The logical lnewdb is set when this data is being included and dbnewdata(dim1,dim2,irec) is the memory array holding the data. dbnewdata andSchema Version 2
- Edit dbgetentries (lib/db_sql/db_sql.F90) Add a new entry (change "newtable" and "lnewdb". Give db name in place of "NEW TABLE").
- Create a new create subroutine
- Edit dbinit (lib/db_sql/db_sql.F90). Add call to the crenetwble subroutine. Use the flagarray to make sure it is only created if needed. Use infoarray to pass any size parameters to table creation
test_string = 'newtable' call checktable(test_string ,table_list,lnewdb,"NEW TABLE",iunit)
subroutine crenewtable(db,sql_vers,cmd) type(SQLITE_DATABASE), intent(inout) :: db integer(long), intent(in) :: sql_vers !SQL database version character(len=1000), intent(out) :: cmd if (sql_vers .gt. 1) then !make pes table !Contains the PES energies for each DB entry !Size, a row for DB entry !Column number changes based on number of electronic states !Have a column for each state and the coupling between them !eng_i_j - the energy for states i & j (only i & j stored not j & i) cmd = "CREATE TABLE newtable (& &id INTEGER NOT NULL & &REFERENCES dbentry(id)& &ON UPDATE CASCADE ON DELETE CASCADE,& &dimension1 INTEGER NOT NULL,& ! change dimension1 for e.g. ndof &dimension2 INTEGER NOT NULL,& ! change dimension2 or delete &value DOUBLE NOT NULL)' ! change DOUBLE if not a real cmd = trim(cmd) // ")" elseif (sql_vers .eq. 1) then !include this to make sure table is only created in correct version endif call sqlite3_do( db, cmd )
Schema Version 1
1. Edit schema.sql (lib/db_sql)
Add an entry such as (editing for correct names / dimensions / type)./* Map of symmetry replicas */ CREATE TABLE newtable (id INTEGER NOT NULL REFERENCES dbentry(id)ON UPDATE CASCADE ON DELETE CASCADE,dim1 INTEGER NOT NULL,dim2 INTEGER NOT NULL, value DOUBLE NOT NULL);
2. Edit dbgetentries (lib/db_sql/db_sql.F90)
Add a new entry (change "newtable" and "lnewdb". Give db name in place of "NEW TABLE").call sqlite3_prepare( db, 'SELECT count(distinct(id)) FROM newtable', stmt,& cols_p) call sqlite3_next_row( stmt, cols_p, finished ) lnewdb = cols_p(1)%int_value > 0 if (db%error.ne.SQLITE_OK) then write (iunit,'(a20,a15)') ' NEW TABLE ','Not Present ' else if (lnewdb) then write (iunit,'(a20,a15)') ' NEW TABLE ','Included ' else if (.not.lnewdb) then write (iunit,'(a20,a15)') ' NEW TABLE ','Not Included ' endif
3. Edit dbinit (lib/db_sql/db_sql.F90)
Add a new entry (change "newtable" for table name)cmd = 'CREATE TABLE newtable (id INTEGER NOT NULL REFERENCES dbentry(id)& &ON UPDATE CASCADE ON DELETE CASCADE,& &dimension1 INTEGER NOT NULL,& ! change dimension1 for e.g. ndof &dimension2 INTEGER NOT NULL,& ! change dimension2 or delete &value DOUBLE NOT NULL)' ! change DOUBLE if not a real call sqlite3_do( db, cmd ) cmd = 'CREATE INDEX newidx ON newtable(id)' ! change "newidx" and "newtable" call sqlite3_do( db, cmd )