Feature #2
open
slow writing of waves.sxb
0%
Description
- turning off writing of waves.sxb entirely - this doesn't seem to be possible, even if I put noRhoStorage and noWavesStorage in the initialGuess and all the SCF sections.
- putting in some timing code to figure out what exactly is so slow.
Would you have suggestions for trying either of these possibilities?
Updated by Christoph Freysoldt 19 days ago
Hi Noam,
I cannot reproduce your observation that noWavesStorage and noRhoStorage are ignored, if I use my current version. Which sphinx version do you use (sphinx --version should tell)?
When I put these flags, the log file says "storage omitted", and the sxb files aren't written (except for vElStat-eV.sxb).
If this is not a human mistake (such as misspelling the camel case), you can try commenting out the write commands in the SxHamSolver::writeData routine in dft/SxHamSolver.cpp.
If something else is slow, let me know. A long time ago, I had problems when writing with certain combinations of parallel netcdf and MPI libraries, but nothing I really understood (a different library version solved the problem). I also had severe problems when multiple sphinx runs were trying to write to the same file, which happened when I mistakenly started several serial executables in parallel instead of the MPI executable. I think that had to do with file locking from the netcdf library. But in that case the log files also get corrupted.
But I agree that this slow writing should not happen at all. That's why we had the no...Storage flags in the first place.
Updated by Noam Bernstein 15 days ago
3.0.9. I'd be happy to update if this is something likely to be fixed in a newer version.
This is my main section:
main {
   scfDiag {
      blockCCG { blockSize = 32; maxStepsCCG = 4; }
      dEnergy = 0.001 / 27.211386024367243;
      rhoMixing = 0.5;
      spinMixing = 0.5;
      maxSteps = 100;
      nPulaySteps = 20;
      preconditioner { type = KERKER; scaling = 0.5; }
      noRhoStorage;
      noWavesStorage;
   }
   evalForces { file = "forces.sx"; }
}
and this is the initial guess:
initialGuess {
   waves { lcao {} }
   rho {
      atomicOrbitals;
      atomicSpin { label="L_Fe_0"; spin=2.29999; }
   }
   noRhoStorage;
   noWavesStorage;
}
I definitely get rho.sxb and waves.sxb files created by this run. Am I specifying something wrong?
Updated by Noam Bernstein 15 days ago
Oddly, I do see "storage omitted" in the stdout file
tin 2124 : fgrep storage sphinx.stdout
storage omitted
| Wavefunctions ... storage omitted
storage omitted
| Wavefunctions ... storage omitted
but the files definitely exist:
tin 2126 : ls -ltr
total 323533
-rwx------ 1 bernstei bernstei       982 Mar 17 12:31 base.sx.0*
-rwx------ 1 bernstei bernstei    238254 Mar 17 12:31 POTCAR.Fe*
-rwx------ 1 bernstei bernstei       565 Mar 17 12:31 struct.sx.0*
-rw------- 1 bernstei bernstei         0 Mar 17 12:31 fftwisdom.dat
-rw------- 1 bernstei bernstei     12128 Mar 17 12:31 AtomicOrbitals00.dat
-rw------- 1 bernstei bernstei     12128 Mar 17 12:31 AtomicOrbitals01.dat
-rw------- 1 bernstei bernstei     12128 Mar 17 12:31 AtomicOrbitals02.dat
-rw------- 1 bernstei bernstei      1277 Mar 17 12:32 energy.dat
-rw------- 1 bernstei bernstei       159 Mar 17 12:32 spins.dat
-rw------- 1 bernstei bernstei       317 Mar 17 12:32 residue.dat
-rw------- 1 bernstei bernstei    124444 Mar 17 12:32 vElStat-eV.sxb
-rw------- 1 bernstei bernstei       383 Mar 17 12:32 forces.sx
-rw------- 1 bernstei bernstei    242329 Mar 17 12:32 rho.sxb
-rw------- 1 bernstei bernstei 379509440 Mar 17 12:40 waves.sxb
-rw------- 1 bernstei bernstei    252642 Mar 17 12:40 eps.0.dat
-rw------- 1 bernstei bernstei    252642 Mar 17 12:40 eps.1.dat
-rw------- 1 bernstei bernstei       167 Mar 17 12:40 parallelHierarchy.sx.actual
-rw------- 1 bernstei bernstei   2849664 Mar 17 12:40 sphinx.stdout
Updated by Christoph Freysoldt 15 days ago
I think I know what is going on.
The waves/rho are written by the evalForces{} group. That is consistent with the timing of the files (forces.sx written directly before rho.sxb), as well as with the source code.
This has been changed in 3.1 (where evalForces never writes rho/waves again), but in prior versions the code should still respect noRhoStorage/noWavesStorage inside evalForces as well. However, I am not entirely sure whether the format checker complains about extra flags being set in the evalForces group. If it does, one would have to declare the flags in share/sphinx/std/paw.std.
As an even better solution, it should also be possible to set the flags at the top level, i.e., outside of the main{} group, where the format checker allows any settings, and where they would be found from within any sublevel. The only exceptions to this general rule are a few settings like dEnergy that occur at multiple levels with different meanings: there, the bottom level must not look outside if dEnergy is missing, but rather use a default value.
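For illustration, here is a sketch of what the top-level placement might look like, using the initialGuess and main groups posted above. The only change is that noRhoStorage and noWavesStorage have been moved out of the groups. The rest of the input file (structure, basis, and so on) is not shown in this thread and is omitted here, and the position of the flags relative to the other groups, as well as the comment line, are assumptions:

```
// Assumed placement: storage flags at the top level, outside all groups
noRhoStorage;
noWavesStorage;

initialGuess {
   waves { lcao {} }
   rho {
      atomicOrbitals;
      atomicSpin { label="L_Fe_0"; spin=2.29999; }
   }
}

main {
   scfDiag {
      blockCCG { blockSize = 32; maxStepsCCG = 4; }
      dEnergy = 0.001 / 27.211386024367243;
      rhoMixing = 0.5;
      spinMixing = 0.5;
      maxSteps = 100;
      nPulaySteps = 20;
      preconditioner { type = KERKER; scaling = 0.5; }
   }
   evalForces { file = "forces.sx"; }
}
```

With this layout, evalForces should find the flags by looking outward from its own group, so no change to paw.std would be needed.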
I didn't do any checks on an actual 3.0.9 installation. If things do not work as expected, let me know.
Updated by Noam Bernstein 14 days ago
OK - moving the noRho and noWaves outside all of the sections indeed works, even with the current 3.0.9 version. That's a good workaround for this particular set of runs, but I'd like to resolve some of the slow I/O issues as well. I guess we can close this issue, and I'll investigate other MPI versions and see whether I see anything systematic. I can always open another issue for that if I can't resolve it.
I don't suppose you happen to remember if the old issues were solved by netcdf or MPI version changes or both?