Last night I woke up in the middle of the night, walked into my living room, jiggled my mouse out of powersave, closed all my applications, and shut down my box. I didn't bother to check the uptime first, but it had to have been a month or two. I love the fact that my system is stable enough to leave on all the time. Being able to leave my applications open in the middle of my work and walk away is incredibly valuable to me... but sometimes the cooling-fan noise gets to me at night, and I noticed that my electricity bills have been about double what they were last year, when I wasn't leaving my box on all night. I *love* the hibernate feature on my LifeBook, but that only works because it has an APM BIOS. Hibernation doesn't work with the ACPI BIOS in my desktop box, netherwog... time to learn to use Software Suspend!
There are currently three implementations of software suspend on Linux: swsusp, pmdisk, and swsusp2. swsusp and pmdisk are built into the 2.6 version of the Linux kernel, but both are somewhat raw. swsusp2 requires kernel patching, but it is older and more stable, and it works on both 2.4 and 2.6 kernels. I think I will opt to try swsusp2 first.
So first I must download, patch, and compile the latest kernel. The instructions on the swsusp2 page are pretty easy to follow.
Oooh! Hey, wait a minute. Do I not have a swap partition? Ah. I have a swap partition, but it is too small. I have 256 MB of memory but only a 128 MB swap partition. Fortunately I have an old useless 512 MB FAT partition that I can replace with swap. So now /dev/hda4 is swap. (At the time, I specifically avoided adding this swap to my /etc/fstab; that is to say, I formatted it as swap, but I did not USE it as swap. This turned out to be a mistake. The swap partition you suspend to SHOULD be activated and used as swap. Read on for more.)
I edited my /etc/lilo.conf file to contain two boot options for this kernel.
    # Boot up Linux by default.
    #
    default=Linux-Resume

    image=/vmlinuz
        label=Linux-Resume
        read-only
        append="hdd=ide-scsi resume2=swap:/dev/hda4"
    #   restricted
    #   alias=1

    image=/vmlinuz
        label=Linux-NoResume
        read-only
        append="hdd=ide-scsi resume2=swap:/dev/hda4 noresume2"
And now I can ls /proc/swsusp and see all the control entries.
Next, editing /etc/suspend.conf. From the SWSUSP_RESTART_SERVICES section I am removing inetd, because I don't have that running at all. To the SWSUSP_UMOUNTS section I am adding /dvd.
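A quick way to double-check that the kernel actually received the resume2= option from the lilo.conf append line is to look at the kernel command line. The following is only a minimal sketch: the function name resume_device is made up for illustration, and it assumes the option appears on the command line exactly as it was written in lilo.conf.

```sh
#!/bin/sh
# Sketch: report which device the kernel was told to resume from.
# resume_device is a hypothetical helper name; resume2= is the option
# taken from the append line in lilo.conf.
resume_device() {
    # Use the argument if given, otherwise the running kernel's command line.
    cmdline="${1:-$(cat /proc/cmdline)}"
    for word in $cmdline; do
        case "$word" in
            resume2=*)
                # Strip the option name and print only its value.
                echo "${word#resume2=}"
                return 0
                ;;
        esac
    done
    # No resume2= option was found.
    return 1
}
```

Calling resume_device with no argument reads /proc/cmdline; passing a string lets you test an append line before committing it to lilo.conf. Note that noresume2 is deliberately not matched, since it has no "=" after the name.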
And now, the moment of truth. Time to run /usr/local/sbin/hibernate and see if the world explodes or not... okay, that is so cool, I am dizzy! I didn't even lose the ssh connection I was typing this in! I notice the following output on my console when I boot back up...
    [root:/home/james]hibernate
    hwclock: Open of /dev/rtc failed, errno=19: No such device.
    rm: cannot remove `/var/lock/swsusp': No such file or directory
The lockfile message looks unalarming, but I had better figure out how to fix the /dev/rtc problem. I don't want my clock to be hibernated too (or rather, it is always hibernated whether I want it or not; I just need to make sure it gets corrected after a resume).
Ah! All is not well. I tried to start xmms to test sound, but it wouldn't start. Not a sound problem; an X server problem, apparently. Or maybe a window manager problem. At any rate, I could not start new programs (although the ones I was already running were doing fine), and an attempt to restart my window manager crashed my X server. After doing startx again, everything started working again. Hmmmm...
I tried manually triggering hibernation with echo > /proc/swsusp/activate and, interestingly enough, it worked even better (and was much quicker). My X server had no problems at all that way. My network card handles it fine, and can continue pinging through a hibernation. My USB scanner and joysticks work, my cdrom and dvd/cdwriter seem fine, and the system clock has no problem resyncing to the hardware clock. Only my sound card has a problem. So now, instead of using the standard /usr/local/sbin/hibernate script, I am using my own ultra-simplified /usr/local/sbin/hibernate.netherwog script:
    #!/bin/sh
    # shut off things that can't handle hibernation
    /etc/init.d/alsa stop
    # this does the actual hibernation
    echo > /proc/swsusp/activate
    # re-activate things after a resume
    hwclock --hctosys
    /etc/init.d/alsa start
The one problem with this is that I have to make sure that no programs are using the sound card when I hibernate. This means that I cannot have xmms playing or paused (but stopped is okay) and I cannot have any flash objects loaded in Mozilla. Those requirements aren't too dire. Annoying, but not nearly as annoying as not being able to hibernate at all.
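The "is anything still using the sound card?" check could in principle be automated before stopping ALSA. What follows is a rough sketch, not something from the swsusp2 scripts: device_in_use and its proc_root argument are hypothetical names, and the device prefixes (/dev/snd for ALSA, /dev/dsp for OSS emulation) are assumptions about where the sound devices live on a given box.

```sh
#!/bin/sh
# Sketch: succeed if any process has a file open whose path starts with
# the given prefix (for example /dev/snd or /dev/dsp).
# device_in_use and its proc_root argument are hypothetical names.
device_in_use() {
    prefix="$1"
    proc_root="${2:-/proc}"
    for fd in "$proc_root"/[0-9]*/fd/*; do
        # Each fd entry is a symlink to the open file; skip unreadable ones.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            "$prefix"*) return 0 ;;
        esac
    done
    # Nothing matched the prefix.
    return 1
}
```

Run as root (otherwise other users' file descriptors are not readable), a line such as `if device_in_use /dev/snd || device_in_use /dev/dsp; then echo "sound card busy"; exit 1; fi` near the top of the hibernate script would refuse to suspend while xmms or a Flash object still holds the device.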
A couple of times in my many, many reboots for testing, the resume would fail at the point where the screen flashes, and the machine would reboot. When this happened, it would keep failing in an endless loop (as if the hibernated memory image in the swap was bad somehow), so I had to boot my Linux-NoResume boot option, which booted up normally, as if I had shut down improperly last time. No disaster. I will be keeping an eye out to try to figure out what conditions trigger that failure.
Now for some extensive testing!
Hmm. This morning it failed again. It got to the middle of the restore, where the screen flickers (video memory being restored?), and the system rebooted. I had to hit CTRL at the LILO prompt again and boot Linux-NoResume. Peculiar.
I changed a setting in the BIOS, and it seemed to help (where have I heard that before?). Anyway, under the ACPI power management settings in my BIOS config I noticed that I was using S3 suspend state, which it described as "less safe". I switched it to S1 suspend state, which was considered more stable, although it consumes a little more power. Obviously power usage is not relevant to me when I am hibernated. I have not suffered one of those spontaneous-reboot-during-resume problems since then (knock on wood)
I should also note that in spite of using hwclock in my hibernate script, my system clock is getting *way* off. I may have to use a timeserver instead. (I have since found a better alternative, read on for more)
Grrr! It was going so well for the past few days! What went wrong today? I have absolutely no clue what triggers this resume failure.
Occasionally I will also have a problem when suspending. It starts to suspend, but hangs saying "Preparing Page Directory". The screen flickers a bit, as if the text is being rapidly redrawn. One time I caught a glimpse of the text "Eating Memory" right before the "Preparing Page Directory" text appeared. Right now, software suspend is simply not usable. About 1/3 of the time it works, but the rest of the time it fails either in the suspend stage or in the resume stage. I have been able to find no logic to when it works vs. when it fails... of course, I still TRY to hibernate every time. The ext3 file system is so reliable that it does not matter if I do not shut down correctly. I have not lost any data.
I am going to try upgrading from the 2.0rc3 patch to the 2.0rc5 patch, and wish with my eyes closed :)
What a difference an RC-version can make! The 2.0rc5 of swsusp2 has been working beautifully. I have not experienced any suspend-lockups nor resume-reboots. I have been suspending and resuming frequently with excellent results. Only once did a suspend fail, and this time it failed gracefully. After lingering a few seconds longer than usual at the "Eating Memory" stage, it blacked the screen and then returned cleanly to xlock as it would after a resume. I guessed that this was caused by too many programs in memory, or something like that. I closed The Gimp (which spun for a while, as if it might have been leaking memory) and then I tried the suspend again and it worked.
I was annoyed by the occasional failure to suspend. I updated from 2.0rc5 to 2.0 to 18.104.22.168 but the problem persisted. Sometimes when trying to suspend it would get as far as the "Eating Memory" stage and then dump me back to my desktop. After I switched from the venerable old ultra-lightweight fvwm2 window manager to the more featureful, more memory-hungry Gnome desktop, I found the problem happening more often. Prime troublemakers were memory-hungry applications like Mozilla Firefox and The Gimp. It seemed that I would have to close both of these applications before I could successfully suspend. Very frustrating, since these are the applications I most want to leave open all the time. It really seemed as if I just did not have enough space to suspend all of my memory to disk... but that didn't make any sense...
When I first set up software suspend, I chose /dev/hda4 as my suspend partition. I formatted it as swap, but I did not enable it. I already had a smaller 128 MB swap partition in /dev/hda2, and I reasoned that since my system was working fine with that much swap already, I would dedicate my 512 MB /dev/hda4 partition solely to software suspend, and not complicate things by trying to use it for both swap and suspend. (Nowhere did I read that this was a good idea; I just guessed, out of an ignorant misunderstanding of how software suspend works.)
    # /etc/fstab: static file system information.
    #
    #
    /dev/hda1    /        ext3     errors=remount-ro  0 1
    /dev/hda2    none     swap     sw                 0 0
    #/dev/hda4   none     swap     sw                 0 0
    proc         /proc    proc     defaults           0 0
    /dev/fd0     /floppy  auto     user,noauto        0 0
    /dev/cdrom   /cdrom   iso9660  ro,user,noauto     0 0
    /dev/hda5    /var     ext3     defaults           0 2
    /dev/hda6    /tmp     ext3     defaults           0 2
    /dev/hda7    /home    ext3     defaults           0 2
So I explicitly disabled /dev/hda4 in my /etc/fstab file, and that is how I had been suspending all this time. My system has 256 MB of physical memory and 128 MB of swap. Nothing I ever did seriously overflowed the 256 MB of memory (except occasionally when editing HUGE images in The Gimp), so things had been working well. But if more than 128 MB of memory were in use, my suspends would fail. This is because swsusp2 does not treat your suspend partition as a magical block file onto which it dumps a copy of your current memory. No, rather it just forces all your physical memory into swap, and then adds some sort of magic flag to the swap partition to let itself know to resume on the next reboot. Because I was not enabling /dev/hda4 as swap, that whole 512 MB partition was going to waste, except for the tiny magic flag to tell swsusp2 to resume on the next boot.
So I edited my /etc/fstab file, re-enabled /dev/hda4 by uncommenting its line, and then ran swapon -a.
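To confirm that a partition really is active swap after swapon -a (and not merely formatted as swap, which was my original mistake), the kernel's list in /proc/swaps can be checked. This is a small sketch; swap_active is a made-up helper name, and the optional second argument exists only so the function can be pointed at a copy of the file.

```sh
#!/bin/sh
# Sketch: succeed if the given device is listed as active swap.
# swap_active is a hypothetical helper name.
swap_active() {
    device="$1"
    swaps="${2:-/proc/swaps}"
    # Skip the header line, then look for an exact match in column 1.
    awk -v dev="$device" '
        NR > 1 && $1 == dev { found = 1 }
        END { exit !found }
    ' "$swaps"
}
```

For example, `swap_active /dev/hda4 && echo "hda4 is in use as swap"` prints the message only when the partition has actually been activated.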
Now I can suspend with no problems even if I have many tabs open in Firefox and large images open in The Gimp. Because my total available swap (128+512 MB) is now larger than my physical memory (256 MB), I should be able to say good-bye to my failure-to-suspend problems.
Also, I have solved my problems with date/time lag. For whatever reason hwclock was just not doing it for me. My clock was losing all the time spent in hibernation. So now I am using ntpdate in my suspend script right after resume in order to correct the date and time.
Things have been working perfectly lately, which is a sure sign it is time for me to change something that will screw it all up. Today I bought a shiny new ATI Radeon 9200 because I felt a sudden craving for 3D acceleration. The 3D acceleration works perfectly, no troubles at all. But now my software suspend is broken. I can suspend just fine, but after it resumes, instead of showing me my desktop, it shows a mostly-black screen with smears of garbled pixels. Fortunately there seems to be a nice FAQ about swsusp2+radeon at http://cpbotha.net/dri_resume.html
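The rule of thumb I ended up with (total swap at least as large as physical memory) is easy to check from /proc/meminfo. The sketch below only encodes that rule of thumb; it is not how swsusp2 itself decides whether an image will fit. The function name swap_covers_ram is invented for the example.

```sh
#!/bin/sh
# Sketch: compare total swap against physical RAM.
# swap_covers_ram is a hypothetical helper name; it reads the MemTotal
# and SwapTotal lines (both in kB) from a meminfo-style file.
swap_covers_ram() {
    meminfo="${1:-/proc/meminfo}"
    awk '
        /^MemTotal:/  { mem = $2 }
        /^SwapTotal:/ { swap = $2 }
        END {
            printf "RAM %d kB, swap %d kB\n", mem, swap
            # Exit 0 only when swap is at least as large as RAM.
            if (swap + 0 >= mem + 0) exit 0
            exit 1
        }
    ' "$meminfo"
}
```

With only the 128 MB partition active this would have failed on my 256 MB box; with /dev/hda4 enabled as well, it passes.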
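Roughly, the revised script swaps the hwclock call for ntpdate. The version below is a sketch rather than my exact script: the server name pool.ntp.org is an assumption (use whatever time server you prefer), and the DRY_RUN switch and the run helper are illustrative additions so the steps can be previewed without actually suspending.

```sh
#!/bin/sh
# Sketch of the suspend script with an NTP clock fix after resume.
# NTP_SERVER, DRY_RUN, run, and suspend_and_fix_clock are illustrative
# names, not part of swsusp2. DRY_RUN defaults to 1 (print only).
NTP_SERVER="${NTP_SERVER:-pool.ntp.org}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

suspend_and_fix_clock() {
    # shut off things that can't handle hibernation
    run /etc/init.d/alsa stop
    # this does the actual hibernation
    if [ "$DRY_RUN" = 1 ]; then
        echo "would write to /proc/swsusp/activate"
    else
        echo > /proc/swsusp/activate
    fi
    # after resume: set the clock from the network instead of hwclock
    run ntpdate "$NTP_SERVER"
    run /etc/init.d/alsa start
}
```

Calling suspend_and_fix_clock as-is just lists the four steps; setting DRY_RUN=0 (as root) performs them for real.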
But first, I think I will upgrade from swsusp2 22.214.171.124 -> 126.96.36.199 , as I saw success-reports on the swsusp2 mailing list regarding radeon cards from people using that version.... and though the upgrade went nicely, it did not solve my problem.
I suppose the next thing to do is to try the DRI snapshots for Debian, which are likely to be more current than the DRI that comes standard in debian/sarge. http://www.freedesktop.org/~dri/snapshots/README.Debian
I have the DRI modules installed, but it still doesn't work... I don't think I am actually using DRI correctly. I certainly don't see a radeon module when I do lsmod...
Checking out the DRI troubleshooting page... I can see that the Direct Rendering Manager is available in my kernel because:
    [james:~]dmesg | grep drm
    [drm] AGP 0.99 Aperture @ 0xf8000000 64MB
    [drm] Initialized radeon 1.7.0 20020828 on minor 0
    [drm] Loading R200 Microcode
And I can see that the DRI module for X-Windows is being loaded because:
    [james:~]grep dri /var/log/XFree86.0.log | grep LoadModule
    (II) LoadModule: "dri"
And I can see that the glx module is being loaded because:
    [james:~]grep glx /var/log/XFree86.0.log
    (II) LoadModule: "glx"
    (II) Loading /usr/X11R6/lib/modules-dri-trunk/extensions/libglx.a
    (II) Module glx: vendor="The XFree86 Project"
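The two X log checks above can be rolled into one small helper. This is only a sketch: check_x_modules is a name invented here, and it looks for nothing more than the LoadModule lines shown in the log excerpts, so a "loaded" result says the module was requested, not that direct rendering is actually working.

```sh
#!/bin/sh
# Sketch: report whether the dri and glx modules show up in the X log.
# check_x_modules is a hypothetical helper name.
check_x_modules() {
    log="${1:-/var/log/XFree86.0.log}"
    rc=0
    for module in dri glx; do
        # Match the same LoadModule lines seen in the log excerpts.
        if grep -q "LoadModule: \"$module\"" "$log"; then
            echo "$module: loaded"
        else
            echo "$module: missing"
            rc=1
        fi
    done
    return $rc
}
```

The function returns non-zero if either module is absent, so it can gate a later step, for example `check_x_modules || echo "fix the X config first"`.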
Aha! In my kernel configuration, I have the stock Radeon DRM module built into the kernel; that is why 3D acceleration works even though my newer radeon module is not being loaded at all! So I am disabling kernel 2.4.26's stock DRI support entirely (and switching my agpgart to be a module instead of built in) and recompiling the kernel!
So I booted with my new kernel, and loaded the agpgart module and the radeon module using modconf. According to glxgears, my 3D rendering is even faster than before, but no joy yet on the suspend front. I can suspend, but when I resume it gets to the very beginning of the resume cycle and black-screens, leaving me with nothing but a blinking cursor and my keyboard LEDs flashing. I double-checked suspending from the console, and that hibernated and resumed just fine, so I know that just having the DRI modules loaded isn't what's killing me; it's using the DRI modules that is killing me :)
I see a thread on the swsusp2 mailing list where someone mentions getting hibernation working with a radeon with DRI enabled... but he is using kernel 2.6... but I read other threads lamenting the brokenness of AGP suspend in the 2.6 series, and mentioning that suspending with X+DRI running works fine on 2.4 kernels... but I can't find any specific threads that have info about 2.4 success stories.
So I am giving kernel 2.6.6 a try. I downloaded it, patched it, configured it, compiled it, and installed it. I was puzzled after booting that module support seemed to be disabled despite the fact that it was clearly enabled in my config.
    [root:~]lsmod
    Module                  Size  Used by    Not tainted
    lsmod: QM_MODULES: Function not implemented
I dug around a bit on google and discovered that the module-init-tools package is now required.
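The short version of what I found: 2.4 kernels use the old modutils package, while 2.6 kernels need module-init-tools, and the old lsmod fails with the QM_MODULES error on 2.6. As a memory aid, here is a tiny sketch that maps a kernel version string to the package it needs. The function name module_tools_for is invented, and it only knows about the two kernel series discussed here.

```sh
#!/bin/sh
# Sketch: which module utilities package a given kernel series needs.
# module_tools_for is a hypothetical helper name; only the 2.4 and 2.6
# series from this write-up are covered.
module_tools_for() {
    case "$1" in
        2.4.*) echo "modutils" ;;
        2.6.*) echo "module-init-tools" ;;
        *)     echo "unknown" ;;
    esac
}
```

For the running kernel, `module_tools_for "$(uname -r)"` gives the answer directly.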
Testing my hardware.
Working: video, sound, network, cdrom
Not working: 3D, tv-tuner, dvd, scanner, joystick
I'm pretty sure that everything that isn't working is failing because I am missing some needed modules... ah, yes. For starters, I stupidly turned off the agpgart module (mistakenly thinking that the DRI snapshot contained a newer version of it)... Ooh:
    [root:/home/james]invoke-rc.d module-init-tools restart
    Calculating module dependencies... done.
    Loading modules...
        usb-uhci
    FATAL: Module usb_uhci not found.
        input
    FATAL: Module input not found.
        usbkbd
    FATAL: Module usbkbd not found.
        keybdev
    FATAL: Module keybdev not found.
        eepro100
    FATAL: Module eepro100 not found.
        agpgart
    FATAL: Module agpgart not found.
        radeon
        joydev
        usbhid
    All modules loaded.
That lists everything that is either missing or not compiled as a module. It sometimes seems like certain things just demand to be modules or they won't work... usb_uhci was missing, agpgart was missing. I am going to ignore the other errors for the moment, especially since eepro100 (my network card) is obviously working as a built-in non-module.
Well, I got everything working except for my scanner. 3D acceleration and hibernation are really nice-to-have features, but a working scanner is indispensable. I reverted back to kernel 2.4.26. I can currently only suspend when X is not running.
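Picking the missing module names out of that output by eye gets tedious after a few kernel rebuilds. A small filter can do it instead. This is a sketch that assumes the "FATAL: Module NAME not found." wording shown in the output above; missing_modules is a made-up function name.

```sh
#!/bin/sh
# Sketch: read module-loading output on stdin and print only the names
# of modules reported as not found.
# missing_modules is a hypothetical helper name; it relies on the
# "FATAL: Module NAME not found." wording seen in the output above.
missing_modules() {
    sed -n 's/^FATAL: Module \([^ ]*\) not found\.$/\1/p'
}
```

Something like `invoke-rc.d module-init-tools restart 2>&1 | missing_modules` then yields one module name per line, ready to compare against the kernel config.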
testing still in progress...