Friday, October 28, 2011

Speeding up Kernel Build/Reboot/Test Cycles with iPXE

I make the kernel crash a lot. To debug those crashes, I add a lot of printk()s, recompile the kernel, and make it crash again. I repeat this until I fix the crash. This is time consuming, especially when the crash is bad enough that the system is unusable. For every cycle, I have to:

  1. Gather all of the information I need about the crash
  2. Reboot the system
  3. Boot to a good kernel
  4. Recompile, or copy the kernel over
  5. Reboot again
  6. Load the new kernel

On some hardware, each of these reboots can be upwards of several minutes. Virtually all (>90%) of the time is spent waiting for the firmware before the bootloader and and way before the kernel loads. If I could get this down to one reboot cycle instead of two, it would drastically reduce the amount of time that this takes. Ideally, I'd also like to be compiling in parallel with the system booting.

My solution to this is to use a separate build machine and iPXE (I used to use GRUB for this). In short, iPXE can boot your system from the network. I use it instead of the "boot to a good kernel", "recompile, or copy new kernel over" step.

Step 1: Put your kernel image on a web server

There are many ways to do this, but here's how I do it:

  1. make sure ~/public_html exists and can serve content. If your web server uses a different prefix, make sure and change it.
  2. On my compile server, I take a copy of /sbin/installkernel, put it in ~/bin/installkernel,
  3. Type "make install" in a kernel tree, it runs this script. My script places all of the kernel images in ~/public_html/
  4. Ensure that you can fetch kernel images by navigating to this path on the web server in your browser.

Step 2: Get, Compile, and Install iPXE

Compile yourself:

git clone http://git.ipxe.org/ipxe.git
cd ipxe/src
make -j4 bin/ipxe.lkrn
cp bin/ipxe.lkrn /boot

Step 3: Get iPXE the information it needs to boot

iPXE needs two bits of information to boot:

  1. an IP address
  2. A URL to boot from

Those can be assigned at compile time, passed in by a DHCP server, or passed in at boot-time. I prefer to let the DHCP server assign the IP address, but I pass in the URL at boot-time from GRUB by putting the following in menu.lst:

title iPXE uuid
kernel /boot/ipxe.lkrn && dhcp && chain http://1.2.3.4/~dave/ipxe.script

Remember to fill in your IP address and URL with appropriate values for your server.

Step 4: Tell iPXE from where to fetch a kernel image and what arguments to pass

Note that it references a URL on a web server. We need to go make sure that file exists. The kernel command-line should be a copy of what you see in menu.lst above.

$ cat ~/public_html/ipxe.script
#!gpxe kernel http://1.2.3.4/~dave/vmlinuz root= ro debug profile=2 console=ttyS0,115200
initrd http://1.2.3.4/~dave/initrd-from-boot.img
boot

Again, that will differ for your systems. You can either build the initrd each time you compile a kernel, or just ensure that the kernel image doesn't need any modules at all, and use an arbitrary initrd. Note that the whole kernel command-line is now in this file. That means that you can edit it on the web server with vim or emacs instead of having to do it on the GRUB command-line. I greatly prefer this to trying to use GRUB's console.

iPXE also has support for variables. You can do fun things like fetch a file with the system's IP address in the filename:

kernel http://9.47.67.96/~dave/vmlinuz-${net0/ip}

Monday, October 17, 2011

Working Around "Closed" Framechannel Devices

I got a nice WiFi digital picture frame (Motorola LS1000WB) for my Grandma so that she can keep tabs on the family. It was really handy since it could power on directly in to a state where it fetched pictures from a service called Framechannel. However, the economy evidently got the best of them and Framechannel shut down. The frame now powers on to a nice configuration screen (not very grandma-friendly). It's also closed-source and not very hackable.

However, some intrepid folks have dug up a certification checklist and documented the XML format that the device uses.

The hardest part in all of this is getting a hold of "rss.framechannel.com". You need to trick the frame in to going to a site your control instead of the defunct framechannel one. If you have an OpenWRT or DD-WRT this is fairly simple. You just put an entry in your router's /etc/hosts that says something like:
192.168.22.33 rss.framechannel.com
where 192.168.22.33 is the IP of your web server. You can do this with bind, but it's a bit more involved. After you get this part working, you need some pictures in to Framechannel's XML format. I use this script to fetch a Picasa feed and put it in Framechannel's format. Lastly, you need a webserver which can serve the XML back out. In my case, I needed a path like this:
/productId=MOT001/frameId=00FD53221ABC/language=en/firmware=20090721
I did it with this simple script:
DIR=/var/www/productId=MOT001/frameId=00FD53221ABC/language=en/
mkdir -p "$DIR"
perl picasa-to-framechannel-rss.pl [your RSS feed here] > "$DIR/firmware=20090721"
Note: if you are going to do this, remember that this makes your pictures publicly accessible. You should at least set your web server to not let folks get directory indexes on "/productId=MOT001". But, they can still guess your frameId pretty easily.

Despite the frame itself being closed, the openness of apache, bind/dnsmasq and the XML format it uses allowed the frame to be resurrected from doorstop status to a fully-working frame again.

Thursday, September 22, 2011

addr2line keeps me sane

Doing kernel work, I end up with a lot of text dumps of things. It's typical to get lots of junk that looks like gibberish:
[71818.339389] bash            S 0000000000000000     0  3829   3753 0x00000000
[71818.339389] ffff88007a5fdd38 0000000000000086 0000000000000001 0000000000011e80
[71818.339389] 0000000000000000 ffff88007af31080 ffff88007bc15040 ffff88007fc11e80
[71818.339389] ffff88007af31080 0000000000000000 ffff88007fc11e80 0000000000000000
[71818.339389] Call Trace:
[71818.339389] [] ? check_preempt_curr+0x7a/0x90
[71818.339389] [] ? try_to_wake_up+0x1e5/0x280
[71818.339389] [] schedule+0x45/0x60
[71818.339389] [] schedule_timeout+0x14f/0x250
[71818.339389] [] n_tty_read+0x2f0/0x810
[71818.339389] [] ? try_to_wake_up+0x280/0x280
[71818.339389] [] tty_read+0xa6/0xe0
[71818.339389] [] vfs_read+0xcb/0x170
[71818.339389] [] sys_read+0x55/0x90
[71818.339389] [] system_call_fastpath+0x16/0x1b
Let's say you were trying to interpret this stack trace. Sometimes, the compiler will inline function calls and they might not show up in a stack trace, so it is not immediately apparent how tty_read() might call try_to_wake_up(). You can disassemble or use a debugger, but those both require skill. I prefer to replace having skill with tools instead, which is why I love addr2line. You need to feed it a vmlinux (not a vmlinuz or bzImage mind you), but its output is wonderful:

dave@kernel:~/linux-2.6.git$ addr2line -e vmlinux ffffffff81389ff6
/home/dave/work/linux-2.6.git/drivers/tty/tty_io.c:959
dave@kernel:~/linux-2.6.git$ vi /home/dave/work/linux-2.6.git/drivers/tty/tty_io.c +959
Which points to:
                i = (ld->ops->read)(tty, file, buf, count);
else
i = -EIO;
tty_ldisc_deref(ld); <------------------
if (i > 0)
inode->i_atime = current_fs_time(inode->i_sb);
return i;
}
and it's fairly easy to follow the call path from there.

Sunday, September 18, 2011

SSH Push?

Some VNC servers have a cool feature called "VNC Push". In the traditional setup, the VNC server (the thing with the display you are connecting to) sits around with an open socket and the client connects to it. "VNC Push"ing switches that around. The client sits around with an open socket and has the server connect to it.

This is handy when the "server" is behind a firewall that you do not control, for instance at your grandma's house. You can have an icon on the desktop that says something like "Get Help From Dave" to have it push a connection to your waiting VNC client.

Back to ssh... I have a similar situation to the VNC situation. I am installing a Chumby at a relative's house, and I'd like to still be able to get remote access to it. However, I do not want to depend on the router's port forwarding since it might get reset, or the router replaced. I'm also not sure if their router supports UPnP, nor if I trust it. I've done something similar in the past with ssh port forwarding tricks, but that requires keeping some kind of credentials on the remote side and maintaining a UNIX account (although it can be rssh protected).

I've managed to come up with a decent alternative using socat. socat is like netcat (UNIX command nc) on steroids since allows you to tie together all kinds of bidirectional communication like ptys, but especially sockets. It starts out with running a command on your internet-accessible system:
socat TCP4-LISTEN:12345 TCP4-LISTEN:56789
The '12345' and '56789' are arbitrary (just remember to open them up in your local firewall). You can use just about whatever TCP ports you want as long as they are >1024 and <65536. This tells socat to sit around on tcp port 12345 and wait for a connection. When it gets a connection, it should only then wait for another connection on port 56789. All the traffic that comes in
on port 56789 will be piped back out port 12345. The important distinction from netcat is that this works bidirectionally.

Next, run the following on your ssh server that's behind the firewall:

socat TCP:yourserverfoo.dyndns.org:12345 TCP4:localhost:22
That tells it to connect to yourserverfoo.dyndns.org's TCP port, and then pipe all that traffic to the local port 22 where the ssh daemon is listening. Finally, back on the internet-accessible system, you can do:

ssh -o HostName=localhost -p 56789 my-ssh-push
The "-o HostName" bit is helpful to keep your ~/.ssh/known_hosts file from getting confused about what the key for 'localhost' might be. Once you do this, verify and accept the SSH host key, you should get a chance to log in.

The last step is to set up your remote server to push the connection periodically. In my case, I just run this script on startup in the background. It's important when doing this to realize that just about anybody aware of your scheme could connect to your port and pretend to be your remote system. Be careful, make sure to pre-populate your ssh host keys as well as use key-based authentication instead of passwords.