SMF head scratcher

One of my official duties as a tech lead on my current project is to pick up things that have fallen through the cracks. In a team made up mostly of software developers with little experience in delivering a shrink-wrapped product (as opposed to, well, just code), this quite often means playing sysadmin as well. People who have known me for a while know that this doesn’t bother me at all; it can be quite fun, especially when I don’t have to deal with Linux.

In this project the client is paranoid^W security conscious enough that we are not allowed to use any test machine inside Sourcesense; instead, we have to use either our own laptops (encrypted and all that) or a test machine inside the client’s network. Right now, this means a Solaris 10 box.

Now, I am a BSD guy, so for me in a sense Sun chose the dark side when they started adopting SysV-isms and started calling the bastard^Wresult Solaris. But I have to admit that the more I see of this Solaris 10, the more I like it. ZFS is way cool; zones are a great concept; and then there is the Service Management Facility (SMF).

Well, I will admit that when I first read about it I was less than enthusiastic. In fact, if you just read that page, and you have been around as long as I have, you must be thinking what I did: this sounds a lot like D. J. Bernstein’s daemontools with just about a gallon of AIX-like enterprisey gravy on top.

But then I started playing with it, and once you get into it things start to make sense. For one thing, the XML format you use for defining services makes a lot more sense than either daemontools config directories or AIX cruft, or classic SysV-style rc*.d directories for that matter. Also, SMF solves the problem of keeping track of pids very nicely (it figures them out for you), which simplifies the logic when you actually need to stop stuff.

However, there is one thing I couldn’t figure out, and that is dependencies. Supposedly, you just declare the dependencies and the system should be smart enough to figure things out. Well, either the system is outsmarting me, or there’s something way, way off. But it looks so fundamentally broken that it’s probably the former.

Say I have two services, application/test-a and application/test-b; test-b depends on test-a, so that when I start test-b SMF starts test-a for me, and if it fails test-b is prevented from starting. So far so good. The problem for me starts when I try to do:

svcadm restart application/test-b

If I do that, application/test-a is stopped and put in a disabled state!

Now that is so counter-intuitive that I’m sure I’m doing something wrong. I am assuming any bug in this area would have emerged and would have been fixed ages ago. But still for the life of me I can’t figure it out.
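
For anyone who wants to poke at this, here is a rough sketch of how I would capture what SMF thinks of both services around the restart. The function name show_smf_state is made up for this post; svcs -l, svcs -d and svcs -x are the stock commands for a long listing, the list of services a service depends on, and an explanation of why a service is not running.

```shell
# Hypothetical helper (the name is mine, not part of SMF): print what SMF
# currently knows about one service.
show_smf_state() {
    fmri="$1"
    svcs -l "$fmri"    # long listing: state, log file, dependencies
    svcs -d "$fmri"    # services this one depends on
    svcs -x "$fmri"    # explanation, if the service is not running
}

# Usage on the Solaris 10 box, before and after the restart:
#   show_smf_state application/test-b
#   svcadm restart application/test-b
#   show_smf_state application/test-a
```

Comparing the two outputs should at least show whether test-a really ends up disabled, or in some other state that merely looks like it.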

So, anybody with a Solaris 10 box care to help me out? You can find two simple service definition files (a.xml and b.xml) you can import and try. Just in case it matters, all I can tell you about the machines is (name changed to keep my lawyer happy):

andrea$ uname -a
SunOS host.example.com 5.10 Generic_118855-15 i86pc i386 i86pc

Laza, Zen, any hope you can test this for me?

Comments

6 Responses to “SMF head scratcher”

  1. zen on October 10th, 2006 4:07 pm

    Yes, as soon as I reach my home Solaris 10 machine. I’ll have a look at it.

  2. zen on October 10th, 2006 9:32 pm

    I had a look into it. I tried posting this on my blog, but ran into too many formatting snafus.
    I’ve been able to replicate the problem. First of all, have a look at
    /var/svc/log/application-test-a:default.log
    and application-test-b:default.log

    Try raising the start timeouts from 0 to something higher, in the range of 5-10 seconds, like

    svccfg
    svc:> select application/test-a
    svc:/application/test-a> editprop
    setprop start/timeout_seconds = count: (10)
    svc:/application/test-a> select application/test-b
    svc:/application/test-b> editprop
    setprop start/timeout_seconds = count: (10)

    and have a look with
    listprop
    inside svccfg.

    That might solve your problem. Let us know if it worked (and maybe post more logs).

    bye

  3. laza on October 11th, 2006 1:10 pm

    I don’t have and don’t want to have any kind of sloaris machine… :-)
    I’m not so fond of retro computing.. :-)
    ciao Marco

  4. Andrea on October 11th, 2006 8:34 pm

    Zen, thanks. I’ll have to try when I have my next maintenance window.

    Laza, ehehe. No, Sloaris 10 is pretty cool, really ;-)

  5. David Comay on October 13th, 2006 6:43 am

    I’d suggest posting your question on the OpenSolaris SMF forum

    http://www.opensolaris.org/jive/forum.jspa?forumID=24

    The SMF community lives there and would be more than happy to discuss the issue that you’re seeing.

  6. Andrea on October 17th, 2006 8:49 pm

    Bingo! Increasing the timeouts worked!
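
    For the record, a non-interactive version of the same change might look like the sketch below. The function name set_start_timeout is made up; it assumes the start property group lives at the service level (as it would when it comes from an exec_method in the manifest) and that a refresh is needed before the running configuration picks up the new value.

    ```shell
    # Hypothetical helper (the name is mine): set the start method timeout
    # for a service, then ask SMF to re-read its configuration.
    set_start_timeout() {
        fmri="$1"
        seconds="$2"
        svccfg -s "$fmri" setprop start/timeout_seconds = count: "$seconds" &&
        svcadm refresh "$fmri" &&
        svcprop -p start/timeout_seconds "$fmri"   # show the value now in effect
    }

    # Usage:
    #   set_start_timeout application/test-a 10
    #   set_start_timeout application/test-b 10
    ```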

    David, thanks, I will if I have any other issue or question.
