Error message from housekeeping process

Today I got an admin email from the box running at my brother’s, titled

Cron <root@martins-syncosync> test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )

and within it tons of error messages like

/etc/cron.daily/syncosync:
Cronic detected failure or error output for the command:
/usr/bin/soshousekeeper.py

RESULT CODE: 1

ERROR OUTPUT:
chown: changing ownership of '/mnt/rjail/remote/./syncosync/tim/data/data/cb/cb1f8cb767fac70a60e0f23ecf2463b177646b3f9fba50edee70f56959061cbd': Read-only file system
chown: changing ownership of '/mnt/rjail/remote/./syncosync/tim/data/data/cb/cb1f2469db927fe60d45d53b0e948ccdd6878b3f33527f5c96f6ca6e8743286f': Read-only file system
chown: changing ownership of '/mnt/rjail/remote/./syncosync/tim/data/data/cb/cb59d930b49e7b5684edb14c72dada5c9b14cbf4ccfd34e53d313a236719aec2': Read-only file system
[...]
chown: changing ownership of '/mnt/rjail/remote/.': Read-only file system
Traceback (most recent call last):
  File "/usr/bin/soshousekeeper.py", line 73, in <module>
    main()
  File "/usr/bin/soshousekeeper.py", line 67, in main
    result = housekeeper.check_all(args.nothing)
  File "/usr/lib/python3/dist-packages/soscore/soshousekeeper.py", line 178, in check_all
    self.process_account_mails()
  File "/usr/lib/python3/dist-packages/soscore/soshousekeeper.py", line 242, in process_account_mails
    if mailhandler.send_mail(
  File "/usr/lib/python3/dist-packages/soscore/mail.py", line 198, in send_mail
    part = MIMEApplication(fil.read(), Name=basename(attachment))
OSError: [Errno 5] Input/output error

What causes this? What can/should I do to diagnose/fix it? Should I just ask my brother to power-cycle the box?

Cheers,
Tim

Hmm… how does the box look from UI side?
Everything up and running?
Could you also check what “sysstate.py -g -p” says?

Hi Stevie,

I am on vacation with my kids and thus cannot currently investigate. Some update, though:

The next day, I again got an admin email with error messages, this time reading:

/etc/cron.daily/syncosync:
Cronic detected failure or error output for the command:
/usr/bin/soshousekeeper.py

RESULT CODE: 1

ERROR OUTPUT:
Traceback (most recent call last):
  File "/usr/bin/soshousekeeper.py", line 73, in <module>
    main()
  File "/usr/bin/soshousekeeper.py", line 67, in main
    result = housekeeper.check_all(args.nothing)
  File "/usr/lib/python3/dist-packages/soscore/soshousekeeper.py", line 137, in check_all
    set_local_ownership()
  File "/usr/lib/python3/dist-packages/soscore/drivemanager.py", line 237, in set_local_ownership
    os.chown(os.path.join(sosconstants.MOUNT[Partition.LOCAL], "."), 0, 0)
OSError: [Errno 30] Read-only file system: '/mnt/local/.'

STANDARD OUTPUT:

I then asked my brother to power-cycle the box (being away and hoping that this might fix the problem - it did not).

Now every morning I get admin emails from his box saying “System is in state startup.” Strangely enough, I now also get admin emails from my box every morning saying “System is in state shutdown.” The fact that my box is acting up now, too, is either a very strange coincidence, or somehow the state of my brother’s box affected that of my box!?

That is all I can do for now. I will investigate next week when I am back.

Ok, I am back home. I SSHed in to my box (which is in “state shutdown” for unknown reasons). I looked around a little, but then decided to do a “shutdown -r now”. The problem is, now I cannot even ssh in any more. Power-cycling did not solve the problem either. HTTP access shows “502 Bad Gateway”.

What can I do? Connect a monitor and a keyboard and try to log in locally as sosadmin?

What logs should I look at to see what is going on?

I find it very strange that both my brother’s box and mine start acting up at the same time!?

Hey Stevie,

so I connected a monitor and keyboard to my box and could login successfully. Syncosync has quit with an error message. Here is the relevant info (sorry, screenshot):

Can you make anything of this? Please advise what I should do. I can also pull complete logs if you need them.

The bootup screen also showed an error message, not sure if it is relevant:

I hope we can quickly fix my box. Fixing my brother’s box will be more cumbersome, as I need to coordinate access via TeamSpeak with him, and if SSH is down, I will not be able to do anything from afar … so let’s first focus on my box then his. I still think it odd that both fail at the same time, though.

Cheers,
Tim

Hi Tim,

yes I have f**d up an update. I guess I can fix this by providing a new package. If this does not work, I will describe the manual procedure.

The reason is: all tests passed, but the tests were all from scratch and did not have to handle an existing old configuration… really sorry for that, but be assured: nothing is lost :slight_smile:

Stay tuned and cheers!

P.S. just saw today your reply…

Hey Steve,

ok, good to know - sh*t happens, and if you include a test with an existing configuration, at least you should be safe for the future. Also, I still saw unattended-upgrades being installed, so would assume that once you push a new syncosync package, and possibly after a reboot, things should work again.

Just one question: Do you think the original problem with the “read-only filesystems” is related or not? Doesn’t sound like it is to me, but also strange that this problem and the botched update happened around the same time?

Cheers,
Tim

Hi Tim,

ok, I’d say the new version should work now.
It basically ignores old config files and does not crash on them.
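
As a minimal sketch of what "ignore old config files and do not crash on them" can look like, together with the kind of regression input an existing old configuration would provide: the class `MailConfig`, its fields, the JSON format, and the legacy key `smtp_host` are all assumptions made up for illustration, not syncosync's actual code or file layout.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class MailConfig:
    """Hypothetical config model; not syncosync's MailConfigModel."""
    host: str = ""
    port: int = 25


def load_mail_config(raw: str) -> MailConfig:
    """Build a config from JSON text, falling back to defaults for bad input."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return MailConfig()  # unreadable file: start from defaults
    if not isinstance(data, dict):
        return MailConfig()  # unexpected top-level type: start from defaults
    known = {f.name for f in fields(MailConfig)}
    # Keys from an older format are silently dropped instead of raising TypeError.
    return MailConfig(**{k: v for k, v in data.items() if k in known})


# An "old" file with a renamed key (hypothetical) no longer breaks loading.
old_style = '{"smtp_host": "mail.example.org", "port": 587}'
print(load_mail_config(old_style))
```

Feeding a few such old-format strings into the loader in the test suite would cover the upgrade path that from-scratch tests miss.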

you can either wait until the unattended upgrades hit you or you can upgrade from commandline manually…

So, after the upgrade, there is no configbackup at all, but the following night, when soshousekeeper.py runs, it creates one and attaches it to the admin’s mail. I hope you have set up mail?!

Yes, in fact it is related. As the update was not able to restart the syncosync service, syncosync ended up in the shutdown state (you can check that with sysstate.py -g). In that state the local volume is mounted read-only (ro).

Basically, if syncosync service is stopped, you can play around on the box like with any other linux:

the devices to mount are /dev/sos-{vguuid}/local … remote … system
you could mount them to /mnt/local etc.
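
To see from a shell whether the volumes are currently mounted read-only, a small sketch like the following could help. It only parses the standard Linux mount table format; the two watched mount points are taken from the error mails above, and the function names are made up for illustration.

```python
from typing import Dict, List, Sequence, Set


def mount_options(mounts_text: str) -> Dict[str, Set[str]]:
    """Map each mount point to its option set, given /proc/mounts-style text."""
    result: Dict[str, Set[str]] = {}
    for line in mounts_text.splitlines():
        parts = line.split()
        if len(parts) < 4:
            continue  # skip blank or malformed lines
        mount_point, options = parts[1], parts[3]
        result[mount_point] = set(options.split(","))
    return result


def read_only_mounts(
    mounts_text: str,
    watched: Sequence[str] = ("/mnt/local", "/mnt/rjail/remote"),
) -> List[str]:
    """Return the watched mount points that carry the 'ro' option."""
    opts = mount_options(mounts_text)
    return [mp for mp in watched if "ro" in opts.get(mp, set())]
```

On the box itself this would be called as `read_only_mounts(open("/proc/mounts").read())`; a mount point that is not mounted at all is simply not reported.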

Cheers
Steve

Hey Steve,

my box installed a new syncosync package this morning. I power-cycled it afterwards but at least the web-interface is down (502 Bad Gateway). I will have to check further tonight in what state the box is. My brother’s box seems to not have installed the new package yet and I got a strange error mail from the housekeeper basically saying that the SMTP host was not configured (which is funny, as I did get an email). But let’s first see about my box and then his. I will report more tonight once I have had a proper look.

Cheers,
Tim

Ok, I logged into my box and after an “apt update” saw that there was an update to syncosync available. In fact, this morning my brother’s box had installed an unattended upgrade of syncosync, not mine (sorry, I only looked in a hurry).

So on my box I did an “apt upgrade && apt autoremove” and rebooted, and now my box is up and running again. :slight_smile:

Regarding my brother’s box I will ask him to power cycle it. Hopefully, then things are up and running again and I can in particular login from afar to have another look around. Will keep you posted!

Great news!

Hope your brother’s box also recovers. There should be no reason why not… In ancient times it was possible to ssh in via the sos port, but it is now jailed, so you need remote access some other way.

Good! I am also hopeful it will be ok after power-cycling. I saw that external login via the remote port is blocked, and I think that is a good decision. I can coordinate a TeamViewer session with my brother and then login from there, no problem.

Another update. My brother power-cycled his box yesterday. It had installed an unattended syncosync upgrade yesterday morning, so should in principle have the bug-fixed version installed. Unfortunately, my box still reports a “problem with remote host”. Also, for the second time, this morning I got an admin error mail from his box with the following content:

/etc/cron.daily/syncosync:
Cronic detected failure or error output for the command:
/usr/bin/soshousekeeper.py

RESULT CODE: 1

ERROR OUTPUT:
Traceback (most recent call last):
  File "/usr/bin/soshousekeeper.py", line 73, in <module>
    main()
  File "/usr/bin/soshousekeeper.py", line 67, in main
    result = housekeeper.check_all(args.nothing)
  File "/usr/lib/python3/dist-packages/soscore/soshousekeeper.py", line 181, in check_all
    self.process_account_mails()
  File "/usr/lib/python3/dist-packages/soscore/soshousekeeper.py", line 245, in process_account_mails
    if mailhandler.send_mail(
  File "/usr/lib/python3/dist-packages/soscore/mail.py", line 216, in send_mail
    if not self.open_server():
  File "/usr/lib/python3/dist-packages/soscore/mail.py", line 167, in open_server
    logger.debug(f"Successfully connected to mail server {smtp_settings.host}")
AttributeError: 'MailConfigModel' object has no attribute 'host'

STANDARD OUTPUT:

In fact, my box (which I think is working fine again) sent me the same error mail. So it seems there is still some problem in the current syncosync version?
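
For illustration only, here is how a log line that reads a missing attribute raises exactly this kind of AttributeError, and one defensive way to write it. I do not know what the real fix looks like; the class `LegacySettings` and the alternative attribute name `hostname` are purely hypothetical.

```python
class LegacySettings:
    """Hypothetical settings object that lacks a 'host' attribute."""
    hostname = "mail.example.org"


def describe_server(settings: object) -> str:
    """Build the log message without assuming the attribute exists."""
    # getattr with a default never raises AttributeError.
    host = getattr(settings, "host", None) or getattr(settings, "hostname", "<unknown>")
    return f"Successfully connected to mail server {host}"


try:
    # Direct attribute access, as in the traceback, fails on such an object.
    print(f"connected to {LegacySettings().host}")
except AttributeError as exc:
    print(f"direct access failed: {exc}")

print(describe_server(LegacySettings()))
```

The point is only that a debug message should not be able to abort the mail run; renaming the field consistently in the model is of course the cleaner fix.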

I will have to see if I can arrange a TeamViewer session with my brother and then log in to his box via SSH. If ssh access is not possible, it will get complicated because I would need to go visit him to get physical access to his box, which might take some time.

hmm… how could this ever work? It is fixed now.
Sorry for not replying earlier. Dunno, why I do not get a mail upon your replies…

I don’t know. The error messages started at the same time as the problematic update. :man_shrugging:

I will wait for both boxes to update, and then I guess these error mails will vanish. But my brother’s box does not seem to be reachable in spite of having installed the fixed updates (I got the emails re unattended upgrades of package syncosync, and he has even power-cycled the box on my request).

Today, I actually got from his box:

/etc/cron.daily/syncosync:
Cronic detected failure or error output for the command:
/usr/bin/soshousekeeper.py

RESULT CODE: 1

ERROR OUTPUT:
Traceback (most recent call last):
  File "/usr/bin/soshousekeeper.py", line 73, in <module>
    main()
  File "/usr/bin/soshousekeeper.py", line 67, in main
    result = housekeeper.check_all(args.nothing)
  File "/usr/lib/python3/dist-packages/soscore/soshousekeeper.py", line 181, in check_all
    self.process_account_mails()
  File "/usr/lib/python3/dist-packages/soscore/soshousekeeper.py", line 213, in process_account_mails
    if mailhandler.send_mail(
  File "/usr/lib/python3/dist-packages/soscore/mail.py", line 222, in send_mail
    if not self.open_server():
  File "/usr/lib/python3/dist-packages/soscore/mail.py", line 167, in open_server
    logger.debug(f"Successfully connected to mail server {smtp_settings.host}")
AttributeError: 'MailConfigModel' object has no attribute 'host'

STANDARD OUTPUT:
[e[36m09-20 06:25:15.62e[0m] [e[36m          soscoree[0m] [e[33me[1mWe[0m] e[33me[1mfile /etc/syncosync/accountkeys not found for backupe[0m
[e[36m09-20 06:25:15.64e[0m] [e[36m          soscoree[0m] [e[33me[1mWe[0m] e[33me[1mfile /etc/syncosync/trafficshape.json not found for backupe[0m

The reported error you have fixed as you say. But the last two lines seem suspicious to me. Could it be that his configuration was damaged in the course of the problems with the update? I have yet to arrange a TeamSpeak session with him to try and log in via ssh locally.

The host attribute is fixed in the package which was uploaded today.

The warnings’ formatting looks funny, but no, I think they are ok, as there was no shaping configuration set and I guess all backup accounts use passwords…

That you do not see your brother’s box could of course have multiple reasons. If everything looks fine on his admin UI, then it could also be a wrong dyndns name, closed ports, etc.

One of the next things I am planning is to mail Alice’s box status (partly) to Bob and vice versa. That way Bob at least knows that, e.g., Alice’s box is not in the default state and Alice cannot reach Bob (which normally means there is an issue on Bob’s side) …

Thanks a lot. I will have to see what is wrong with his box. I already get all the admin emails (I set myself up as admin for both boxes). I doubt that the dyndns or port configuration is to blame, as that worked before and I see no reason why it should have stopped working. Anyway, I will try to coordinate a TeamSpeak session and then see what is going on (if the ssh server on his box is up and running …). Will report back when I know more!

Ok, my brother’s box is up and running again, everything fine. Turns out someone from his family had unplugged it while he was travelling … :wink: As always, thanks for the support, Steve!

Haha, the syncosync box needs to look more important :slight_smile:

Hi Stevie, I again have a problem with my brother’s box (mine seems to be doing fine). I get admin error emails with this content:

/etc/cron.daily/syncosync:
Cronic detected failure or error output for the command:
/usr/bin/soshousekeeper.py

RESULT CODE: 1

ERROR OUTPUT:
Traceback (most recent call last):
  File "/usr/bin/soshousekeeper.py", line 73, in <module>
    main()
  File "/usr/bin/soshousekeeper.py", line 67, in main
    result = housekeeper.check_all(args.nothing)
  File "/usr/lib/python3/dist-packages/soscore/soshousekeeper.py", line 138, in check_all
    set_local_ownership()
  File "/usr/lib/python3/dist-packages/soscore/drivemanager.py", line 237, in set_local_ownership
    os.chown(os.path.join(sosconstants.MOUNT[Partition.LOCAL], "."), 0, 0)
OSError: [Errno 30] Read-only file system: '/mnt/local/.'

STANDARD OUTPUT:

My box shows his to be reachable, though, and also still seems to be synching to his box. But the error email comes every morning. Any ideas or advice?
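
The traceback boils down to os.chown hitting errno 30 (EROFS) on /mnt/local. A minimal sketch of how such a call could report the read-only state instead of ending in a traceback is shown below; the wrapper name and the injectable `chown` parameter are my own illustration, not how drivemanager.py is written, and whether the housekeeper should skip or abort in that case is a design decision for the maintainer.

```python
import errno
import os
from typing import Callable


def chown_unless_read_only(
    path: str,
    uid: int,
    gid: int,
    chown: Callable[[str, int, int], None] = os.chown,
) -> bool:
    """Try to chown; return False if the file system is read-only."""
    try:
        chown(path, uid, gid)
    except OSError as exc:
        if exc.errno == errno.EROFS:
            # Errno 30: the volume is mounted ro, so report it and move on.
            print(f"skipping chown: {path} is on a read-only file system")
            return False
        raise  # any other failure is still a real error
    return True
```

A False result here would be the cue to look at why the local volume is mounted ro (for instance the service state reported by sysstate.py -g), rather than at the chown itself.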