So I had everything set up and working, regular backups with restic to my syncosync box. Then things suddenly stopped working. I get this:
PS C:\restic> restic snapshots
Load(<key/212fb75070>, 0, 0) returned error, retrying after 552.330144ms: sftp: "Failure" (SSH_FX_FAILURE)
Load(<key/212fb75070>, 0, 0) returned error, retrying after 1.080381816s: sftp: "Failure" (SSH_FX_FAILURE)
Load(<key/212fb75070>, 0, 0) returned error, retrying after 1.31013006s: sftp: "Failure" (SSH_FX_FAILURE)
Load(<key/212fb75070>, 0, 0) returned error, retrying after 1.582392691s: sftp: "Failure" (SSH_FX_FAILURE)
[... and so on ...]
That is some generic “sftp error”. But I don’t see what causes this!? Here is what I have checked:
syncosync box is up and reachable
I can login a shell as sosadmin
I can sftp in with the user account used for backup
the mount is not full (some reports on the net quote this error with restic when the mount is full)
I see nothing special in the user.log
Any ideas? I am not sure if it is a problem of syncosync or restic. But it is pretty annoying and I have no clue what could cause it. I could try rebooting but then I might lose the chance to debug this.
Ok, if I login via ssh with the user account set up for backup, I get this:
Linux tims-syncosync 5.10.92-v7l+ #1514 SMP Mon Jan 17 17:38:03 GMT 2022 armv7l
syncosync - this user allows only sftp, rsync, samba and ftp
Last login: Wed Mar 16 18:29:43 2022 from 192.168.178.76
/bin/bash: Input/output error
So I am immediately logged out again. This was not the case previously. So it is not a restic problem but an ssh problem But what?
Aha, in the user.log I now see:
2022-03-16T19:44:37.473318+01:00 tims-syncosync sshd[20012]: error: /dev/pts/1: No such file or directory
Ok, all I can find is that it is somehow related to chroot. I really don’t know anything about that. Why would it suddenly stop working? Any ideas? Anything I can test? Any further logs I can look into?
Ok, I rebooted. I can now login successfully again with the backup user (tim). But the next problem appears. It seems the user is no longer correctly mounting its partition:
What can I do? I don’t know why I run into all these problems, I am really not trying to break it. But better to iron out these things before you acquire a larger user base, I guess.
I now understand that dm-0, dm-1 and dm-2 refer to partitions on the HDD, so the file system problems related to the HDD
I checked the SMART status of the HDD under Windows (under Linux smartctl has problems with Seagate USB drives when uas driver mode is used) and all seems fine
I found a way with “Seagate Dashboard” to set an HDD spindown time of 30 minutes, will have to see if it works but am optimistic
full functionality after reboot (i.e. problems seen in my above screenshot going away) takes roughly 10 minutes (!), so one has to be patient; I wonder if this is normal or there is some file system problem or something; but logs look clean
By the way, I promise I will not keep up this frequency of posting to the forum. Once everything is set up and confirmed working, I intend to just let it sit there and forget about it.
Just a short answer for today: in multiple installation with different HDD there was never such an issue… Also setup time after reboot is more in the area of seconds not minutes… (only the first boot after flashing takes some more time because of SD card expansion).
Maybe connecting the box to a monitor could show you more why it takes so long?
So, my main guess:
Is this a 2.5" Drive and if yes: are there power supply issues?
If it is a 3.5" drive with separate power supply, this should not be an issue.
I am pretty sure, this is not syncosync related but another issue with the HDD - maybe bad cable? But most probable power…
Thanks for your comments, Stevie! It is a 2.5" HDD, but I use the official RPi power adapter rated at 3A. I think a power problem can be excluded. I also don’t think the cable is faulty. My best guess? Yesterday someone was cleaning the room where the HDD sits. It might have gotten bumped or zapped by an electrostatic discharge. I better make sure this does not happen again …
At least something useful came out of it: I found a way to spin down the drive automatically after 30 minutes by using the (awful) “Seagate Dashboard” software.
As for the “long initialisation time”, I have anyway now reformatted and restarted from scratch with the 64 bit version. Will keep an eye out. My suspicion was that maybe the filesystem was damaged. I tried to mount the disk on another PC, but it showed no partitions so I did not know how to run an fsck on it.
ok, good to hear. About the Seagate Dashboard software: you run it once and the drives keeps it persistent?
It’s not a bug, that there are no partitions on the drive, it’s a feature. The drive itself is directly a LVM PV. Nowing that you can use it on any linux platform (gparted shows you even this hidden gem).
I am new to this forum software. Is there somewhere a “solved” check?
Yeah, I suspected that this was intentional - but did not know how to continue. If I ever need to check a drive, with this info I guess I would be able to, thanks.
I don’t know. But I changed the topic to indicate it turned out to be a hardware issue.