Solved: Sudden problems with sftp (was a hardware issue)

Tim · March 16, 2022, 5:41pm

So I had everything set up and working, regular backups with restic to my syncosync box. Then things suddenly stopped working. I get this:

PS C:\restic> restic snapshots
Load(<key/212fb75070>, 0, 0) returned error, retrying after 552.330144ms: sftp: "Failure" (SSH_FX_FAILURE)
Load(<key/212fb75070>, 0, 0) returned error, retrying after 1.080381816s: sftp: "Failure" (SSH_FX_FAILURE)
Load(<key/212fb75070>, 0, 0) returned error, retrying after 1.31013006s: sftp: "Failure" (SSH_FX_FAILURE)
Load(<key/212fb75070>, 0, 0) returned error, retrying after 1.582392691s: sftp: "Failure" (SSH_FX_FAILURE)
[... and so on ...]

That is some generic “sftp error”. But I don’t see what causes this!? Here is what I have checked:

syncosync box is up and reachable
I can log in to a shell as sosadmin
I can sftp in with the user account used for backup
the mount is not full (some reports on the net quote this error with restic when the mount is full)
I see nothing special in the user.log

Any ideas? I am not sure if it is a problem of syncosync or restic. But it is pretty annoying and I have no clue what could cause it. I could try rebooting but then I might lose the chance to debug this.

Tim · March 16, 2022, 6:45pm

Ok, if I log in via ssh with the user account set up for backup, I get this:

Linux tims-syncosync 5.10.92-v7l+ #1514 SMP Mon Jan 17 17:38:03 GMT 2022 armv7l
syncosync - this user allows only sftp, rsync, samba and ftp
Last login: Wed Mar 16 18:29:43 2022 from 192.168.178.76
/bin/bash: Input/output error

So I am immediately logged out again. This was not the case previously. So it is not a restic problem but an ssh problem. But what?

Aha, in the user.log I now see:

2022-03-16T19:44:37.473318+01:00 tims-syncosync sshd[20012]: error: /dev/pts/1: No such file or directory

Tim · March 16, 2022, 6:53pm

Ok, all I can find is that it is somehow related to chroot. I really don’t know anything about that. Why would it suddenly stop working? Any ideas? Anything I can test? Any further logs I can look into?

Tim · March 16, 2022, 7:13pm

Ok, a dmesg gives me:

[413060.465879] Aborting journal on device dm-1-8.
[413060.465900] Buffer I/O error on dev dm-1, logical block 365461504, lost sync page write
[413060.465908] JBD2: Error -5 detected when updating journal superblock for dm-1-8.
[431143.532604] Buffer I/O error on dev dm-1, logical block 0, lost sync page write
[431143.532615] EXT4-fs (dm-1): I/O error while writing superblock
[431143.532622] EXT4-fs error (device dm-1): ext4_journal_check_start:83: Detected aborted journal
[431143.532642] EXT4-fs (dm-1): Remounting filesystem read-only
[436536.312020] EXT4-fs error (device dm-2): ext4_read_inode_bitmap:203: comm rsync: Cannot read inode bitmap - block_group = 4576, inode_bitmap = 149946384
[436536.312067] Buffer I/O error on dev dm-2, logical block 0, lost sync page write
[436536.312075] EXT4-fs (dm-2): I/O error while writing superblock
[436536.312097] EXT4-fs error (device dm-2): ext4_check_bdev_write_error:216: comm rsync: Error while async write back metadata
[436536.312120] Buffer I/O error on dev dm-2, logical block 0, lost sync page write
[436536.312127] EXT4-fs (dm-2): I/O error while writing superblock
[436536.312139] EXT4-fs error (device dm-2): ext4_check_bdev_write_error:216: comm rsync: Error while async write back metadata
[436536.312162] Buffer I/O error on dev dm-2, logical block 0, lost sync page write
[436536.312168] EXT4-fs (dm-2): I/O error while writing superblock
[436536.312200] EXT4-fs error (device dm-2): ext4_check_bdev_write_error:216: comm rsync: Error while async write back metadata
[436536.312223] Buffer I/O error on dev dm-2, logical block 0, lost sync page write
[436536.312230] EXT4-fs (dm-2): I/O error while writing superblock
[436536.312244] EXT4-fs error (device dm-2): ext4_check_bdev_write_error:216: comm rsync: Error while async write back metadata
[436536.312266] Buffer I/O error on dev dm-2, logical block 0, lost sync page write
[436536.312273] EXT4-fs (dm-2): I/O error while writing superblock
[436536.312285] EXT4-fs error (device dm-2): ext4_check_bdev_write_error:216: comm rsync: Error while async write back metadata
[436536.312307] Buffer I/O error on dev dm-2, logical block 0, lost sync page write
[436536.312314] EXT4-fs (dm-2): I/O error while writing superblock
[436536.312326] EXT4-fs error (device dm-2): ext4_check_bdev_write_error:216: comm rsync: Error while async write back metadata
[436536.312348] Buffer I/O error on dev dm-2, logical block 0, lost sync page write
[436536.312355] EXT4-fs (dm-2): I/O error while writing superblock
[436536.312371] EXT4-fs error (device dm-2): ext4_check_bdev_write_error:216: comm rsync: Error while async write back metadata
[436536.312393] Buffer I/O error on dev dm-2, logical block 0, lost sync page write
[436536.312400] EXT4-fs (dm-2): I/O error while writing superblock
[436536.312417] EXT4-fs error (device dm-2): ext4_check_bdev_write_error:216: comm rsync: Error while async write back metadata
[436536.312440] Buffer I/O error on dev dm-2, logical block 0, lost sync page write
[436536.312447] EXT4-fs (dm-2): I/O error while writing superblock
[436536.312469] EXT4-fs error (device dm-2): ext4_check_bdev_write_error:216: comm rsync: Error while async write back metadata
[436536.312491] Buffer I/O error on dev dm-2, logical block 0, lost sync page write
[436536.312498] EXT4-fs (dm-2): I/O error while writing superblock
[436542.430864] Aborting journal on device dm-2-8.
[436542.430909] buffer_io_error: 25 callbacks suppressed
[436542.430919] Buffer I/O error on dev dm-2, logical block 243302400, lost sync page write
[436542.430940] JBD2: Error -5 detected when updating journal superblock for dm-2-8.
[436566.511019] Buffer I/O error on dev dm-2, logical block 0, lost sync page write
[436566.511043] EXT4-fs: 25 callbacks suppressed
[436566.511052] EXT4-fs (dm-2): I/O error while writing superblock
[436566.511072] EXT4-fs error (device dm-2): ext4_journal_check_start:83: Detected aborted journal
[436566.511121] EXT4-fs (dm-2): Remounting filesystem read-only
[436566.511158] EXT4-fs (dm-2): ext4_writepages: jbd2_start: 4096 pages, ino 37494785; err -30

That sounds bad. What does it mean? A problem with the SD card? Or the HDD? Both are brand new!?
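To see at a glance which devices are affected, a log like the one above can be tallied per device. This is only a minimal sketch: the function name `summarize` and the regular expression are my own, and it only looks at the three line shapes that appear in this dmesg output ("on dev dm-N", "(dm-N)" and "(device dm-N)").

```python
import re
from collections import Counter

# Picks the device out of lines such as
#   "Buffer I/O error on dev dm-1, ..."
#   "EXT4-fs (dm-2): ..."
#   "EXT4-fs error (device dm-2): ..."
DEVICE_RE = re.compile(r"(?:on dev |\(device |\()(dm-\d+)\b")


def summarize(lines):
    """Count I/O error lines per device and list devices remounted read-only."""
    io_errors = Counter()
    read_only = []
    for line in lines:
        match = DEVICE_RE.search(line)
        if match is None:
            continue  # e.g. the JBD2 journal lines, which use another format
        device = match.group(1)
        if "I/O error" in line:
            io_errors[device] += 1
        if "Remounting filesystem read-only" in line and device not in read_only:
            read_only.append(device)
    return io_errors, read_only


# A few lines copied from the dmesg output in this post.
SAMPLE = [
    "[431143.532604] Buffer I/O error on dev dm-1, logical block 0, lost sync page write",
    "[431143.532615] EXT4-fs (dm-1): I/O error while writing superblock",
    "[431143.532622] EXT4-fs error (device dm-1): ext4_journal_check_start:83: Detected aborted journal",
    "[431143.532642] EXT4-fs (dm-1): Remounting filesystem read-only",
    "[436566.511121] EXT4-fs (dm-2): Remounting filesystem read-only",
]

errors, remounted = summarize(SAMPLE)
print(dict(errors), remounted)
```

On the box itself the input could be the saved output of `dmesg`, one line per list entry.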

Tim · March 16, 2022, 7:27pm

Ok, I rebooted. I can now log in successfully again with the backup user (tim). But the next problem appears. It seems the user is no longer correctly mounting its partition:

However, the partitions are successfully mounted in principle:

/dev/mapper/sos--JtjPfS--3IvW--jIoe--lDVu--32NX--9WDQ--KOK5hn-system  9.6G   68K  9.5G   1% /mnt/system
/dev/mapper/sos--JtjPfS--3IvW--jIoe--lDVu--32NX--9WDQ--KOK5hn-local   2.7T  1.8T  919G  67% /mnt/local
tmpfs                                                                 188M     0  188M   0% /run/user/1001
/dev/mapper/sos--JtjPfS--3IvW--jIoe--lDVu--32NX--9WDQ--KOK5hn-remote  1.8T  6.3M  1.8T   1% /mnt/rjail/remote

What can I do? I don’t know why I run into all these problems, I am really not trying to break it. But better to iron out these things before you acquire a larger user base, I guess.

Tim · March 16, 2022, 7:37pm

Ok, and now everything looks normal again. I did nothing. Probably just had to wait a while after the reboot?

Phew, so in summary:

I had some file system issue which screwed up everything (can you tell me from the log if it was related to the sd card or the hard disk?)
a reboot solved the problem (but be patient after rebooting)

I hope this was a one-time issue. Will watch out for similar problems.
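One way to watch for a repeat: the kernel log above ends with "Remounting filesystem read-only", and a filesystem in that state should normally show up with the `ro` option in `/proc/mounts`. Here is a minimal sketch under that assumption. The function name `read_only_mounts` and the device names in the sample are made up for illustration; only the mount points are taken from this thread.

```python
def read_only_mounts(mounts_text, prefix="/mnt/"):
    """Return (device, mount point) pairs below `prefix` that are mounted read-only.

    `mounts_text` is expected in /proc/mounts format:
    device, mount point, filesystem type, comma-separated options, dump, pass.
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, _fstype, options = fields[:4]
        # Compare whole options so "errors=remount-ro" is not mistaken for "ro".
        if mount_point.startswith(prefix) and "ro" in options.split(","):
            found.append((device, mount_point))
    return found


# Hypothetical sample in /proc/mounts format (device names are invented).
SAMPLE = """\
/dev/mapper/example-system /mnt/system ext4 rw,relatime,errors=remount-ro 0 0
/dev/mapper/example-local /mnt/local ext4 ro,relatime 0 0
/dev/mapper/example-remote /mnt/rjail/remote ext4 rw,relatime 0 0
tmpfs /run/user/1001 tmpfs rw,nosuid,nodev 0 0
"""

for device, mount_point in read_only_mounts(SAMPLE):
    print(f"{mount_point} ({device}) is mounted read-only")
```

On the box, the text would come from reading `/proc/mounts` instead of the sample string.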

Tim · March 16, 2022, 10:24pm

Final post for today:

I now understand that dm-0, dm-1 and dm-2 refer to partitions on the HDD, so the file system problems were related to the HDD
I checked the SMART status of the HDD under Windows (under Linux smartctl has problems with Seagate USB drives when uas driver mode is used) and all seems fine
I found a way with “Seagate Dashboard” to set an HDD spindown time of 30 minutes, will have to see if it works but am optimistic
full functionality after reboot (i.e. problems seen in my above screenshot going away) takes roughly 10 minutes (!), so one has to be patient; I wonder if this is normal or there is some file system problem or something; but logs look clean
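For anyone else trying to match `dm-N` from the kernel log to the names under `/dev/mapper`: on Linux the device-mapper name of each `dm-N` can be read from `/sys/block/dm-N/dm/name`. LVM builds that name as `<vg>-<lv>` and doubles any hyphen inside the volume group or volume name, which is why the names in the `df` output look so odd. A minimal sketch, with helper names (`dm_names`, `split_mapper_name`) that are my own:

```python
import re
from pathlib import Path


def split_mapper_name(name):
    """Split an LVM device-mapper name into (volume group, logical volume).

    The separator is the single hyphen that has no other hyphen next to it;
    doubled hyphens are escaped literal hyphens. Returns None for names that
    do not follow the pattern (e.g. non-LVM mappings).
    """
    parts = re.split(r"(?<!-)-(?!-)", name, maxsplit=1)
    if len(parts) != 2:
        return None
    vg, lv = parts
    return vg.replace("--", "-"), lv.replace("--", "-")


def dm_names(sys_block="/sys/block"):
    """Map kernel names such as 'dm-1' to their device-mapper names."""
    root = Path(sys_block)
    if not root.is_dir():
        return {}  # not Linux, or sysfs not mounted
    mapping = {}
    for name_file in sorted(root.glob("dm-*/dm/name")):
        kernel_name = name_file.parent.parent.name  # the "dm-N" directory
        mapping[kernel_name] = name_file.read_text().strip()
    return mapping


# Name copied from the df output earlier in this thread.
print(split_mapper_name("sos--JtjPfS--3IvW--jIoe--lDVu--32NX--9WDQ--KOK5hn-local"))

# On the box this lists whatever dm devices exist; elsewhere it may print nothing.
for kernel_name, mapper_name in dm_names().items():
    print(kernel_name, mapper_name, split_mapper_name(mapper_name))
```

Which `dm-N` belongs to which volume is not visible from the logs posted here, so that has to be checked on the box itself.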

By the way, I promise I will not keep up this frequency of posting to the forum. Once everything is set up and confirmed working, I intend to just let it sit there and forget about it.

stevieh · March 16, 2022, 11:18pm

Just a short answer for today: in multiple installations with different HDDs there was never such an issue… Also, setup time after a reboot is more in the range of seconds, not minutes… (only the first boot after flashing takes some more time because of the SD card expansion).

Maybe connecting the box to a monitor could show you more about why it takes so long?

So, my main guess:
Is this a 2.5" Drive and if yes: are there power supply issues?
If it is a 3.5" drive with separate power supply, this should not be an issue.

I am pretty sure this is not syncosync related but another issue with the HDD - maybe a bad cable? But most probably power…

Tim · March 17, 2022, 11:12am

Thanks for your comments, Stevie! It is a 2.5" HDD, but I use the official RPi power adapter rated at 3A. I think a power problem can be excluded. I also don’t think the cable is faulty. My best guess? Yesterday someone was cleaning the room where the HDD sits. It might have gotten bumped or zapped by an electrostatic discharge. I better make sure this does not happen again …

At least something useful came out of it: I found a way to spin down the drive automatically after 30 minutes by using the (awful) “Seagate Dashboard” software.

As for the “long initialisation time”, I have anyway now reformatted and restarted from scratch with the 64 bit version. Will keep an eye out. My suspicion was that maybe the filesystem was damaged. I tried to mount the disk on another PC, but it showed no partitions so I did not know how to run an fsck on it.

stevieh · March 17, 2022, 12:25pm

ok, good to hear. About the Seagate Dashboard software: you run it once and the drive keeps it persistent?

It’s not a bug that there are no partitions on the drive, it’s a feature. The drive itself is directly an LVM PV. Knowing that, you can use it on any Linux platform (gparted even shows you this hidden gem).
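As a rough sketch of what a check on another Linux machine could look like once you know the whole disk is an LVM PV: list the PVs, activate the volume group, list its volumes, then run a read-only `e2fsck`. The snippet only prints the commands and runs nothing. The volume group name `sos-EXAMPLE` is a placeholder (the real one can be read from the `pvs` output), and the sketch assumes the volumes are ext4 and not mounted while they are checked.

```python
import shlex


def fsck_plan(vg, lv):
    """Return the commands (as argv lists) for a read-only check of one volume."""
    device = f"/dev/{vg}/{lv}"  # path that appears once the volume group is active
    return [
        ["pvs"],                         # the USB disk should be listed as a PV
        ["vgchange", "-ay", vg],         # activate the volume group
        ["lvs", vg],                     # list the logical volumes in it
        ["e2fsck", "-f", "-n", device],  # -f: force check, -n: read-only, change nothing
    ]


# "sos-EXAMPLE" is a hypothetical volume group name; "local" is one of the
# volume names visible in the df output earlier in this thread.
for argv in fsck_plan("sos-EXAMPLE", "local"):
    print(shlex.join(argv))
```

The commands need root. Dropping `-n` would let `e2fsck` repair things, which is better done only after a look at the read-only report.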

I am new to this forum software. Is there somewhere a “solved” check?

Tim · March 17, 2022, 12:54pm

Yes!

Yeah, I suspected that this was intentional - but did not know how to continue. If I ever need to check a drive, with this info I guess I would be able to, thanks.

I don’t know. But I changed the topic to indicate it turned out to be a hardware issue.