so I finished my first full backup to my syncosync box. It’s some 120,000 files worth 1.7 TB of data. The “buddy box” is currently still in my local network. I could see that data were being uploaded while the backup was ongoing.
Now the web interfaces of both boxes display that “Sync with partner has ended at …” (popup text of icon on the top). Also, the network usage diagrams show no more transfers. So one would think that everything has been synced to the buddy box, right!?
But I wanted to make sure. So I logged in to a shell as sosadmin and did a “df -h”. Well, my own box has 1.7 TB used in my local partition, as expected. But the buddy box only has 497 GB used. So more than 2/3 of the data have not yet been synced to the buddy box.
Why has the upload stopped? Will it resume automatically? How can I be sure that all data have been synced if the web interface does not reliably tell me?
Is there a logfile I could check for errors?
(By the way, both boxes were continuously on and connected to the network.)
Hmm… that definitely should not happen. As you noticed right, as long as the sync symbol shows active, syncing is done and when it’s idle, everything should be fine.
Technically, what is happening is: lsyncd is started as soon as the buddy box is visible and this once doing a complete check and then waits with inotify. All files are then fed by lsyncd to rsync processes.
For reliability reasons the sync process is started every night again, so latest after the first restart it should sync again. But I am really curious, why it stopped? Maybe look at the log: everything is logged to /var/log/user.log if you find something of interest, please let us know.
After you had a look at the log, you can also force to restart syncosync completely with a normal sudo syncosync restart
I had a look at the user.log. The only “interesting” entries I could see were ones like these:
2022-03-12T23:04:55.549348+01:00 tims-syncosync lsyncd: Normal, --- OVERFLOW in event queue ---
2022-03-12T23:04:55.843314+01:00 tims-syncosync lsyncd: Normal, --- HUP signal, resetting ---
2022-03-12T23:04:55.849369+01:00 tims-syncosync lsyncd: Normal, waiting for 1 more child processes.
If you wish, I can pack the logs up and send them to you by some private channel (email, …).
I hope that after the automatic nightly restart syncing will resume and the missing data will be synced to the remote buddy. If so, then all would be good from my side and it would just be a minor inconvenience.
If there is anything else you would like me to check, please let me know. Maybe one more suggestion to give users a way to tell if all data are synched: In the popup text of the synch icon you could list the total size allocated on the buddy box (probably it would need to be in terms of bytes or megabytes, not gigabytes or terabytes). Then at least that would give the user some plausibility to check against in the interface of his own box (but would need to be with same number of digits).
Hmm… I would say this overflow could be the reason:
Would it be asked too much to add the changes mentioned to /etc/sysctl.conf and retry a complete backup (perhaps easiest to delete the account and then set it up again?)
Checking the size on the other side is not that easy, but I will add a ticket also for this and then we will see, if there is a chance
Hmmm … I think there is a problem indeed. Fresh this morning I see no change, still 497 GB on the buddy box. My hope was that the automatic nightly restart you talked about would fix it, but it seems it did not. Although: No further overflow messages in user.log since yesterday evening!?
I can try and change the overflow limits on both boxes and do a manual sudo syncosync restart to see if that fixes the problem. If so, then things should be alright, shouldn’t they?
I can also restart the backup from scratch if needed, although I was hoping to avoid it …
ok sudo service lsyncd status tells me, it has been restarted and even tells me, what it needed to do at this time… could you take a look at yours?
If you like to, you can even then try to sudo service lsyncd restart (oh, of course assuming that the connection to the buddy box is up) and see, what is happening.
Mar 13 18:53:12 tims-syncosync lsyncd[13687]: Normal, Calling rsync with filter-list of new/modified files/dirs
/tim/data/locks/4d7c34274d44b07bce275d27372064b67566b0fcf3d4c094b64c2717df317543
/tim/data/locks/
/tim/data/
/tim/
/
/syncosync/sosaccounts-access.json
/syncosync/
Mar 13 18:53:13 tims-syncosync lsyncd[13687]: Normal, Finished a list after exitcode: 0
Mar 14 06:25:18 tims-syncosync lsyncd[13687]: Normal, Calling rsync with filter-list of new/modified files/dirs
/syncosync/sosaccounts-mail.json
/syncosync/
/
/syncosync/sosaccounts-access.json
/syncosync/configbackup/sosconf_20220311172817_tims-syncosync_0adb517d17e527ab7e809bef5ffc185c_JDYkTU51aXNLUWJVeUVtR>
/syncosync/configbackup/
/syncosync/configbackup/sosconf_20220311170925_tims-syncosync_f29c12db9610a25ae1601ff082dc4ad1_JDYkTU51aXNLUWJVeUVtR>
/syncosync/configbackup/sosconf_20220311171014_tims-syncosync_7f73aec60518e6b19d5e3ec8279da924_JDYkTU51aXNLUWJVeUVtR>
/syncosync/configbackup/sosconf_20220311174153_tims-syncosync_699e9f3e7ea5545efd4f4fd1a8dd776a_JDYkTU51aXNLUWJVeUVtR>
/syncosync/configbackup/sosconf_20220311184117_tims-syncosync_6ba628632f1682390f9270e23da9e8e8_JDYkTU51aXNLUWJVeUVtR>
/syncosync/configbackup/sosconf_20220311174508_tims-syncosync_3c17e2afd4145c419735144527977bdf_JDYkTU51aXNLUWJVeUVtR>
/syncosync/configbackup/sosconf_20220311172833_tims-syncosync_b66354d731feba361be75e4f26431aea_JDYkTU51aXNLUWJVeUVtR>
/syncosync/configbackup/sosconf_20220311183854_tims-syncosync_7fe54031c2ff6988b9111c6c088d6f65_JDYkTU51aXNLUWJVeUVtR>
/syncosync/configbackup/sosconf_20220311183843_tims-syncosync_62e2c747b2917a631874f5d3abe6f075_JDYkTU51aXNLUWJVeUVtR>
/syncosync/configbackup/sosconf_20220311172805_tims-syncosync_ff0980562e5497d87b32b8df43440dfa_JDYkTU51aXNLUWJVeUVtR>
/syncosync/configbackup/sosconf_20220311183820_tims-syncosync_663c0c1f325f3d83d0a0498f68235517_JDYkTU51aXNLUWJVeUVtR>
Mar 14 06:25:19 tims-syncosync lsyncd[13687]: Normal, Finished a list after exitcode: 0
So nothing obvious to see here!? Still missing 2/3 of the data on the buddy box.
By the way, scrap my idea to display the allocation in the web interface so that one can check it for plausibility - one will usually not have web access to the buddy box. But I think a failsafe mechanism is needed (as we see). Maybe one could somehow check once per day that the allocated size on both boxes is the same (if no sync is currently running), and if it is not, send an admin email to alert the user?
Active: active (running) since Sun 2022-03-13 09:37:01 CET; 23h ago
How often should it restart? Every 24 hours?
I just restarted it manually (after 23:50 hours, too bad, would have liked to see that it does indeed restart automatically) and can confirm that it started syncing stuff again. That’s good at least!
Here is another puzzle piece. I checked when the overflows happened in the user.log
2022-03-13T00:31:40.352236+01:00 tims-syncosync lsyncd: Normal, --- OVERFLOW in event queue ---
2022-03-13T02:16:02.148525+01:00 tims-syncosync lsyncd: Normal, --- OVERFLOW in event queue ---
2022-03-13T04:18:46.515379+01:00 tims-syncosync lsyncd: Normal, --- OVERFLOW in event queue ---
2022-03-13T06:42:31.786871+01:00 tims-syncosync lsyncd: Normal, --- OVERFLOW in event queue ---
2022-03-13T09:27:00.129476+01:00 tims-syncosync lsyncd: Normal, --- OVERFLOW in event queue ---
So the last one happened yesterday at 09:27 CET. But as I wrote in the previous post, lsyncd had been last started yesterday at 09:37 +1 (so also CET). So why then did it not sync yesterday when it was indeed restarted after the last overflow had occurred?
So that is way more than the 16384 event slots and 8192 watched files currently configured for lsyncd. It seems restic produces lots of small files … while Duplicati makes these 50 MB ZIPs, so you never ran into this particular issue.
So what do you think should be reasonable values for both? 2^20 = 1048576? Are there potential problems with this? Memory should suffice on a 2 GB Pi, shouldn’t it?
Good! I have set them to 1048576 on both boxes and will observe whether there are any further problems. I am not sure if I will be able to run a completely new backup from scratch, but I am actually still adding significantly more files and will watch out whether there are any additional overflows.
Hmmm, so I had changed the values of both on-the-fly (as described in the link you posted) and additionally in /etc/sysctl.conf around 11:15 CET today. (I have not rebooted since.) Still, I see overflows afterwards:
So either updating this value after lsyncd was started does not work (although your link implied otherwise) or this does not fix the problem. It’s not that critical, though, because indeed sync does resume at the latest when lsyncd is periodically restarted.
Stevie, you said the lsyncd would be restarted every 24 hours (the latest). It does not seem so:
sosadmin@tims-syncosync:~ $ sudo systemctl status lsyncd
● lsyncd.service - LSB: lsyncd daemon init script
Loaded: loaded (/etc/init.d/lsyncd; generated)
Active: active (running) since Mon 2022-03-14 09:26:17 CET; 1 day 1h ago
In light of the sync issues I had earlier, I think it would be important as a failsafe to indeed restart it on a regular basis so that it restarts syncing.
One more piece of information: I checked the status of lsyncd on the “buddy box”. As far as I can see, it has not restarted since the last reboot more than 3 days ago. So it seems it is not being restarted as part of the housekeeping after all.