pyratelog

personal blog
git clone git://git.pyratebeard.net/pyratelog.git

commit 0b00fb3536c5308b2e0de5000f16abc7a5c9bb06
parent e783e1f6547d2ea69e8ac0f857506f4dd0a4ab28
Author: pyratebeard <root@pyratebeard.net>
Date:   Thu, 13 Oct 2022 13:15:04 +0100

smoke_me_a_kipper

Diffstat:
M entry/smoke_me_a_kipper.md | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+), 0 deletions(-)

diff --git a/entry/smoke_me_a_kipper.md b/entry/smoke_me_a_kipper.md
@@ -2,4 +2,28 @@
 Earlier this year I wrote about my [backup setup](20220414-speak_of_the_dedup.html) and this last week I had to put it to the test.
+My PC is a tower that I have on a small stand next to my desk. In the past I had kept the case (an Antec 1200) on my desk, but it is rather large and dominates the space a bit too much; I don't have a very big desk. The other day my 1 year old toddled into the study and started pushing the power button on my PC. This power cycled the machine a few times in quick succession. At the time I wasn't aware of this. The next morning I booted up my PC but noticed it was very sluggish. It crashed trying to open my browser. After it happened again I started digging through the logs and noticed some filesystem corruption.
+
+As I described in my "speak_of_the_dedup" post, I have a 3 disk RAID array as my $HOME. Because of the size I only back up important documents, etc. nightly. A full backup is done periodically to an external drive I keep in my bug out bag. Unfortunately I had not done a full backup in a while, but I knew my nightly backups were good so nothing too important was lost.
+
+I had used xfs on my $HOME, so I unmounted the device and started an `xfs_repair`. The repair tool very quickly got to Phase 3, showing the output
+```
+Phase 3 - for each AG...
+ - scan and clear agi unlinked lists
+ - 09:50:01: scanning agi unlinked lists - 0 of 32 allocation groups done
+```
+
+The last line was repeated every 15 minutes, for over 36 hours, never changing from 0 allocation groups done. I don't think it was doing anything. Eventually I stopped it and ran the repair in check mode. This caused a segmentation fault at Phase 3. I tried again but got the same segfault.
+
+After a few days of digging around and trying different things I decided the effort wasn't worth it. Reluctantly I accepted my losses and started the recovery.
+
+Once the RAID array was reformatted I began the data copy from my external drive. This put me back to when it was last backed up. Then I could `rclone` my nightly backups from the last time it ran (before the corruption) and bring that data up to date.
+
+This got me to a relatively good position. Okay, I had lost some random downloads, and a little bit of code that hadn't been pushed to my git server, but nothing serious. It is a little disappointing though; my backup setup is not good enough.
+
+The reason I don't do a full nightly backup to the cloud is that `rclone` takes so long to copy the data. I decided to look into this, to see if it could be sped up. Reading the man page shows that `rclone` has an option to only transfer files younger than a specified age, `--max-age=`. Using `dedup` means I don't have to transfer everything each time `rclone` runs, only the most recent archive. Testing this brought my nightly backup time down to TK.
+
+I decided I needed more regular backups of my $HOME, so I needed some more storage. I purchased another external drive which now sits permanently plugged into my PC. I was going to use `dedup` again but decided it would be better to use an alternative tool so I am not relying on only one tool. I opted for `rsnapshot`. The first backup did take a long time, but now each evening I can run `rsnapshot` to back up my $HOME to the external drive and `rclone` that latest archive to the cloud storage.
+
+Another full backup will still be done to the drive in my bug out bag, I just have to be better at doing it more regularly. At least now if I need to restore I will be able to recover all of $HOME and not only the important things.
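
The evening routine described in the entry (an `rsnapshot` run to the attached drive, then an `rclone` upload limited by `--max-age`) could be sketched roughly as below. This is only an illustration and not the author's actual script: the backup level `daily`, the directory `/mnt/backup/archive`, the remote `cloud:home-backup` and the `24h` window are all assumptions, as are the helper names `run` and `nightly_backup`. The sketch also assumes that `rsnapshot.conf` already defines the snapshot root, the backup points and a matching `retain` line.

```shell
#!/bin/sh
# Sketch of a nightly backup routine. Every path, remote and level name
# below is a hypothetical placeholder, not the author's real configuration.
SNAPSHOT_LEVEL="${SNAPSHOT_LEVEL:-daily}"          # assumed `retain` level in rsnapshot.conf
ARCHIVE_DIR="${ARCHIVE_DIR:-/mnt/backup/archive}"  # hypothetical directory holding the archives
REMOTE="${REMOTE:-cloud:home-backup}"              # hypothetical rclone remote and path
MAX_AGE="${MAX_AGE:-24h}"                          # only transfer files younger than this
DRY_RUN="${DRY_RUN:-1}"                            # 1 = print commands only, 0 = run them

# Print a command, and execute it only when DRY_RUN=0.
run() {
    printf '%s\n' "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

nightly_backup() {
    # Local snapshot onto the permanently attached drive; stop if it fails.
    run rsnapshot "$SNAPSHOT_LEVEL" || return 1
    # Upload only recently modified files, so older archives are skipped.
    run rclone copy --max-age "$MAX_AGE" "$ARCHIVE_DIR" "$REMOTE"
}
```

With the default `DRY_RUN=1`, calling `nightly_backup` only prints the two commands it would run, which makes it easy to check the paths before setting `DRY_RUN=0` in a cron job or timer.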