My gaming rig

A bit of a story.

In 2012, my brother decided that, as a gift for my going away to college, he would purchase PC parts for me to build my own PC from. He found a $500 combo upgrade package deal on Newegg and made it mine. I built upon that initial system as time went on, adding and replacing parts with newer and better ones, until its final iteration. In 2014, I had better income and could afford higher-end parts, so I donated that PC to my sister and built my current gaming rig, which has also been upgraded over these past 4 years.

My current gaming rig started out with much less storage space, a different set of monitors, and a different mousepad, video card, and webcam. All of these have been upgraded over time.

In addition, I’ve now added an Intel NUC to my mix of networked devices, which I’ve been using as a utility server. I’ve offloaded nearly all of the virtual machines that my gaming rig had been hosting for 4+ years over to the Intel NUC. This has improved my PC’s performance and resource usage, and it also means I should be able to take my PC offline more often.

There is one problem I want to tackle: the 2x 5 TB WD Black drives are currently unused. They were originally dedicated to a guest virtual machine running my Plex server, configured with LVM to create a single 10 TB logical volume. The performance wasn’t that great, however. I transferred their contents to a single 6 TB HGST drive and got equal performance with less headache. I want to RAID 1 the 5 TB WD Black drives and then possibly use them for general-purpose backup storage, rather than exclusively with my Plex server; a sketch of that setup follows below.
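If I end up building the mirror from a Linux environment, a minimal mdadm sketch would look something like this. The device names are examples, not my actual drives, and since RAID 1 mirrors the pair, 2x 5 TB yields 5 TB usable:

    # Mirror the two 5 TB drives into one array (destroys any existing data on them)
    sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
    sudo mkfs.xfs /dev/md0

    # Record the array so it assembles automatically at boot
    sudo mdadm --detail --scan | sudo tee -a /etc/mdadm.conf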

And as for the Plex server, I want to move it off my gaming rig. I’m looking at short-term and long-term options right now. In the short term, I’m expanding the RAM on my Intel NUC from 16 GB to 32 GB so I can host more guests on it, which would allow me to transfer my Plex server. I should then be able to move the HGST 6 TB drive to the NUC, since the drive uses an external USB 3.0 interface. In the long term, I want to purchase a dedicated storage server, host it at a colocation facility, and use a RAID 10 array of at least 6×4 TB or 8×6 TB enterprise drives. Since RAID 10 mirrors half the drives, those arrays would yield roughly 12 TB or 24 TB of usable space, respectively. This would then become the permanent home for my Plex server, giving it the most efficient setup I can offer.

In the long term for my gaming rig, I do foresee a possible shift from Intel to AMD. The Ryzen processors have intrigued me. This would mean a new motherboard, which means adopting a new platform. I’ve also never used an AMD processor in my main machine before, so there are comfort-level hurdles to overcome in that direction. I won’t need 64 GB of RAM anymore either, since I’m moving the virtual machines to my Intel NUC, so if I build a new desktop system in the future I’ll probably drop down to 32 GB. It’s unfortunate that the memory modules in my gaming rig and the NUC aren’t compatible, and the NUC has an upper limit of 32 GB of memory anyway. Sadness. Maybe I should purchase a second NUC? We’ll see.

A letter to AT&T

Today, I share with everyone an email message I sent to my apartment management about the level of dissatisfaction I have with the AT&T U-verse service.

It’s a bit of a read, but worthwhile, I promise.

It was originally sent on March 19, 2018.

Hi,


After being at the [redacted] Apartments for a week, I wanted to bring to light my dissatisfaction with the Internet service available at this property. I’d appreciate it if this were taken into consideration as well as forwarded to the appropriate AT&T employees.


This property exclusively provides AT&T Gigapower for Internet service. I’m extremely dissatisfied with the service provided, specifically because of the egregious lack of technical responsibility on AT&T’s part. AT&T provided me with a BGW210 gateway device for connectivity, which exhibits the problems described below. These problems remind my brother and me of similar ones we had with AT&T U-verse while living with our parents in [redacted], Texas.


Please take the points below as direct, with reasoned explanations; they are intended to convey my frustration.


  • AT&T field installation personnel relegated technical support to online and phone support. This seems like a lack of ownership of responsibility to me; field technicians should be able to handle technical questions.
  • AT&T online chat services are poorly configured. Consider the difference between http://chatnow.att.com and https://chatnow.att.com: the former displays a web server setup page instead of the AT&T webpage. As a systems administrator myself, the difference between the data served indicates a lack of thoroughness and professionalism.
  • AT&T online tech support provided incorrect answers and recommendations; I was able to trivially prove this using simple network diagnostics tools such as ping and traceroute.
  • AT&T service regularly has latency/ping spikes indicating poor connectivity or service provisioning. This is easily visible using connection monitoring software such as Smokeping or Pingplotter.
  • AT&T does not provide IPv6 connectivity to residences, thereby contributing to IPv4 address exhaustion. It did not provide IPv6 service to my parents’ [Texas] address on AT&T U-verse despite repeated requests over the several years I lived there. AT&T still does not provide IPv6 to even [redacted] Apartment Homes’ brand new apartment complex; the complex is so new that most online address verification services are unable to verify the address. When tech support was asked about this, they stated that IPv6 was disabled and that I would need to speak to “advanced tech support”; later in the conversation I was told that the equipment in the area isn’t configured for IPv6 yet because they still have IPv4 addresses available. This reeks of profit-seeking instead of creating and adopting solutions for customers. IPv6 was standardized in the late 1990s and has since been adopted by many major Internet providers; it is roughly 20 years old, it isn’t new, and it should be freely available.
  • AT&T’s BGW210 gateway device is limited to 8,192 NAT table sessions, capping the number of IPv4 connections that can be open to another device, whether the traffic goes to the Internet or stays entirely on the local network. This bottleneck would not exist if AT&T permitted the use of customer-owned third-party routers and modems, such as pfSense, where the NAT table limit can be upwards of 300,000 sessions, or if AT&T enabled IPv6 at this location. Additionally, as an AT&T Gigapower customer with a 1000 Mbps subscription, this bottleneck is effectively a throttle on the Internet connection, causing connectivity issues with web browsing and online games, as I already experienced in the first two nights at [redacted] Apartment Homes.
  • Drawing on our U-verse experience, the first thing many technicians would do when encountering a problem was replace the modem, which was provided and required by AT&T. It was replaced so often that the account had been notated (paraphrasing from memory) “do not replace the modem, it is not a problem with the modem”, since we had gone through so many replacements without solving the service-level problems.
  • With U-verse we had service issues so often that my brother ended up in a conference call with the field technician manager, local service manager, and regional manager to troubleshoot the problem. If I recall correctly, it turned out to be a configuration issue on AT&T’s side.


In all, AT&T does not provide service at the level of its competitors. Other providers allow true bridged mode, allow customers to use their own purchased third-party routers and modems, and have full IPv6 connectivity support. I am disappointed that AT&T does not appear to have the level of technical service I hope for, even after over a decade of interactions with them. I wish to protest AT&T being the exclusive Internet provider at [redacted] Apartment Homes, and ask that property management consider and prioritize adding competing providers such as Verizon Fios as soon as possible.


Thanks,
Carl

The only update to occur after this email was AT&T provisioning IPv6 to my BGW210 gateway device. All other issues still persist, including the NAT issue; I found out that, after IPv6 was provisioned, IPv6 traffic still consumes sessions in the BGW210 NAT table.
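For anyone wanting to check their own AT&T connection, verifying whether IPv6 is actually provisioned takes seconds from any Linux machine behind the gateway. A minimal sketch, using google.com purely as an example dual-stack target:

    # Does an IPv6 path exist at all? Failure here suggests IPv6 isn't provisioned.
    ping -6 -c 4 google.com

    # Where does IPv6 traffic actually go once it leaves the gateway?
    traceroute -6 google.com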

AT&T is arguably worse than Comcast.

How gnome-software/systemd software upgrade dropped me to the grub command line, and how I resolved it

Earlier I was messing around in one of my Fedora Linux virtual machines. I’ve had this particular install since Fedora 23 was the latest release. I’ve upgraded it from 23 to 24 and then again from 24 to 25. I completed the upgrade a few weeks back and decided to come back to it. Of course, since it had been a few weeks, after I loaded it up and logged in, GNOME was telling me there were updates available in the Software (gnome-software) application.

Brief warning: this article contains a ton of screenshots as I worked the problem.

Mindlessly, I opened the gnome-software utility and decided to update the system through there. The updates looked pretty benign, whatever. I clicked ‘Restart & Install’ and confirmed. The system rebooted and brought me to the systemd installing-updates prompt, awesome. It finished and rebooted once more; however, I was immediately dumped to a grub command line. Uh oh.

So I took to the #fedora IRC channel on Freenode, explained what had happened, and got some interesting feedback.

First, we tried to identify the problem. We started by trying to figure out why the grub command line was coming up immediately while booting rather than the grub boot menu. I found that the grub config file was in fact empty, which made it pretty straightforward to see why we were met with the grub command line.
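On a BIOS/MBR Fedora install, checking and regenerating that config looks something like this, assuming you can reach the filesystem (e.g., from a chroot); this is the general fix rather than exactly what I ran:

    # An empty or zero-length grub.cfg explains the bare grub prompt
    ls -l /boot/grub2/grub.cfg

    # Regenerate the config from the installed kernels
    grub2-mkconfig -o /boot/grub2/grub.cfg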

My initial thought was that the gnome-software utility told systemd to reboot for updates, but systemd didn’t install the updates correctly. There was probably a kernel update, and the grub config got caught in the mess. A little more trial and error with the fellas in #fedora and I was able to tell grub to boot the vmlinuz and initramfs files that were still visibly present.
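For reference, manually booting from the grub command line went roughly like this; the volume name and kernel version here are illustrative, not my exact ones:

    set root=(lvm/fedora-root)
    linux /boot/vmlinuz-4.8.15-300.fc25.x86_64 root=/dev/mapper/fedora-root ro
    initrd /boot/initramfs-4.8.15-300.fc25.x86_64.img
    boot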

Booting didn’t work, though. A kernel panic came up while booting; Linux couldn’t mount the root filesystem.

Now I’m wondering if my hard disk is somehow corrupted after the updates. An update clearly didn’t finish correctly; that much is obvious.

I head back to the #fedora IRC channel and suggest that I try booting the Live CD iso instead of digging around any further; I was tired of messing around in grub’s command line. They agreed this would aid in diagnosing the issue, so off I went.

I identified that the hard disk uses a single LVM partition; there are no other partitions in the Master Boot Record (MBR) of the disk. The LVM partition holds only a single logical volume, called root, with the / mount point, formatted as xfs. Pretty strange, and I don’t remember why I chose this layout so long ago.

I decided to remove the Live CD iso from the machine and reboot. What happened next, though, was pretty weird. I had looked away for a bit to check the IRC channel, and when I came back, the machine had booted!

What the hell. Alright, so might as well browse around then. I logged in. I was even greeted with the message that updates were installed successfully!

I decided to open Terminal and verify the disk layout:
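The check itself is just a couple of read-only commands (output omitted here):

    lsblk        # block devices, partitions, and mountpoints
    sudo pvs     # LVM physical volumes
    sudo lvs     # LVM logical volumes
    df -hT /     # confirms / is formatted as xfs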

Yup. That’s an lvm-xfs partitioned disk. There is in fact no /boot partition, which means /boot lives on the single / partition. By this point, the people over in #fedora were pretty grateful that grub has matured to the point of understanding how to boot LVM partitions, but they were also at a loss for words about what was going on.

After talking with the #fedora IRC channel some more, I agree to figure out what state causes the machine to drop to the grub command line after upgrading.

I began going through each variable in my tests:

  • Default partitioning format
  • Custom partitioning format
  • Upgrading system using gnome-software
  • Upgrading system using dnf

The default partitioning format is just as you would expect–you load up the Live CD iso with a blank hard disk, you install Fedora to that hard disk using the default partitioning that anaconda chooses for you. No modifications.

The custom partitioning format is a bit different. You start with a single LVM partition and then create an xfs-formatted logical volume with the / mount point:
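Outside of anaconda, the same layout can be sketched from a live environment like this; the disk and volume group names are examples, and this wipes the target disk:

    sudo pvcreate /dev/sda1                    # the lone LVM partition becomes the only PV
    sudo vgcreate fedora /dev/sda1             # one volume group
    sudo lvcreate -l 100%FREE -n root fedora   # one logical volume filling it
    sudo mkfs.xfs /dev/fedora/root             # xfs, mounted at /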

The people in the #fedora IRC channel and I both found that installing Fedora with the custom partitioning format and upgrading using gnome-software would drop Fedora to the grub command line every time, and it was only repairable by booting the Live CD iso again and mounting the filesystem so that the xfs journal gets replayed after the updates.
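The repair itself amounts to a mount/unmount from the live environment, since mounting an xfs filesystem replays its log (the volume name is illustrative):

    sudo mount /dev/mapper/fedora-root /mnt   # mounting replays the xfs journal
    sudo umount /mnt
    sudo reboot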

Getting to that diagnosis though took several hours, and after confirming with others I opened the bug report over at Red Hat: Bug 1416650 – Upgrading using gnome-software/systemd with lvm-xfs custom partitioning format causes grub boot failure

Hopefully I get an answer back from the Fedora and/or Red Hat teams. Quite a head-scratcher at first, but clearly a bug, since this layout is supposed to be supported and nothing indicates otherwise.

Cheers.

Replacing your server is not always necessary after encountering boot failure

First, a little background context before I dive into how I recovered the virtual machine from boot failure…

  • The host has three physical disks mapped to the virtual machine which I’ll be recovering.
    • One disk is a 256 GB SSD and is unused/unallocated; the other two are 5 TB HDDs paired together via LVM to present a single 10 TB volume inside the guest (a sketch of that pairing follows this list).
  • The host runs Windows 7 64-bit, runs VirtualBox 5.1.10, and the guest runs Fedora 25 64-bit on Linux kernel 4.8.15-300.fc25.x86_64.
  • The guest uses systemd and is referred to as ‘seedbox’ in my logs and snapshots.
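That LVM pairing is roughly the following, assuming example device and volume names rather than my actual ones:

    # Combine two physical disks into one ~10 TB logical volume
    sudo pvcreate /dev/sdb /dev/sdc
    sudo vgcreate media /dev/sdb /dev/sdc
    sudo lvcreate -l 100%FREE -n storage media
    sudo mkfs.xfs /dev/media/storage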

Earlier this morning, the host completely froze, without even so much as a BSOD. There is a hardware problem with the host’s RAM that I have yet to fully diagnose. I ended up pushing the reset button on the chassis, and all running guests were immediately aborted by the power reset.

The host system rebooted just fine. Upon turning all of the guests back on, the ‘seedbox’ guest entered systemd emergency mode while it was booting. Uh oh.

Turns out that when the host rebooted, the physical disk mappings present for the guest were changed by either Windows or the host’s BIOS itself, and VirtualBox did not notice or track the difference. The SSD and one of the HDDs were no longer the correct physical disks. The LVM volume group was therefore invalid, and auto-mounting the LVM logical volume failed. This was the root cause of the boot failure within the guest.

Okay, so we’ve now identified the cause of boot failure. What about recovery? Is all the data corrupt now?

The guest failed to mount the LVM logical volume, which prevented it from writing to the disks and potentially corrupting them. The SSD and HDDs were never mounted by Linux, not even read-only, so they were preserved during the boot failure. This is a good indicator that the data should be completely intact if we can rebuild the mapping.

Looking into correcting the physical-to-virtual disk mapping, I found out that due to the shift in mappings, the newly assigned physical disks were potentially catastrophic. One of them was the host’s boot drive, which even VirtualBox says NEVER to mount in a guest: “Most importantly, do not attempt to boot the partition with the currently running host operating system in a guest. This will lead to severe data corruption” [VirtualBox Manual 9.9.1]. I quickly commented out the auto-mount for that LVM logical volume, powered off the virtual machine, and then looked into correcting the mapping via the host.
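In practice, that meant commenting out the relevant /etc/fstab entry inside the guest; the device and mountpoint here are illustrative:

    # /etc/fstab entry, disabled while the physical disk mapping was wrong:
    #/dev/mapper/media-storage  /mnt/storage  xfs  defaults  0 0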

In VirtualBox, if you want to map physical disks to virtual ones, you need to run a utility called VBoxManage as root/administrator and call some internals of the utility that are vaguely documented in section 9.9.1 of the advanced topics chapter of the VirtualBox manual (link). This is non-trivial for those just getting started with virtual machines and disk management, since there’s a high risk of corruption, as the manual indicates. If you map your host’s physical boot disk to the guest and then try to boot off of it, you’ll almost always cause severe corruption; this is what the manual is warning everyone about.
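The relevant invocation, run from an administrator Command Prompt on a Windows host, looks roughly like this; the file name and drive number are examples, and pointing this at the wrong drive is exactly the hazard described above:

    rem Create a raw-disk VMDK that passes a physical drive through to a guest
    VBoxManage internalcommands createrawvmdk ^
      -filename "C:\VMs\seedbox-hdd1.vmdk" ^
      -rawdisk \\.\PhysicalDrive2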

So off I went. I opened up the Command Prompt in administrator mode on the host and went to the directory containing the virtual disk files. In the VirtualBox window, I removed the disks from VirtualBox’s Media Manager, then switched back to the Command Prompt. I replaced the .vmdk files with new copies that had corrected physical drive IDs. I then re-added the virtual disk files into VirtualBox, attached them to the seedbox guest, and booted up the guest.

The guest booted up successfully. Since I had commented out the failing drive configuration, it was able to boot normally, with only the services relying on that mountpoint failing to run correctly. The core system was intact.

I uncommented the auto-mount line for the LVM logical volume I had commented out earlier, re-mounted all auto-mounts via mount -a, and then listed the directory inside the LVM logical volume. Success!
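The verification step is short; again, the mountpoint name is just an example:

    sudo mount -a       # mount everything listed in /etc/fstab
    ls /mnt/storage     # the data is all still there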

I then rebooted the virtual machine and it came up all on its own in normal boot mode, no more emergency mode or boot failure. It acted as if it had never fallen ill.

If this helped you out, or if it was simply a good story to read, I encourage you to follow me on social media or this blog. See you next time!