GRUB GPT HOWTO

This guide shows how to boot a disk that uses a GPT partition table with Grub.

This guide assumes you want a boot filesystem and an LVM physical volume on your GPT partitioned disk.

You may need to recompile Linux. Select the “EFI partition system” in the Filesystems area. You may also want to select the RAID and LVM modules in order to use LVM. You probably don’t need EFI BIOS support unless your machine has EFI firmware; without it, you need to get a traditional OS loader to function.

Normally, Grub does not understand GPT partition tables and needs to be tricked into starting from one. You need to create a very small partition at the start of your disk to hold the grub stage2 (or stage1.5, if you would like to start stage2 from /boot).

Parted

First, create your GPT partition table on your device. I suggest making the first partition the smallest size parted lets you get away with that is still bigger than the stage2 image. The second partition will be your /boot and holds Linux and its ramfs images; I suggest around a gigabyte for this. The rest of your hard disk is allocated to the third and final partition, which is your LVM volume.

mklabel gpt
mkpart non-fs 0 2
mkpart ext3 2 130
mkpart lvm 130 30401

GNU Parted 1.8.9
Using /dev/hda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) unit cyl
(parted) p
Model: SAMSUNG SP2514N (ide)
Disk /dev/hda: 30401cyl
Sector size (logical/physical): 512B/512B
BIOS cylinder,head,sector geometry: 30401,255,63.  Each cylinder is 8225kB.
Partition Table: gpt

Number  Start   End       Size      File system  Name    Flags
 1      0cyl    2cyl      1cyl                   non-fs
 2      2cyl    130cyl    128cyl    ext3         ext3
 3      130cyl  30401cyl  30271cyl               lvm

The filesystem types were non-fs, ext3, and lvm, respectively.

Gptsync

Now, as Grub does not understand this, you need a fake MBR. Enter gptsync. You can get this program by installing refit. Run it on the device you just set up, after exiting parted.

gptsync /dev/hda

gptsync sets up the MBR to point to the fake partitions; however, the partition IDs will need correcting with fdisk next. Notice that the partition numbers are increased by one compared to GPT, and there is an extra partition as the first one.

Fixup with Fdisk

fdisk /dev/hda
/dev/hda1               1           1          16+  ee  EFI GPT
/dev/hda2   *           1           3       16048+  da  Non-FS data
/dev/hda3               3         131     1028160   83  Linux
/dev/hda4             131       30402   243154342   8e  Linux LVM

Set the first MBR partition type ee, as it is your GPT partition table.

The second partition in MBR is your GPT table's first partition. We use this to store the Grub stageloader, so set the type to da meaning this is not a filesystem.

The third partition is your GPT second, being the /boot partition, and is given the 83 type to say so.

The fourth is the LVM to-be; in GPT it is the third, and it has the type 8e.

To Linux, the GPT table takes precedence over the MBR (but check that you have only 3 partition device nodes, not 4, to be sure), thus /dev/hda1 maps to the stageloader area (rather than the GPT partition table).
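
The device node check mentioned above can be sketched as a small shell function. The name count_partition_nodes is made up for this example, and /dev/hda is simply the disk used throughout this guide; adjust it to your own device.

```shell
# Count the partition device nodes that exist for a disk (hda1, hda2, ...).
# count_partition_nodes is a hypothetical helper, not part of any package.
count_partition_nodes() {
    disk=$1
    n=0
    for node in "$disk"[0-9]*; do
        # An unmatched glob stays literal, so test that the node really exists.
        if [ -e "$node" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# With GPT support in the kernel this should print 3 for the layout above.
# A result of 4 suggests the kernel fell back to the MBR written by gptsync.
count_partition_nodes /dev/hda
```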

If you forgot to compile GPT support into Linux, you can probably get away with this as long as your LVM partition is not bigger than the maximum that the MBR partition table can support. Just access the grub partition through hda2 instead of hda1 until you get round to recompiling Linux.

Filesystems

At this time you will want to format the second GPT partition with ext3 and install the /boot files in there.

Also, format the LVM volume and add a volume group, say system, and add logical volumes as desired, say root and swap.

GRUB 1

Now it is time to install grub.

34 is the offset in sectors from the start of the device to the first GPT partition. Each sector is customarily 512 bytes long, so you can find the start of the first GPT partition at offset 0x4400 in hexedit.
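
The offset arithmetic above can be reproduced in the shell. This is only a sketch; sector_to_hex_offset is a name invented here, and the 512-byte default is the customary sector size just mentioned.

```shell
# Convert a sector number to a byte offset, printed in hex.
# The second argument is the sector size and defaults to 512 bytes.
sector_to_hex_offset() {
    sector=$1
    sector_size=${2:-512}
    printf '0x%X\n' $((sector * sector_size))
}

sector_to_hex_offset 34    # prints 0x4400
```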

The Grub stage1 in the first sector of your hard disk is set up to load the stage2 from this partition.

#!/bin/bash
# erase partition
dd if=/dev/zero of=/dev/hda1

# length in sectors of stage2
FILE=/boot/grub/stage2
S=$((  ( $(stat -c %s ${FILE}) + 511 ) / 512 ))

# put loader in partition
cat "${FILE}" > /dev/hda1

# install grub
grub --no-floppy --batch << EOF
root (hd1,2)
install (hd1,2)/grub/stage1 (hd1) (hd1)34+${S} (hd1,2)/grub/menu.lst
EOF

Or, if you prefer, use a stage 1.5. This makes starting up slower, but grub can then be upgraded by replacing the stage2 file on /boot.

#!/bin/bash
# erase partition
dd if=/dev/zero of=/dev/hda1

# length in sectors of chosen stage1_5
FILE=/boot/grub/e2fs_stage1_5
S=$((  ( $(stat -c %s ${FILE}) + 511 ) / 512 ))

# put loader in partition
cat "${FILE}" > /dev/hda1

# install grub
grub --no-floppy --batch << EOF
root (hd1,2)
install (hd1,2)/grub/stage1 (hd1) (hd1)34+${S} (hd1,2)/grub/stage2 (hd1,2)/grub/menu.lst
EOF
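
Both scripts assume that the loader image fits inside the first partition. A minimal sketch of that check follows; loader_sectors and loader_fits are hypothetical helper names, and the commented usage assumes GNU stat and blockdev are available.

```shell
# Sectors needed to hold a file of the given size in bytes, rounded up.
# This is the same calculation as S in the scripts above.
loader_sectors() {
    echo $(( ($1 + 511) / 512 ))
}

# Succeeds if a loader of $1 bytes fits in a partition of $2 sectors.
loader_fits() {
    [ "$(loader_sectors "$1")" -le "$2" ]
}

# Possible usage against the real file and device (needs root):
#   loader_fits "$(stat -c %s /boot/grub/stage2)" "$(blockdev --getsz /dev/hda1)" \
#       || echo "stage2 does not fit in /dev/hda1"
loader_sectors 513    # prints 2
```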

GRUB 2

You may also use the non-fs partition to improve resilience when starting to use GRUB 2 in place of GRUB 1. GRUB 2 can load itself from the non-fs partition and therefore avoid using blocklists.

There is some info on marking a partition for the installation of GRUB.

In this situation you may not need to use gptsync any more as GRUB 2 understands the GPT tables. boot.img replaces the GRUB1 stage1 and goes in the MBR area. core.img replaces stage2 and will be copied into the non-fs partition.

It is also possible to pre-load grub2 with lvm support. Then no /boot volume is needed: we can use one partition for core.img, with lvm and ext2 support embedded, and an lvm physical volume containing at least the root filesystem, where grub will locate the remaining modules that are not embedded, config files and background images. We can confirm this by editing grub-install to echo its grub_mkimage invocation and checking that lvm will be embedded.

parted /dev/hda
set 1 bios_grub on
quit

grub-install /dev/hda

Draft of Soft RAID mirror with GPT and LVM

Here we have the current setup of this system, a RAID mirror with the superblock at the end. We may use Intel matrixRAID where the baseboard uses that, and mdadm 1.0 superblock otherwise.

To find out if a system has matrixraid, also known as IMSM or RST, execute mdadm --detail-platform

An empty response indicates it is not supported; otherwise a message like this is returned:

Platform : Intel(R) Rapid Storage Technology
Version : 12.7.0.1936
RAID Levels : raid0 raid1 raid10 raid5
Chunk Sizes : 4k 8k 16k 32k 64k 128k
2TB volumes : supported
2TB disks : supported
Max Disks : 6
Max Volumes : 2 per array, 4 per controller
I/O Controller : /sys/devices/pci0000:00/0000:00:1f.2 (SATA)
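
The empty-versus-non-empty test described above could be scripted roughly as follows. This is a sketch under assumptions: platform_has_imsm and the MDADM override variable are invented for this example, and anything mdadm prints on stderr is ignored.

```shell
# Succeeds when "mdadm --detail-platform" prints anything on stdout.
# MDADM is a hypothetical override so the check can be tried without mdadm.
platform_has_imsm() {
    out=$("${MDADM:-mdadm}" --detail-platform 2>/dev/null) || true
    [ -n "$out" ]
}

# Possible usage when choosing the metadata format:
#   if platform_has_imsm; then echo "use -e imsm"; else echo "use -e 1.0"; fi
```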

If support is present, it is better to use it, as the system BIOS is then aware that the drives have managed volumes and can participate in a mirror repair. Otherwise, 1.0 superblock support is like IMSM in that the superblock is at the end of the drives.

This even goes for solitary drives that are not part of an array, or that use higher level functions such as lvmraid. In this case mdadm can create single drive "arrays" that can be started and stopped to isolate them from lvm when silence is desired.

Find out how big a mirror Matrix RAID gives by configuring one to try out. It is good to iterate over the commands to become very familiar with RAID setup and teardown before relying on the OS, in case errors in this or other guides, issues with the utilities, or unfamiliarity lead to data loss. We had to create a container and then allocate block devices inside it; this allows the user to have a mirror for a filesystem and stripes for a swap area if that is wanted. I prefer to have a single mirror for now.

mdadm -v -v --create -l container -e imsm --raid-devices=2 imsm0 /dev/sda /dev/sdb

Subsequently mdadm --assemble --scan can be used to set it up. If dmraid is active, deactivate it with dmraid -a n. See the container with mdadm --detail /dev/md/imsm0

Now allocate a block volume for the mirror in the RAID set; we will use all the space:

mdadm --create -l mirror stat --raid-devices=2 --assume-clean /dev/md/imsm0

It is also possible to create a single drive array on some matrixraid systems.

mdadm -v --create -f -l container -e imsm --raid-devices=1 nn /dev/sda
mdadm -v --create -f -l stripe n --raid-devices=1 /dev/md127
mdadm -v --stop /dev/md126
mdadm -v --kill-subarray=0 /dev/md127
mdadm -v --stop /dev/md127
mdadm -v --zero-superblock /dev/sda

The block devices can be accessed again with mdadm --incremental /dev/md/imsm0 and deleted with a command such as mdadm --kill-subarray=0 /dev/md/imsm0

We made a mirror pair, visible with mdadm --detail /dev/md/stat. Linux now checks that the two drives are the same. The mirror starts at sector 0 of the drives, so we can put an MBR or GPT on it and start from it.

We can stop it (like an unmount) without deleting it with mdadm --stop /dev/md/stat

At this point, destroy the array and recreate it for real with a 1.0 superblock. The -z option reduces the space allocated to the mirror to match what we would get from IMSM, allowing easy migration of the drives to a Matrix RAID baseboard later.

mdadm -v --create -l mirror -e 1.0 --assume-clean --raid-devices=2  -z $((0x3A381400000 / 1024)) stat /dev/disk/by-id/ata-WDC_WD40EZRX-00SPEB0_WD-WCC4E0227*
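
The -z argument is a size per drive, which mdadm reads as KiB when no unit suffix is given, so the byte count is divided by 1024. A small sketch of that conversion follows; bytes_to_kib is a name invented here, and the byte count is taken as-is from the command above.

```shell
# Convert a byte count to KiB. Shell arithmetic accepts the 0x prefix.
bytes_to_kib() {
    echo $(( $1 / 1024 ))
}

bytes_to_kib 0x3A381400000    # prints 3907014656
```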

The resulting mirror was partitioned like so; the LVM runs from sector 2048 to the end of the RAID volume.

Model: Linux Software RAID Array (md)
Disk /dev/md127: 7814036888s
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags: pmbr_boot

Number  Start  End          Size         File system  Name                 Flags
 1      34s    2047s        2014s                     BIOS boot partition  bios_grub
 2      2048s  7814036854s  7814034807s               Linux LVM            lvm

Non-EFI systems may require a protective MBR to recognise the drives as bootable. Here a pair of 4 TB (3.6 TiB) disks have the MBR set with a single partition covering the maximum space, set to type 0xEE and marked bootable if the BIOS requires that, such as on the D945GCLF2 board. The flag is not used for its original purpose of jumping to the first sector of the partition, as the GRUB loader is set to locate the BIOS boot partition in the GPT, starting from sector 34, then load it and run it.

Expert command (? for help): o

Disk size is 7814036888 sectors (3.6 TiB)
MBR disk identifier: 0x00000000
MBR partitions:

Number  Boot  Start Sector   End Sector   Status      Code
   1      *              1   4294967295   primary     0xEE
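
The end sector in this listing is the largest value a 32-bit MBR field can hold, which is why it stops short of the real end of the disk. A sketch of the arithmetic, with pmbr_end_sector as an invented helper name:

```shell
# Last sector the protective MBR entry can describe. The entry starts at
# sector 1 and its 32-bit length field caps at 0xFFFFFFFF sectors.
pmbr_end_sector() {
    total=$1
    max=$((0xFFFFFFFF))
    last=$((total - 1))
    if [ "$last" -gt "$max" ]; then
        echo "$max"
    else
        echo "$last"
    fi
}

pmbr_end_sector 7814036888    # prints 4294967295, as in the listing above
pmbr_end_sector 1953517568    # prints 1953517567
```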

If instead the user has an EFI capable system, the table might look like this. We do away with the bios_grub partition, as the EFI partition can hold a full core.img.

The EFI partition is typically mounted as /boot/efi and contains files like /boot/efi/shell.efi and /boot/efi/efi/grub/grub.efi

Model: Linux Software RAID Array (md)
Disk /dev/md126: 1953517568s
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start     End          Size         File system  Name        Flags
 1      2048s     2097151s     2095104s     fat32        EFI System  boot, esp
 2      2097152s  1953517534s  1951420383s               Linux LVM   lvm

blockdev --rereadpt /dev/md127
pvcreate /dev/md127p2

At this point we can set up the minimal OS on the LVM, for example a 20GB root and a 2GB swap logical volume. No dedicated /boot filesystem is used; it is kept on the root filesystem.
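
A minimal sketch of creating those volumes follows. It only prints the commands unless DRY_RUN=0 is set. The volume group name system and the helper names run and create_minimal_lvm are assumptions for illustration, and lvcreate sizes given as 20G and 2G are binary gigabytes.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Create a volume group on an existing physical volume, then a root and a
# swap logical volume. All names here are placeholders.
create_minimal_lvm() {
    pv=$1
    vg=$2
    run vgcreate "$vg" "$pv"
    run lvcreate -L 20G -n root "$vg"
    run lvcreate -L 2G -n swap "$vg"
    run mkswap "/dev/$vg/swap"
}

create_minimal_lvm /dev/md127p2 system
```

Run it once as shown to review the commands, then again with DRY_RUN=0 as root when they look right.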

Also recommended here is to create or edit /etc/lvm/lvmlocal.conf to exclude the raw drives from scanning for volumes:

devices {
    filter = [
        "r|/dev/disk/by-id/ata-EXAMPLE_DRIVE1_MODEL_SERIAL|",
        "r|/dev/disk/by-id/ata-EXAMPLE_DRIVE2_MODEL_SERIAL|"
    ]
}

It is even possible to have lvm consider only block devices of the md and ram flavours, as md devices can be solitary drives pretending to be striped, as well as mirror or parity sets.

devices {
    global_filter = [ "a|^/dev/md/.*$|", "a|^/dev/ram.*$|","r|.*/|" ]
}

This has an advantage when critical volumes are on solid state storage and you would like the sound of silence for a moment: dismount and shut down the volumes on a hard disk backed volume group, use vgchange -a n on that group, then mdadm --stop, and finally hdparm -y on the array member disks. When access is needed later, wake them for re-assembly with mdadm and re-activate the volume group.
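
The order of operations for quieting and waking the disks might look like the sketch below. It prints the commands unless DRY_RUN=0 is set. The volume group archive, the array /dev/md/archive, the member disks and the helper names are all hypothetical, and filesystems on the group must already be unmounted.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Deactivate the volume group, stop the array, then put each member disk
# into standby. Arguments: volume group, array, member disks.
quiet_disks() {
    vg=$1
    array=$2
    shift 2
    run vgchange -a n "$vg"
    run mdadm --stop "$array"
    for disk in "$@"; do
        run hdparm -y "$disk"
    done
}

# Re-assemble the arrays (this spins the disks up) and re-activate the group.
wake_disks() {
    vg=$1
    run mdadm --assemble --scan
    run vgchange -a y "$vg"
}

quiet_disks archive /dev/md/archive /dev/sdc /dev/sdd
wake_disks archive
```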

When the lvm restrictions are implemented, check them with vgscan -vvv 2>&1 | grep regex and commit them to the initramfs with update-initramfs -u. We can do a thorough check by unpacking the initrd in /tmp to inspect it: < /boot/initrd.img-`uname -r` gunzip | cpio -i

Some tricks were used to persuade GRUB2 to install here. It may refuse to install on the mirror block device; if so, we can still install it on one of the hard disks directly and copy the affected sectors over to the other using a utility such as dd. The sectors that comprise the mirror should be identical across the 2 drives.
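
Whether the leading sectors really are identical on both drives can be checked without writing anything. The sketch below assumes a cmp that supports -n (GNU cmp does); same_leading_sectors is an invented name, and /dev/sda and /dev/sdb stand for the two mirror members.

```shell
# Succeeds if the first COUNT 512-byte sectors of two drives (or image
# files) are identical. Arguments: first drive, second drive, COUNT.
same_leading_sectors() {
    a=$1
    b=$2
    count=$3
    cmp -s -n $((count * 512)) "$a" "$b"
}

# Possible usage (needs root). 2048 sectors cover the partition table and
# the BIOS boot partition in the layout shown earlier:
#   same_leading_sectors /dev/sda /dev/sdb 2048 && echo identical
```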

It is preferable to define GRUB_PRELOAD_MODULES= rather than use --modules with grub-install, because the loader would get reinstalled whenever grub receives updates.

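To do this, define a .cfg file, replacing example with a name of your own. The single quotes keep the double quotes in the file, so that sourcing it assigns the whole module list:

echo 'GRUB_PRELOAD_MODULES="mdraid1x part_gpt lvm ext2"' > /etc/default/grub.d/example.cfg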
source /etc/default/grub.d/example.cfg
grub-install --modules="${GRUB_PRELOAD_MODULES}" /dev/md/stat
sha1sum /dev/md/stat1

Some BIOSes ignore a disk if no partition in the MBR is marked bootable, including the 0xee partition on a GPT partitioned disk, so it may be necessary to check that in fdisk.

Trial startup

Before grub, the EFI shell may appear if the system is missing some correct EFI variables. These could be reinstated by reinstalling grub once the system is up, which leaves the issue of getting it up. The EFI shell does have tab completion, support for the serial console, and other goodies though.

Shell> fs0:\EFI\grub\shimx64.efi

Grub2 may show a rescue prompt at startup, which we may see if we rename the logical volume containing root. There is neither help nor tab completion, so recovery commands have to be entered in full, at least until we load the normal module.

grub rescue>ls⏎
grub rescue>set⏎

Grub2 lists the detected devices; hopefully we see the root lvm amongst them. If not, lvm support was not installed into grub and we need to reinstall it.

grub rescue>set root=(vg-root)
grub rescue>set prefix=(vg-root)/boot/grub
grub rescue>insmod normal⏎
grub rescue>normal⏎

Once normal is loaded we can start a kernel manually.

grub>linux path to kernel root=/dev/mapper/root device additional options
grub>initrd path to initrd
grub>boot

When the system starts, we can then correct the grub config to load normally.

Advantages

With the 1.0 mirrored GPT, if either drive fails we can easily swap the remaining drive to the primary channel to recover, and we are less dependent on a non-redundant startup drive. Then, when we get a replacement for the failed unit, we can instruct mdadm to re-introduce it to the array.
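
Re-introducing a replacement drive might look like the sketch below, which prints the commands unless DRY_RUN=0 is set. The array name follows the earlier examples, while /dev/sdb and the helper names run and readd_member are assumptions.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Add a replacement disk to a degraded array, then show the resync status.
readd_member() {
    array=$1
    disk=$2
    run mdadm --manage "$array" --add "$disk"
    run cat /proc/mdstat
}

readd_member /dev/md/stat /dev/sdb
```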

Other OS integration

Use bcdedit to call out GRUB from NTloader

Where matrixraid is in use with w32, the Intel drivers may initiate a verification if they detect an improper shutdown. This is of questionable value on a simple mirror and, as it is a fakeraid, it really slows the computer.

A useful workaround is to create an administrator scheduled task, triggered at user login, that invokes C:\Intel\NvCacheScripts\rstcli64.exe --manage --cancel-verify Volume_0000