Work with large directories in Azure file shares

Summary

  • Scope: Recommendations for working with very large directories on Azure file shares mounted from Linux clients over NFS.

  • Key goals: Reduce latency and improve enumeration performance when directories contain a very large number of files.

Main recommendations

  1. Increase inode hash buckets (kernel command line)

  • The number of inode hash buckets, which the kernel sizes from available RAM, affects directory enumeration performance. Increasing ihash_entries reduces hash collisions and improves enumeration.

  • To apply it, add ihash_entries to the kernel command line and reboot; verify the setting with cat /proc/cmdline or dmesg. The steps below walk through this.

How to increase inode hash buckets

  1. Edit the GRUB defaults. Open /etc/default/grub for editing:

     sudo vim /etc/default/grub

  2. Add the ihash_entries setting. Add this line, which sets the inode hash table size and can increase memory usage by up to 128 MB:

     GRUB_CMDLINE_LINUX="ihash_entries=16777216"

     If GRUB_CMDLINE_LINUX already exists, append ihash_entries=16777216 to its value, separated by a space, as in the example below.
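
     For example, an existing entry such as GRUB_CMDLINE_LINUX="quiet splash" (the quiet splash flags here are purely illustrative) would become:

     GRUB_CMDLINE_LINUX="quiet splash ihash_entries=16777216"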

  3. Update GRUB. Apply the changes:

     sudo update-grub2
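
     Note: update-grub2 is the Debian/Ubuntu wrapper. On distributions that don't provide it (for example, RHEL-based systems), regenerating the configuration directly is the usual equivalent; the output path below is a common default but can vary by distribution:

     sudo grub2-mkconfig -o /boot/grub2/grub.cfg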

  4. Reboot. Restart the system:

     sudo reboot

  5. Verify. Check the kernel command line:

     cat /proc/cmdline

     Or inspect dmesg to confirm the inode-cache hash table entries:

     dmesg | grep "Inode-cache hash table"

     If ihash_entries appears on the command line, or the hash table reports the expected number of entries, the setting is applied.

  2. Recommended mount options (NFS)

  • actimeo: Set actimeo to 30–60 seconds to extend client-side attribute caching (actimeo sets acregmin, acregmax, acdirmin, and acdirmax to the same value), reducing repeated attribute fetches and lowering latency for large-directory operations. In testing, this reduced latency by up to ~77% for some workloads (1 million files).

  • nconnect: Use nconnect (nconnect=4 is recommended) to open multiple TCP connections between the client and the NFS share; this benefits multi-threaded and asynchronous workloads. A combined mount example follows this list.
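
A minimal mount sketch combining both options, assuming the NFSv4.1 options Azure Files documents (vers=4,minorversion=1,sec=sys); the storage account endpoint, share name, and mount point are placeholders to adjust for your environment:

sudo mount -t nfs -o vers=4,minorversion=1,sec=sys,nconnect=4,actimeo=60 <storage-account>.file.core.windows.net:/<storage-account>/<share-name> /mnt/<share-name>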

  3. Commands and operations: how you list files matters

  • Use unaliased ls (avoid default aliases such as --color=auto), or call the binary directly (for example, /usr/bin/ls), to avoid the extra work aliased options perform.

  • Prevent ls from sorting its output when order is unimportant: use /usr/bin/ls -1f or -1U to skip the expensive sorting step (-f also lists hidden files; -U does not). This significantly speeds up counting and listing in very large directories; see the examples below.
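
For example, to count entries in a large directory (the path is a placeholder):

\ls -1f /mnt/<share-name>/bigdir | wc -l

The backslash bypasses any shell alias; /usr/bin/ls -1U /mnt/<share-name>/bigdir does the same while omitting hidden files.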

  4. File copy and backup operations

  • For backups and copies, use share snapshots as the source rather than the live share with active I/O. Running backup operations against a snapshot improves reliability and performance; see the sketch below. (Link: Use share snapshots with Azure Files)
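
A minimal sketch, assuming the snapshot is exposed through the share's .snapshot directory at the share root (as for NFS Azure file shares); the mount point, snapshot name, and destination are placeholders:

rsync -a /mnt/<share-name>/.snapshot/<snapshot-name>/ /backup/<share-name>/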

  5. Application-level recommendations

  • Skip file attributes: If only file names are needed, use getdents64 with a large buffer to avoid fetching attributes unnecessarily.

  • Interleave stat calls: If attributes are required, interleave statx calls with getdents64 batches (rather than doing all getdents64 calls and then all statx calls) so the client requests entries and attributes together, reducing round trips. Combining this with a higher actimeo improves performance further.

  • Increase I/O depth: Configure nconnect greater than 1 and distribute work across threads, or use asynchronous I/O, to benefit from the parallel connections.

  • Force-use cache: If only one client mounts the share, use statx with AT_STATX_DONT_SYNC to read cached attributes without synchronizing with the server, avoiding extra network round trips. A C sketch combining these syscall-level techniques follows this list.
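
A minimal C sketch of the batch-interleaved enumeration described above, assuming a Linux client with glibc 2.28 or later (for the statx wrapper); the 1 MiB batch size and directory argument are illustrative, and AT_STATX_DONT_SYNC is only appropriate when no other client is modifying the share:

#define _GNU_SOURCE
#include <fcntl.h>        /* open, AT_* flags */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>     /* statx, STATX_BASIC_STATS */
#include <sys/syscall.h>
#include <unistd.h>

#define BATCH (1 << 20)   /* read 1 MiB of directory entries per syscall */

struct linux_dirent64 {
    uint64_t       d_ino;
    int64_t        d_off;
    unsigned short d_reclen;
    unsigned char  d_type;
    char           d_name[];
};

int main(int argc, char **argv) {
    const char *dir = argc > 1 ? argv[1] : ".";
    int fd = open(dir, O_RDONLY | O_DIRECTORY);
    if (fd < 0) { perror("open"); return 1; }

    char *buf = malloc(BATCH);
    long n;
    /* Fetch one batch of entries, then stat that batch before fetching
       the next, so enumeration and attribute requests are interleaved. */
    while ((n = syscall(SYS_getdents64, fd, buf, BATCH)) > 0) {
        for (long off = 0; off < n; ) {
            struct linux_dirent64 *e = (struct linux_dirent64 *)(buf + off);
            struct statx stx;
            /* AT_STATX_DONT_SYNC serves cached attributes instead of
               forcing a round trip to the NFS server. */
            if (statx(fd, e->d_name,
                      AT_STATX_DONT_SYNC | AT_SYMLINK_NOFOLLOW,
                      STATX_BASIC_STATS, &stx) == 0)
                printf("%s\t%llu\n", e->d_name,
                       (unsigned long long)stx.stx_size);
            off += e->d_reclen;
        }
    }
    free(buf);
    close(fd);
    return 0;
}

If only names are needed (the first bullet), drop the statx call and print e->d_name alone; enumeration then avoids attribute fetches entirely.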

See also

  • Improve NFS Azure file share performance: https://docs.azure.cn/en-us/storage/files/nfs-performance

  • Improve SMB Azure file share performance: https://docs.azure.cn/en-us/storage/files/smb-performance

Last updated: 08/04/2025
