Linux ate my RAM (2009) (linuxatemyram.com)
245 points by StrauXX 8 months ago | 186 comments





There's some really interesting little details here.

Linux, by default, is making the very reasonable assumption that the marginal cost of converting empty physical memory into caches and buffers is very near zero. This is fundamentally reasonable, because the cost of converting empty memory into used memory isn't really any cheaper than converting a clean cached page into used memory. It's a little more subtle once you factor in accounting, or when you think about dirty pages (which need to be written back to clear memory), or think about caches, but the core assumption is a very reasonable one.

Except on some multi-tenant infrastructure. Here, "empty" pages don't really exist. There's mostly not an empty page of memory kicking around waiting (like there is on client devices). Instead, nearly all the memory on the box is allocated, but each individual guest kernel doesn't know the full allocation. In this world, the assumption that the marginal cost of converting empty to full is zero is no longer true. There's some real cost.

Projects like DAMON https://sjp38.github.io/post/damon/ exist to handle this case, and similar cases where keeping empty memory rather than low-value cache is worse for the overall system. These kinds of systems aren't super common, especially on the client side, but aren't unusual in large-scale cloud services.


The other interesting detail here is the memory sizing problem. If I can consume all my RAM with caches and buffers, how much RAM do I need? The answer (as always) depends on what you're optimizing for. For performance, bigger is better. For cost, energy, etc you're going to want some way to calculate whether adding more RAM (and so having bigger caches) is worth the cost, heat, power, etc.

Gray and Putzolu's classic "The 5 minute rule for trading memory for disc accesses" (https://dl.acm.org/doi/pdf/10.1145/38713.38755) from 1987 is probably one of the most important CS systems papers of all time. In it, they lay out a way of thinking about memory and cache sizing by comparing the cost of holding cache to the cost of access (this isn't the first use of that line of thinking, but is a very influential statement of it). Back then, they found that storing 4kB in RAM for 5 minutes costs about the same as reading it back from storage. So if you're going to access something again within 5 minutes you should keep it around. The constants have changed a lot (RAM is way cheaper, IOs are way cheaper, block sizes are typically bigger) since then, but the logic and way of thinking are largely timeless.

The 5 minute rule is a quantitative way of thinking about the size of the working set, an idea that dates back at least to 1968 and Denning's "The working set model for program behavior" (https://dl.acm.org/doi/10.1145/363095.363141).

Back to marginal costs - the marginal cost of converting empty RAM to cache is zero in the minute, but only because the full cost has been borne up front when the machine is purchased. It's not zero, just pre-paid.


Huh, never heard of that before. An interesting paper!

Running the numbers - assuming 4k record size instead of 1k, ignoring data size changes, ignoring cache, ignoring electricity and rack costs, selecting a $60 Samsung 980 with 4xPCIe and a $95 set of 2x16GB DDR5-6400 DIMMs...I get $0.003/disk access/second/year and $0.0000113 for 4k of RAM, a ratio of 264.

That is remarkably close to the original paper's ratio of 400, even though their disks only got 15 random reads per second, not 20,000, and cost $15,000, and their memory cost $1000/MB not $0.002/MB.

I'm not sure the "Spend 10 bytes of memory to save 1 instruction per second" works equally well, especially given that processors are now multi-core pipelined complex beasts, but working naively, you could multiply price, frequency, and core count to calculate ~$0.01/MIP (instead of $50k). $0.01 is about the cost of 3 MB of RAM. Dividing both by a million you should spend 3 bytes, not 10 bytes, to save 1 instruction per second.
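
If anyone wants to plug in their own prices, the arithmetic is tiny. Here's a throwaway C version of the calculation above (the $60 / 20,000 IOPS SSD and $95 / 32GB RAM figures are this comment's assumptions, not measurements):

    #include <stdio.h>

    int main(void)
    {
        double ssd_price = 60.0, ssd_iops = 20000.0;             /* Samsung 980 */
        double ram_price = 95.0, ram_bytes = 32.0 * (1 << 30);   /* 2x16GB DDR5 */
        double page_size = 4096.0;

        /* cost of one random access per second vs. cost of holding one page */
        double cost_per_access_per_sec = ssd_price / ssd_iops;              /* ~$0.003     */
        double cost_per_page = ram_price / ram_bytes * page_size;           /* ~$0.0000113 */

        /* break-even re-reference interval: cache anything re-read sooner than this */
        printf("break-even: %.0f seconds\n", cost_per_access_per_sec / cost_per_page);
        return 0;
    }

It prints roughly 265 seconds, i.e. the ~264 ratio above: still in the same ballpark as Gray and Putzolu's original 5 minutes.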


> $60 Samsung 980

If this is a Hetzner machine then yes, but enterprise SSDs cost more, especially from enterprise vendors. But this only drives the storage cost up.

Moreover, if you tend to send some big amount of data every 5 minutes and you are somewhat constrained by memory (32 / 1000 x 100 = 3.2%), then it would be easier to just read it from storage again. If you are not constrained by storage bandwidth, of course.

And by the way, the latest gaming consoles (at least the PlayStation?) are designed around this concept - they trade having a big amount of RAM (which in the case of the PS5 is shared between the GPU and the OS) for loading assets from storage extremely fast, 'just in time'. Which works fine for games.


> Back to marginal costs - the marginal cost of converting empty RAM to cache is zero in the minute, but only because the full cost has been borne up front when the machine is purchased. It's not zero, just pre-paid.

Or, in other words, you get to fully use what you paid for.


I think the OP's point was that people tend to buy more RAM than they actually need because they have no idea how much RAM they actually need, because it's always used, and so err on the side of caution.


Well you don't have to call me out for running 64gb of ram in my home desktop like that.


> but each individual guest kernel doesn't know the full allocation

I was under the impression that at least in some virtual machine types the guest kernel is collaborating with the host kernel through vm drivers to avoid this problem.


Well, yeah. But (DAMON and friends aside), Linux doesn't handle that non-zero marginal memory cost well today.


I once worked at a government job and took my computer into the IT department for an issue. I can't remember anything about the original issue.

But I do vividly remember the help desk trying to figure out one last issue. Some process was consuming all my resources.

They never could figure out why "System Idle Process" kept doing that.


A few years ago a similar issue cropped up on macOS where, when the device was extremely hot (at least on Intel), you'd see kernel_task seemingly using a ton of CPU time. What was actually happening is that the kernel was scheduling an idle thread which just put the CPU to sleep, in an effort to lower the temperature beyond what it could achieve with fans and DVFS.


Back in the day I was asked why it consumes 2 seconds of CPU Time each second.

I couldn't answer that at the time. It took a few more years and more understanding until I remembered that situation and it was obvious to me.


What’s the answer? One second user time and one second kernel time? Or something completely different?


It was a dual processor system. The computer I'm using right now has 8 logical processors, and that statistic is incremented by 8 seconds per second: 1 second per logical processor.


What @roelschroeven says.

CPU Time is a metric of how much 'real time' was spent on the process, so a single thread running 100% of the time [on a single core with no HT] would give you a full 'real time' second. System Idle 'consumes' all the idle CPU time, so if the system is doing nothing it gets a second of CPU Time per execution thread. And there are multiple execution threads on almost anything later than 2005. On my T440 with an i3-4010U (2 cores, 4 threads) the System Idle process consumes ~3 seconds of CPU Time per second, because there is a ton of shit running in the background so the system is never 100% idle.


2 cores?


Given the time, it was probably two full-blown single-core CPUs.


When I worked in telco we used to run into this a lot.

We'd demand standard alarms for things like memory leaks / out-of-memory conditions / higher-than-normal memory usage, as to get 99.999% uptime we wanted to be paged when problems like this occurred. Except a bunch of platforms did the extremely naive implementation and included recoverable memory in their alarm conditions. So inevitably someone would log in and grep the logs or copy some files to the system, and hit the alarm conditions.

And there were some vendors who really didn't want to fix it, they would argue that recoverable memory is in use, so it should really be part of that alarm condition.


I used to run a Linux workstation in the late 00's (sorry FreeBSD folks, I know, the shame...), and I ran a closed source PVR application on it.

The memory access pattern was pretty much pessimal for my use of the box as a workstation. I'd use it from 7am -> 8/9pm every day, then when I'd walk away from the keyboard, I'd watch HD recordings (which could be 7GB or more per hour). Those would get cached in memory, and eventually my workstation stuff (emacs, xterms, firefox, thunderbird) would start to get paged out. In the mornings, it was painful to start using each application, as it waited forever to page in from a spinning disk.

I eventually wrote an LD_PRELOAD for the DVR software that overloaded open, and added O_DIRECT (to tell the kernel not to cache the data). This totally solved my problem, and didn't impact my DVR usage at all.
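
For the curious, the shim is only a handful of lines. This is a hedged reconstruction, not the original code; a real one also has to wrap open64()/openat(), and O_DIRECT requires the application to do properly aligned I/O:

    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <fcntl.h>
    #include <stdarg.h>
    #include <sys/types.h>

    /* LD_PRELOAD wrapper: forward every open() to libc, with O_DIRECT added
       so the opened file bypasses the page cache. */
    int open(const char *path, int flags, ...)
    {
        static int (*real_open)(const char *, int, ...);
        mode_t mode = 0;

        if (!real_open)
            real_open = dlsym(RTLD_NEXT, "open");

        if (flags & O_CREAT) {              /* a mode argument only exists with O_CREAT */
            va_list ap;
            va_start(ap, flags);
            mode = (mode_t)va_arg(ap, int);
            va_end(ap);
        }
        return real_open(path, flags | O_DIRECT, mode);
    }

Build with something like `gcc -shared -fPIC -o odirect.so odirect.c -ldl` and launch the DVR with `LD_PRELOAD=./odirect.so` (file names hypothetical).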


> I used to run a Linux workstation in the late 00's (sorry FreeBSD folks, I know, the shame...), and I ran a closed source PVR application on it.

It's ok, no shame. But as I understand it, FreeBSD would prefer to throw out (clean) disk cache pages under memory pressure until somewhere around FreeBSD 11 +/- 1, where there were a few changes that combined to make things like you described likely to happen. Heavy I/O overnight might still have been enough, and I'm not going to test run an old OS version to check ;)

I can't find the changes quickly, but IIRC, older FreeBSD didn't mark anonymous pages as inactive unless there was heavy memory pressure; when there was mild memory pressure, it would go through the page queue(s), free clean disk pages and skip other pages, only taking action on a second pass if the first pass didn't free enough. This usually meant your program pages would stay in memory, but when you hit memory pressure, there would be a big pause to mark a lot of pages inactive, often too many pages, which would then get faulted back to active...

Current FreeBSD marks pages inactive on a more consistent basis, which is nice because when there is memory pressure, chances are there are already classified pages. But it can lead to anonymous pages getting swapped out in favor of disk pages as you described; it's all tunable, of course, but it was a kind of weird transition for me. After upgrading the OS, some of my heavy I/O machines saw rising swap usage running the same software as before; it took a while to figure that out.


Ran into an issue like this on a relatively modern embedded Linux platform. It was a driver bug and it's been fixed, but here's the scenario:

- heavy disk access because we were writing real-time images to an SSD at about 500MB/s

- our application was steady-state about 4GB of RAM and we had 32GB available on the platform

- the serial port that we received data from was, under the hood, using DMA

In certain cases, Linux would completely run out of free pages (28GB of it being used for cache of files we were never going to read again). These were all "available" pages, just occupied at that exact moment. The serial driver would request a page for DMA when it received an interrupt and, being inside an interrupt context, would request that page with a no-block flag. That meant that kmalloc would return NULL instead of giving it a page, since it would have needed to evict one of the cache pages before one was available. The serial driver would then blow up and never retry the DMA transaction.
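
For anyone who hasn't hit this: a rough kernel-side sketch of the failure mode (not the actual driver; I'm assuming the "no block" flag was GFP_ATOMIC or similar):

    #include <linux/slab.h>

    /* Interrupt context must not sleep, so the allocation cannot wait for the
       kernel to reclaim page cache; it just returns NULL immediately. */
    static void on_serial_rx_interrupt(size_t len)
    {
        void *dma_buf = kmalloc(len, GFP_ATOMIC);

        if (!dma_buf) {
            /* The buggy driver gave up here and never re-armed the DMA
               transfer. One fix is to defer buffer setup to process context
               (e.g. a workqueue), where GFP_KERNEL is allowed to reclaim. */
            return;
        }
        /* ... hand dma_buf to the DMA engine and restart the transfer ... */
    }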

Fun to debug that one!


DMA pages are fun. Some devices have special needs for DMA buffers, so maybe you've got something ancient that can only address memory below the 32-bit (4GB) boundary, or maybe you have something really ancient that can only use memory under 16 MB; or maybe the disk controller is fine for regular disk access, but administrative commands need to use a limited range. I didn't really finish debugging that one; I got close enough and said well --- we can just measure SMART status a lot less frequently on boxes with that controller and called it a day. :)


Trust me. If Linux really eats your RAM to the point of reaching an OOM state you will know.

(This was of course because of having too many apps relative to my RAM. Not because of disk caching.)

The OOM behavior is not pleasant for a desktop system.


And this (the OOM behaviour instead of the paging behaviour on Linux) is something that can (and should) be criticised. Every time I encountered a situation where I was running out of memory (usually due to some out-of-control process) the system would become completely unusable. All interactivity is gone, so it was impossible to kill the out-of-control process (which was typically a misconfigured program I started). If the OOM killer started to take action it would almost never kill the process that was gobbling up memory like crazy, but instead any of the other apps that are necessary to intervene (like e.g. the terminal or the WM). It always seemed incredibly stupid to me.

I remember some time back there was discussion about improving the OOM killer, but I don't know what came out of it.


This may or may not preserve your desktop and other important applications in an OOM situation. https://github.com/hakavlad/prelockd

I've heard some good results with it, and the set of applications locked in memory is configurable.
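
For reference, the primitive this builds on is the mlock() family: locked pages are exempt from reclaim, so they can't be paged out no matter how hard the box is thrashing. A minimal sketch of a process pinning its own mappings (as I understand it, prelockd does the equivalent from the outside for the binaries and libraries listed in its config):

    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
        /* Lock everything mapped now and anything mapped later.
           Needs CAP_IPC_LOCK or a big enough RLIMIT_MEMLOCK. */
        if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
            perror("mlockall");
            return 1;
        }
        puts("address space locked; these pages won't be evicted");
        /* ... run the latency-critical work here ... */
        return 0;
    }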


It's interesting how much stuff we have in Linux now to make OOM decisions (even userland daemons), yet on every modern distribution it still ends up killing your desktop environment instead of the fricking C++ compiler jobs that caused the problem in the first place.


    Out of memory: kill process 12345
    Killed process 12345 (sshd)
is the funniest and ugliest message to see on the iLO/VM console.


Really broken and stupid would be how I would describe it. Typically it just hangs hard with the disk at 100%, and if you're really patient you might be able to get a shell and kill some things over the course of the next 10 minutes.


It has been fixed by the MGLRU patchset since kernel 6.1. Do:

    cat > /etc/tmpfiles.d/mglru-min-ttl.conf <<EOF
    w-      /sys/kernel/mm/lru_gen/enabled          -       -       -       -       y
    w-      /sys/kernel/mm/lru_gen/min_ttl_ms       -       -       -       -       1000
    EOF
and reboot. I've been struggling with this issue, as many others have, for years. Now I can run two VMs with 8 GB physical RAM and 5+ GB swapped, and it's barely noticeable.

More information, although a bit outdated (pre-MGLRU): https://notes.valdikss.org.ru/linux-for-old-pc-from-2007/en/... ("Linux issue: poor performance under RAM shortage conditions")


This is why I stopped having swap on my desktop. I prefer a clean death than prolonged agony.


Having no swap was no panacea, because all of the code sections of your running programs that are memory-mapped in effectively count as "available" clean pages that can be evicted when memory is tight, and they'll cause thrashing just as much as swap would. The solution is to OOM-kill processes before that happens.


Hmm, I personally haven't experienced any thrashing after disabling swap. Instead of the desktop freezing up or heavily lagging for a while until I'm somehow able to kill some apps to free some memory after 10 minutes of struggling to open a terminal, now it instantly crashes back to the login screen when running out of memory.


How does the NT kernel handle OOM situations, compared to Linux? I know it feels a lot smoother and almost like a non problem (it will slow down for a few seconds and get back to normal), but I wonder what goes on behind the scenes and why (if?) Linux has a different approach


I don't know the full answer, but on Windows the problem is less significant because of the core memory management decisions that were made.

In Linux you get a ton of copy-on-write memory - every fork() (the most basic way of multiprocessing) creates a new process that shares all of its memory with its parent. Only when something is written does the child process actually get "its" memory pages.

To put that into perspective, imagine you have only one process in your system, and it has a big 4GB buffer of rw memory allocated. So far so good. Then you fork() three times - your overall system memory usage is still roughly 4 GB. And now all four processes (parent and 3 children) overwrite that 4GB buffer with random values. Only at this point does your system RAM usage spike to 16GB.

This means that the thing that actually OOMs may be just "buffer[i] = 1". It's very hard to recover from this situation gracefully, because it is an exceptional situation, and exceptional situations may require more allocations, which are already impossible. Now compare that to Windows, where most memory allocations happen at predictable moments, like when malloc() is called, and failures can be safely handled at that point.

So, in the ideal situation, Windows running out of memory will just stop giving new memory to processes and every malloc will fail. In Linux it's not an option, since every write to a memory location can suddenly cause allocation due to copy on write.
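
If you want to watch this happen, here's a small sketch of the same scenario (sizes shrunk so it's safe to run): RSS barely moves at fork() time and only grows once each child starts writing.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        size_t len = 256 * 1024 * 1024;          /* 256MB instead of 4GB */
        char *buf = malloc(len);

        memset(buf, 0, len);                     /* parent now owns real pages */

        for (int i = 0; i < 3; i++) {
            if (fork() == 0) {                   /* child shares pages copy-on-write */
                memset(buf, i + 1, len);         /* writes fault in private copies */
                pause();                         /* stay alive so RSS can be inspected */
            }
        }
        printf("parent pid %d; watch per-process RSS in top/ps\n", (int)getpid());
        pause();
        return 0;
    }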


Which can lead to dozens of unrelated applications dying on windows when they assume infallible allocators while linux keeps going (sluggishly) until it has to kill just the biggest one.


I've worked on memory constrained Windows VMs. The problem shows up as the application you're on dying, because guess what, you're trying to allocate memory that isn't there.

The rest of the system is still usable.

It's fine.

For the longest time I also ran with no swap on Windows (and just an excessive amount of memory). I'd notice when I'd run out of memory when a particularly hungry application like Affinity Photo died and I had a zillion browser tabs open, but again, the system is perfectly responsive and fine.

The Windows behavior seems much closer to deterministic and much more sane than the OOM killer of Linux.


I've had important background processes die on windows when the offender didn't die and the OOM situation persisted for some time - I assume because it was using fallible allocations while the other processes weren't.


The Linux swapper used to be very aggressive on the file cache, evicting it even though a second later you'd need all of those libraries again. That is the main reason for the slowdowns.

Fortunately now we have the MGLRU patchset, which "freezes" the active file cache for a desired number of milliseconds, and in general is a much smarter algo.


This may be applicable for desktops, but not for servers.

In a low-memory situation, the admin wants to ssh into the server and fix the problem that led to memory exhaustion in the first place. Whoops: MGLRU freezes only the active file cache, which includes the memory hog's files, but does not include sshd, bash, PAM, and other files that are normally unused when nobody is logged in but become essential during an admin intervention. So, de facto, the admin still cannot log in, and the server is effectively inaccessible. The only difference is that the production application is still responding, which is not so helpful for restarting it.


The main problem with Linux OOM behaviour is exactly what counts as "available" memory. In essence, when the system is really low on memory, it will evict all the pages that are "available", which includes all those pages that are clean and can be loaded back in from disc, which of course includes all the memory-mapped code segments of all your running software. Because of that, the system really runs at a crawl, because every little bit of progress involves loading in a page of code before running it. Recent versions are a lot better, but certainly ten years ago on systems with a very large amount of memory this could cause the system to become basically completely unresponsive. The solution was to get the OOM killer to start taking action a lot earlier, so that it never reached the point of being so low on memory that it would thrash like that. There is a program called earlyoom that helped with that.


Over the last few years there has been ongoing work to improve this. Including improved pressure detection, multi-generational LRU, large huge page swap and a bunch of other things. Some aren't enabled by default, some need userspace daemons to make use of them.

So out-of-the-box experience of some random distro is not necessarily the best you can get, especially on older kernels.


> there has been ongoing work to improve this. Including improved pressure detection,

Are you referring to the /proc/pressure interface?

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...


Yes, the pressure information is used by userspace OOM killers, but AFAIK also internally, to detect whether progress has been made on reclaiming memory; it's supposed to be better than the previous progress heuristic.
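
The interface itself is just a text file, which is part of why userspace daemons can build on it. A rough sketch of the polling approach (real daemons like oomd/systemd-oomd use the PSI trigger/epoll interface instead, and the 10% threshold here is an arbitrary assumption):

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        for (;;) {
            FILE *f = fopen("/proc/pressure/memory", "r");
            char line[256];
            double full_avg10 = 0.0;

            /* the "full" line is the share of time *all* tasks were stalled on memory */
            while (f && fgets(line, sizeof line, f))
                sscanf(line, "full avg10=%lf", &full_avg10);
            if (f)
                fclose(f);

            if (full_avg10 > 10.0)
                printf("memory pressure high (%.2f%%): time to pick a victim\n", full_avg10);
            sleep(1);
        }
    }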


Same with disk!

I'm running Arch with i3wm. I didn't get ANY notification or helpful error message when I ran out! Instead, somehow, ghcup installed a corrupted version of cabal, that would segfault every time it was invoked. That was my only hint at first. I eventually ran df -h and discovered what was going on but man...


I've never encountered this with "many apps" starting to OOM, but many times with one process OOMing. That one will simply crash and everything else continues to run unharmed.


What distribution are you using?

IME, if a process grows out of control, Linux won't notice until the whole system is thrashing, at which point it's too late and it tries killing random things like browser tabs way before the offending process.

In rare cases Linux might recover, but only because I hammered C-c the right place 30 minutes ago. In most cases a hard reboot is required (I left it overnight once, fans spinning at max, hoping the kernel would eventually accept my plea for help, but she had other priorities).


I guess OOM is more problematic on low-memory systems or when you have more than a nominal amount of swap.

If you have enough memory that the desktop environment, browsers, and other processes that should keep running only use a small fraction of it, the OOM killer can pick a reasonable target reliably. A process that tries to allocate too much memory gets killed, and everything is robust and deterministic. I sometimes trigger OOM several times in an hour, for example when trying to find reasonable computational parameters for something.


On the contrary, it's worse on systems with lots of memory, because those are the systems that are trying to do more.

About 8 years ago I got a work machine with 384GB of RAM, and I installed earlyoom on it to make the OOM killer work a whole load earlier, otherwise the system would just become completely unresponsive for hours if one of my students/colleagues accidentally ran something that would make it run out of RAM.


Linux provably got better, because I've got a 200GB multi-user machine where OOM kills (on a stock Debian 12) are largely uneventful.


How much memory do you think is reasonable? I've had it happen to me with 16GB and even 32GB, where I never ever have this issue on Windows (unless for some reason I'm on a 2GB RAM system for God knows why). I wish people would stop defending pathological behavior that's broken for standard desktop use. What's wrong with wanting things to improve?


Nobody was defending anything. I just told that I don't remember having any issues with the Linux OOM killer, and guessed a potential reason.

I haven't really used any Windows version later than 2000 for anything except gaming, so I don't know how things work there these days. I mostly use macOS and Linux, and I've had far more trouble with pathological memory management behavior in macOS. Basically, macOS lets individual processes allocate and use far more memory than is physically available. When I'm running something with unpredictable memory requirements, I have to babysit the computer and kill the process manually if necessary, or the system may become slow and poorly responsive.


Ubuntu.

I'm guessing you are referring to "swapping", though?

If it's just one user process, it'll be killed by the OOM killer¹. That application will just be gone: poof. And for the rest you'll probably not notice anything, not even a hiccup in your Bluetooth headphones.

If it's many services, or services that are exempt from that killer, your system might start swapping. Which, indeed, leads to the case you describe.

¹https://unix.stackexchange.com/questions/153585/how-does-the...


My advice in the situation when someone wants to "free RAM": "You bought it, better use it."

It always felt strange that people buy lots of RAM but want it to be kept unused...


Back when I played WoW I would occasionally run into issues with windows trying to put the game into memory compression, as opposed to basically any other process. It turned the game into a powerpoint.

You could either get freaky with process explorer, or just keep some overhead so the system wouldn't try to do that. When I asked my guildies, they told me the 'default' for gaming is 16GB now, I was on 8 at the time.

Pretty much every gamer will at some point tab out to process manager to see wtf the computer is trying to do and exactly zero of them will think to themselves "I'm so glad there is no wasted memory!"


For the 3rd paragraph specifically: that's a fault of Windows not being clear enough about what is actually in use, versus what may be used and is already there for a myriad of reasons.

(edit: specified my intent on the reply)


It's there for when you need to do something requiring that much memory.

Your approach is like buying a giant house, becoming a hoarder, and trying to throw a party.


> Your approach is like buying a giant house, becoming a hoarder, and trying to throw a party.

Exactly, except that the items they're hoarding are occasionally very useful for making their day-to-day activities go faster. And the hoarder has the superpower that, in the blink of an eye, they can discard everything that's hoarded to make room for the party.

Wait it isn't quite like a normal hoarder at all come to think of it!


The issue is that they think they are reaching their system's capacity.


Well, usually you want to free it so you can use it for something else without hitting swap. At least that's my use case


The whole point is that pagecache does not cause any swap hits.

Oh my god, it's 2023 and we're still discussing this idea from 1970s.

Is that so hard to grasp? No, stuff gets evicted from the cache long before you hit the swap, which is by the way measured by page swap-out/in rate and not by how much swap space is used, which is by itself a totally useless metric.


> stuff gets evicted from the cache long before you hit the swap

No...?

I'm looking at a machine right now that has 3.7GB not swapped out and 1.2GB swapped out. Meanwhile the page cache has 26GB in it.

Swapping can happen regardless of how big your page cache is. And how much you want swap to be used depends on use case. Sometimes you want to tune it up or down. In general the system will be good about dropping cache first, but it's not a guarantee.

> measured by page swap-out/in rate and not by how much swap space is used

Eh? I mean, the data got there somehow. The rate was nonzero at some point despite a huge page cache.

And usually when you want to specifically talk about the swap-out/in rate being too high, the term is "thrashing".


cached disk pages are not going to be swapped out, they're just freed (because these pages are already "out" in the same place swapland is)

if your cached disk pages keep getting hit and are "recent", they're going to stay in, and your old untouched working pages are going to be swapped out, to make room either for new working pages because you've just loaded new programs or data, or to make room for more disk pages to be cached because your page cache accesses are "busier" than some of your working pages.

you will swap out pages to make room for disk cache, but your cached disk pages will never be swapped out, they are just tossed (of course, after any dirty pages are written)


> The whole point is that pagecache does not cause any swap hits.

> Oh my god, it's 2023 and we're still discussing this idea from 1970s.

> Is that so hard to grasp? No, stuff gets evicted from the cache long before you hit the swap, which is by the way measured by page swap-out/in rate and not by how much swap space is used, which is by itself a totally useless metric.

Not everyone has been alive and into this stuff since the 1970s. That you and I know about this is irrelevant for the new people discovering it for the first time. There is always going to be a constant trickle in from new sources, for as long as it takes for the tech to go away. See relevant xkcd: https://xkcd.com/1053/

But it's also worth pointing out that RAM/swap/page cache isn't always as simple as page cache out, RAM in. For example, this question[1] seems to indicate that things aren't as simple as you suggest.

[1]: https://unix.stackexchange.com/questions/756990/why-does-my-...


The page cache mechanism is very much alive. That's what we were discussing, is it not? I only lamented the fact that over 50 years the basics of how it works should have become common knowledge but did not.

As for the link you provided, I do think I can get a system into a state like that, and that isn't even hard. Pushing Firefox into swap, especially if you have just 8 gigs, is simple. But it is not in any way a normal state for a system. Idk how the author got it into that state.


I just got a 64GB machine. It rarely sees much use, but I did go over 32GB a few times.

Could've gotten away with less, but I still have PTSD from all my applications crashing after I started Teams on my 16GB machine. On another note: upgrading from an i5-2500k to an R7-5800X doesn't make Teams faster in any way.


Just because it's there, doesn't mean I want the same programs to use more.


RAM isn't user friendly in Linux. Ubuntu Desktop is the most popular distro by far by Google Trends, but it doesn't even come with RAM compression set up out of the box, so as soon as you run out of memory the UI totally locks up until the task killer kills a process, which always takes minutes in my experience. Pop OS does come with RAM compression set up, which is Ubuntu based, but then you're stuck on xorg instead of Wayland right now, because they decided to make their own DE from scratch in Rust for some strange reason, which isn't available yet. You can set up RAM compression yourself, but when macOS and Windows both have it standard, coming to Linux as a newbie so you install Ubuntu Desktop and your whole system locks up as soon as you run out of physical RAM, it's really odd and unexpected. I'm not even sure who would want to run a desktop distro without RAM compression.


RAM compression is not magic.

It does allow you to save RAM and might prevent you from hitting swap for a while longer, but it won't save you if your working set is just too large and/or difficult to compress. Apps like web browsers with multiple tabs open might be easier to compress, a game with multiple different assets that are already in a variety of compressed formats, less so.

The Linux Kernel also has a bunch of optimizations (Kernel same-page merging, for example, among others) that do not require compression(although you could argue that same-page merging _is_ a form of compression).

The system is not supposed to 'lock up' when you run out of physical RAM. If it does, something is wrong. It might become slower as pages are flushed to disk but it shouldn't be terrible unless you are really constrained and thrashing. If the Kernel still can't allocate memory, you should expect the OOMKiller to start removing processes. It should not just 'lock up'. Something is wrong.

> which always takes minutes in my experience

It should not take minutes. Should happen really quickly once thresholds are reached and allocations are attempted. What is probably happening is that the system has not run out of memory just yet but it is very close and is busy thrashing the swap. If this is happening frequently you may need to adjust your settings (vm.overcommit, vm.admin_reserve_kbytes, etc). Or even deploy something like EarlyOOM (https://github.com/rfjakob/earlyoom). Or you might just need more RAM, honestly.

I have always found Linux to behave far more gracefully than Windows (OSX is debatable) in low memory conditions, and relatively easy to tune. Windows is a swapping psycho and there's little you can do. OSX mostly does the right thing, until it doesn't.


> The system is not supposed to 'lock up' when you run out of physical RAM. If it does, something is wrong. It might become slower as pages are flushed to disk but it shouldn't be terrible unless you are really constrained and thrashing. If the Kernel still can't allocate memory, you should expect the OOMKiller to start removing processes. It should not just 'lock up'. Something is wrong.

I don't know why, but locking up is my usual experience of desktop Linux across many years and distros, and I remember seeing at least one article explaining why. The only real solution is calling the OOMKiller early, either with a daemon or with SysRq.

> It should not take minutes. Should happen really quickly once thresholds are reached and allocations are attempted. What is probably happening is that the system has not run out of memory just yet but it is very close and is busy thrashing the swap. If this is happening frequently you may need to adjust your settings (vm.overcommit, vm.admin_reserve_kbytes, etc). Or even deploy something like EarlyOOM (https://github.com/rfjakob/earlyoom). Or you might just need more RAM, honestly.

Yeah. Exactly. But as the thread says, why aren't those things set up automatically?


As an additional data point, my usual OOM experience on Linux is also a completely frozen system until I get frustrated enough to power cycle the machine.

Has anyone transitioned from this being their observed behavior to something more tolerable? What did you change to avoid this problem?


Same here, for me this has been the most annoying issue when running Linux (much less now as I have much more RAM so I don't encounter the issue).


Same-page merging only works out of the box for KVM, as that's the only user that enables it without intervention. It's MADVISE for everything non-KVM, and no applications are compiled with support for telling the kernel "hey, it's OK to dedupe me". The only way to get KSM to work with userspace applications is to use LD_PRELOAD to inject the necessary bits (https://github.com/unbrice/ksm_preload) or to use a custom kernel that has a patch and an extra daemon to globally enable KSM for everything (https://codeberg.org/pf-kernel/uksmd).

I really wish this was a standard, configurable sysctl. There are many container environments (and heck, even browsers) that would benefit from this, and I cannot see any real downside.
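
The opt-in the kernel wants is literally one madvise() call per region, which is what that LD_PRELOAD shim injects. A minimal sketch (assumes ksmd has been switched on via /sys/kernel/mm/ksm/run):

    #define _GNU_SOURCE
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 64 * 1024 * 1024;
        void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED)
            return 1;

        memset(buf, 0x42, len);                 /* identical pages: easy merge target */
        madvise(buf, len, MADV_MERGEABLE);      /* tell KSM this region may be deduped */
        /* ... run the workload; ksmd merges identical pages in the background ... */
        return 0;
    }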


Didn't say it was magic. System slows down more as you use more RAM compression, so you have time to respond and close some apps. Without it I find I often am working at a thousand miles an hour, not noticing anything amiss, and then suddenly, brick wall, out of memory and I can't do anything at all.


OOMKiller jumps into action pretty late. I'm on Fedora, thus running the systemd-oomd service, but even with this new service the system will lock up for a minute or two before the greedy process is killed.

I think with modern browsers, on memory constrained systems (think 4GB of RAM) this is easier to encounter than in the past. As someone who programs in Haskell from time to time I think I'm more familiar with Linux OOM behavior than most.

If someone wants to experience this easily with Haskell just run the following in ghci

    foldl (+) 1 [1..]


>The system is not supposed to 'lock up' when you run out of physical RAM. If it does, something is wrong.

I've never seen a Linux system not lock up on OOM. At work or at home the instant it starts swapping you might as well restart. Of course it has to kill the SSH daemon rather than the process using 98% of the memory.

>I have always found Linux to behave far more gracefully than Windows

Windows just gets sluggish for a few seconds. You can even still move the cursor when that happens!


> Apps like web browsers with multiple tabs open might be easier to compress

Unfortunately harder than it looks; if you compress the JS heap the garbage collector may decompress it again when scanning for references.


I don't see how RAM compression helps address the machine locking up when it'll still lock up when the compressed RAM is used up. It just buys you a little more time.

Also, Fedora has had zram enabled by default for a few years now along with systemd-oomd (which can sometimes be too aggressive at killing processes in its default configuration, but is configurable).


systemd-oomd is also default since Ubuntu 22.04. I remember it vividly because it effectively kept killing X when RAM filled up instead of sanely killing the process that last filled up the RAM, which is either gcc or firefox in my case. Absolutely user-unfriendly default configuration. I removed it and reinstalled earlyoom, which I have been using for years with a suitable configuration. I can only concur, RAM behavior isn't user-friendly on Ubuntu.


Thank you for mentioning earlyoom - I'll install it and try it, because the current behavior of a total, complete lockup, without the ability to do anything besides a reset with the hardware button, infuriates me unbelievably. I really don't comprehend how something like this can be possible and the default behavior in 2023, in an OS marketed as 'desktop' and 'casual/user friendly'.


Had the same experience in the past with systemd-oomd, nowadays it does a better job at killing greedy processes than the entire user slice/scope.


I second the earlyoom recommendation

it's a lifesaver


personally, i run my systems without swap, and kernel OOM behavior has been adequate.


Because it slows down as you use more RAM compression, so you have time to respond and close some apps. Without it you are working at a thousand miles an hour and then suddenly, brick wall.


Ah yes, the old, "you should enable swap so that when your RAM fills up, you know about it when the disk starts thrashing and all I/O grinds to a near-halt."

I mean, swap is useful, but that's not what it's for. Same is true for compressed RAM. If you want an alert for low available RAM, it seems like it would be better to write a script for that.


> disk starts thrashing and all I/O grinds to a near-halt

Nope, neither of those things happens when zram starts compressing RAM. Nothing grinds to a near halt until the compressed RAM space is used up; it just slows down a little bit. Btw, compressed RAM via zram isn't swap, it's available as actual RAM. It also increases the total amount of RAM available. I don't think I need to make arguments in favor of RAM compression, since Windows and macOS both have RAM compression by default.


> Because it slows down as you use more RAM compression,

Wait, are you claiming RAM compression uses an adaptive compression factor that compresses more as memory pressure grows?

Are you sure that's how it works?


In the case of zram, it reserves a portion of the physical RAM, and when the remaining physical RAM portion runs out, it begins compressing ram into the reserved portion. So the system slows down a bit as this compression starts happening. Nothing really adaptive about it to my knowledge but the result to the user is a noticeable slow down when there is high ram usage, which is a heads-up to me to close some stuff. Without it the system just locks up as soon as physical RAM is exhausted, without any warning, since it's fast up until that moment. Hope this makes sense. I'm not an expert on zram or other Linux RAM compression packages, so can't really answer questions about it beyond that.


Yeah you'd think it would make no difference but in my experience it does help a little. Don't ask me why.

But yeah even with zram my laptop still hard reboots 80% of the time when it runs out of RAM. No idea how people expect the Linux Desktop to ever be popular when it can't even get a basic thing like not randomly rebooting your computer right.


I'm pretty sure what you're calling RAM compression is swapping to zram, in which case the answer is that some people prefer not to swap at all, because that will still make things janky in comparison to just killing things when you're out of memory. (I would endorse earlyoom for that.)


I’ve heard this position multiple times, and yet every single benchmark I’ve seen repeated by teams of engineers in multiple contexts fails to replicate this fear. Zswap really is something that should just always be enabled.


For me it solved most of these lockups when using heavy ram apps (Electron, Firefox + Teams, etc) and keeps the system responsive. I am happy with it and plan to keep it enabled. I have no data to validate except that I don't remember having to SysRq key + F some app for a long time.


How would you benchmark that?


For example, at Oculus they ran both performance benchmarks in a lab and collected feedback data from field telemetry. Now of course, it’s always possible some performance counter was overlooked / the degradation requires a specific workload to show, but the lack of ability to show any evidence of a difference implies that you probably are unlikely to see it given that the detractors were very vocal and engineering at big corps tends to be more stasis driven.

I saw this also repeated at Apple (not Zswap since not Linux, but similar idea of compressing pages) and Android.


My point was that as a new user the default experience is unfriendly and saying that I have to understand the nuance between different ram related packages in order to talk about it is just proving my point.


I'm not saying that a new user should need to understand the nuance, I'm questioning whether your understanding of the underlying problem is accurate. I do agree that it's a poor experience for the system to freeze up under excess memory pressure, I just think the correct fix is less swap combined with earlyoom.


Gah, I am so tired of explaining this in this thread: as the system begins running out of memory, it starts using more of the zram. The zram is compressed, which uses CPU and slows the system down enough to notice, during which time I notice and begin closing apps. The alternative, without zram, is that it's super fast right until I run out of memory, then bam, my whole system locks up. Zram also effectively makes the total available RAM larger, because zram swap is actually usable, whereas swap to disk is so slow the system basically locks up when you start depending on it as if it were RAM. Just try it dammit! It takes a few mins to set up and open enough stuff to see the effects.


In addition to swap on zram, there's also zswap. zswap is not quite as good as swap on zram, but almost certainly is better suited to systems that you want to have be able to hibernate.


As the other comment says (but kind of hides) install earlyoom and point the config at whatever you reckon is the main culprit. It only needs done once and you can forget about it.

Edit: I should add, this is advice for desktops. If it's a server either resize or fix your service.


> Disk cache can always be given back to applications immediately

This is not true; there is a cost to freeing the cache pages and allocating them to the other program. I have seen some very regressive performance patterns around pages getting thrashed back and forth between programs and the page cache, especially in containers which are memory-limited. You throw memory maps into the mix and things can get really bad really fast.


That's why Linux does keep a certain amount of RAM actually-free, so that it can hand over some pages immediately. If the amount of actually-free RAM goes below a certain amount, then it will pre-emptively evict a load of cache pages.


Can you elaborate further?


There is a fast path and slow path to reclaiming cached pages. There are ways to measure the impact using ebpf probes.

If you are interested in going deeper I recommend looking at the memory section of this book: https://www.brendangregg.com/blog/2020-07-15/systems-perform...


Is this the reason Windows Task Manager seems to show Vmmem (WSL2) gobbling up well more RAM than WSL seems to indicate is in use?

I have more than enough RAM on my office workstation to just accept this, but on my personal gaming computer that moonlights as a dev machine, I run into issues and have to kill WSL from time to time.


That's just one part of the issue - even after forcefully dropping Linux's caches, WSL has been unable to reclaim the memory back reliably. There has been a recent update that claims to finally fix this.

You might find this package helpful: https://github.com/arkane-systems/wsl-drop-cache


It's also really annoying that "drop caches" seems to be the only interface here. No way to simply limit it.


I think there is some conflict between the disk cache running inside WSL and the memory management outside. I tried turning up memory pressure in WSL but it didn't help. This does work but I have to run it manually from time to time:

  # sync; echo 3 > /proc/sys/vm/drop_caches


No, that's because WSL (until v2/very recently) didn't properly release memory back to windows. This actually would cause docker to effectively leak memory really quickly.


The worst offense of WSL2 is writing files to RAM before copying them to the native filesystem, which makes it unusable with lots of data.


This feels like a UX problem. If this is a normal and expected part of linux operation, it should be called out in the {T,G}UI.


1. You can't change `free` output, you'll break SO many scripts.

2. Most things which report memory usage in a user-friendly way _already_ do this in an obvious way. (Htop shows disk cache in a bar graph, but doesn't add it to the "used" counter.)

3. Should UX always compensate for some fraction of users' misunderstanding of how their OS kernel works? Or would it be better for them to ask the question and then be educated by the answer?


> Or would it be better for them to ask the question and then be educated by the answer?

Good UX makes the question "why is linux using my unused RAM for disk caching" (a non pressing question) instead of "why is linux eating up all my RAM" (panic, stressful question)


It is. Windows does the same thing, but it's a non-issue because task manager makes it look like cached memory is free memory.


But it does. https://files.catbox.moe/l9je82.png orange is the part used by caches


I'm not a linux user, so not an observation of experience. Just the existence of this website suggests to me that however it is being done right now could be made clearer somehow.


htop shows this


Windows and Mac have used compressed RAM as standard for many, many years.

Yet on many Linux desktops you have to activate it (namely zram). It solves the problem that e.g. a browser eats all your memory. It's much quicker than swap and yet mostly unknown to many people who are running a Linux desktop. As mentioned by another user it's still not standard on Ubuntu desktop and I don't understand why.


> It solves the problem that e.g. a browser eats all your memory.

It doesn't solve that. You get a little bit more headroom, but that's it. Not much ram is considered compressible anyway. On my Mac I'm barely reaching 10% compressed memory, so it doesn't make that much difference.


Current RAM usage from my Windows 10 dev machine, as reported by Task Manager:

> In use: 18028 MB

> In use, compressed: 2718 MB

> Compressed memory stores an estimated 9013 MB of data, saving the system 6294 MB of memory.

That's not a small amount.


My experience is that zram saves quite a bit. I have another old laptop with 4GB where it's essential. Maybe it differs by program type?

  NAME       ALGORITHM DISKSIZE  DATA  COMPR TOTAL STREAMS MOUNTPOINT
  /dev/zram0 lzo-rle      15,6G  1,9G 248,6M  418M      16 [SWAP]


Yup, it will depend on your workload a lot. Worth testing of course!


> Not much ram is considered compressible anyway.

What are you basing this on? Things in RAM are often very very compressible, usually between 3:1 and 4:1.


Depends on what things are in your ram. Code/configuration/simple data structures compress nicely. Images/videos/ML-models don't.


> Depends on what things are in your ram.

No offense, but you are being very precise in defense but very broad in your (generally incorrect) claim.

The representation of an image sitting in memory will be a bitmap array, and for sure that will compress quite well. Video data as well but any decompressed frames are so transient I agree they won't benefit. ML-models don't compress well, but training data certainly does.

If you put aside mapping already-compressed or non-compressible data into memory, all the rest of the things RAM is used for can be compressed. Day to day you will have a lot of memory allocated that can be compressed. Most memory in use right now on most computers is compressible.


Yes, it's not the ultimate solution, but if your machine behaves snappily and does not slow down in 98% of cases vs. let's say only 70% of cases, then that's a huge usability difference. Sometimes you just need a little headroom on top and not more. As some users point out, it really depends on the workload and data etc. Browsers are a good example, because in my experience browser cache can be compressed quite well.


One can also use zswap: https://docs.kernel.org/admin-guide/mm/zswap.html https://wiki.archlinux.org/title/Zswap

which I find easier to set up. Just enable it and it manages itself. You can still keep swap on disk, but it will act as a buffer in between, trading CPU cycles for potentially reduced swap I/O.

I think Arch has it enabled by default, but I am not sure about that. I had to enable it manually on Tumbleweed because my rolling install is years old.


Fedora does that on some (but not all) disk/ram size combos. IIRC the installer won't put swap on nvme unless you tell it to explicitly, and will always set up the smaller of 4g or half of physical memory as zram.


There are distributions that enable it by default, Fedora comes to mind.


Compressing data has a cost, right? Modern systems have a ridiculous amount of memory, if you are bumping into that limitation, it seems like something odd is happening.

If your web browser is using all your ram, it is probably misconfigured, maybe the ad-blocker has accidentally been turned off or something?


I run a Linux system with 2GB of RAM... and Intel integrated graphics; its storage is not exceptionally fast flash. The more pages I can keep compressed in RAM, the less time the CPU has to spend waiting on the storage, especially if we're talking about the swap partition. After letting that computer run a long time I can tell what's been swapped to disk versus just compressed into zswap.


> Modern systems have a ridiculous amount of memory

well it depends on your definition of modern, i suppose. i run Linux on a smartphone, which is about the most modern use of Linux i can think of, and hitting that 3-4 GB RAM limit is all too easy with anything touching the web, adblocker or not.

zram isn't exactly a trump card in that kind of environment, but it certainly makes the experience of saturating the RAM a lot nicer ("hm, this application's about half as responsive as it usually is. checks ram. oh, better close some apps/tabs i don't need." -- versus the default of the system locking for a full minute until the OOMkiller finishes reaping everything under the sun).


How strange, I guess we must use different websites or something.


I think the information there is valuable because questions about memory usage in Linux keep coming up. The answer: "don't worry about it," is probably a good starting point. The page claims things that are just really misleading, though.

> There are no downsides, except for confusing newbies.

False. Populating the page cache involves lots of memory copies. It pays off if what's written is read back many times; otherwise, it's a net loss. It also costs cycles and memory to keep track of all these pages and maintain usage statistics so we know what page should be kept and which can be discarded. Unfortunately, Linux makes quantifying that cost hard, so it is not well understood.

> You can't disable disk caching. The only reason anyone ever wants to disable disk caching is because they think it takes memory away from their applications, which it doesn't!

People do want that, and they do turn it off. It's probably the number one thing database people do, because they want domain-specific caching in userland and use O_DIRECT to bypass the kernel caches altogether. If you don't, you end up caching things twice, which is inefficient/redundant.


Does anyone happen to have expertise/pointers on how ZFS' ARC interacts with Linux disk caching currently when using ZFS-on-Linux? It seems like the ARC space shows up as "used" despite being in a similar category of "made available if needed" - is that correct?

Is data in the ARC double-cached by Linux's disk caching mentioned in the post? If so, is it possible to disable this double-caching somehow?


ZFS ARC unfortunately does not integrate with the kernel file cache, so they step on each other a lot. ZFS does watch available system RAM and tries to dynamically reduce its usage as memory pressure increases, but I've found its responsiveness here to be far too slow. Combined with the fact that the ARC appears as just an opaque block of RAM that cannot be reclaimed, I usually just set a hard limit on how big the ARC is allowed to get in the module load arguments and am done with it (at least for systems that are doing more than just storage).


Is ARC really non-reclaimable on Linux?

At least on FreeBSD, there is a kmem_cache_reap() that is called from the core kernel VM system's low memory handlers.

Looking at the linux code in openzfs, it looks like there is an "spl_kmem_cache_reap_now()" function. Maybe the problem is the kernel dev's anti-ZFS stance, and it can't be hooked into the right place (eg, the kernel's VM low memory handling code)?


It's reclaimable, but opaque. The ARC just looks like used RAM rather than file cache, which throws off various means of accounting.


echo 3 > /proc/sys/vm/drop_caches

(Bear in mind that 3 is the most aggressive but other than exporting the pool, it's the only way to dump the cache, especially if you boot off ZFS)


ARC is completely separate from FS caches... if the kernel needs memory, it will tell ZFS to prune the ARC, however it's not exactly instantaneous.

Newer versions of htop also now have counters for ARC usage (compressed or uncompressed)... but it still shows up as used rather than cache.


We published this paper, "TMO: Transparent Memory Offloading in Datacenters", last year; it covers some Linux memory management mechanisms that may be quite useful for providing reasonable estimates of application memory usage.

We observed that the real memory footprint for applications depends on many factors: file access pattern, disk IO speed (especially if swap is enabled), ssd vs hdd, application latency sensitivity, etc. Instead of coming up with some overly complicated heuristic, we use the Linux kernel provided memory.pressure [0] metric via cgroup v2. It measures the amount of time spent waiting for memory (page fault etc). Then by slowly reclaiming memory from the application until its memory pressure hits some target (say 0.1%), we can claim that the steady state usage is the actual memory footprint.

This may not be useful for PCs but could be very useful for data centers to track memory regressions, and also to harvest disk swap without worrying too much about the cliff effect where the host runs out of memory and the kernel suddenly pushes everything to swap space.

[0] https://facebookmicrosites.github.io/cgroup2/docs/pressure-m...
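
For anyone who wants to try the idea on a stock kernel: a very rough sketch of that loop using cgroup v2 files (the cgroup path and the 0.1% target are assumptions, not the paper's code; memory.reclaim needs a reasonably recent kernel):

    #include <stdio.h>
    #include <unistd.h>

    #define CG "/sys/fs/cgroup/workload.slice"     /* hypothetical cgroup */

    static double full_avg10(void)                 /* read the cgroup's memory pressure */
    {
        FILE *f = fopen(CG "/memory.pressure", "r");
        char line[256];
        double v = 0.0;

        while (f && fgets(line, sizeof line, f))
            sscanf(line, "full avg10=%lf", &v);
        if (f)
            fclose(f);
        return v;
    }

    int main(void)
    {
        while (full_avg10() < 0.1) {               /* stop at the 0.1% pressure target */
            FILE *f = fopen(CG "/memory.reclaim", "w");
            if (!f)
                return 1;
            fprintf(f, "%d", 16 * 1024 * 1024);    /* proactively reclaim 16MB per step */
            fclose(f);
            sleep(10);                             /* let avg10 settle before re-checking */
        }
        /* memory.current now approximates the steady-state footprint */
        return 0;
    }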


> If your applications want more memory, they just take back a chunk that the disk cache borrowed. Disk cache can always be given back to applications immediately! You are not low on ram!

I'm running RKE2 on my desktop and it'll start killing pods due to memory pressure (low available memory), even though the memory is only being used for disk caching. I wonder if there is any way to make it stop doing that and instead only kill pods under "real" memory pressure.


Think about it: a process's executable code comes from a file. You need roughly the size of the executable available as disk cache, or executing the program will cause heavy thrashing and I/O. So some part of it is "real" memory pressure.


I also run an NFS server on the same machine, so after a period of heavy NFS use, most of the RAM gets eaten by the disk cache and RKE2/Kubernetes starts applying the memory-pressure taint. After a fresh restart with all pods running, memory usage is below 10%, so I doubt the disk cache was full of cached executables.


Assuming Linux, oddly enough I came across this exact behavior[0] while researching resource management for an on-prem k8s cluster. Take a look at that thread for more info, but TL;DR: you need to actually finesse page-cache constraints if you want to avoid the behavior. You can have really fine-grained control over the page cache via cgroups v2[1][2] and systemd[3] (rough sketch after the links).

[0]: https://github.com/kubernetes/kubernetes/issues/43916

[1]: https://docs.kernel.org/admin-guide/cgroup-v2.html#memory-in...

[2]: https://biriukov.dev/docs/page-cache/6-cgroup-v2-and-page-ca...

[3]: https://www.freedesktop.org/software/systemd/man/systemd.res...
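A minimal sketch of the kind of knob those links describe, assuming cgroup v2 and a systemd-managed service (the unit name and limits are made up; the "high" limit reclaims and throttles above the threshold rather than invoking the OOM killer):

    # soft cap for one service's cgroup: reclaim/throttle above 2G
    echo 2G > /sys/fs/cgroup/system.slice/myapp.service/memory.high
    # hard cap: the OOM killer kicks in above this
    echo 4G > /sys/fs/cgroup/system.slice/myapp.service/memory.max
    # the systemd equivalents are the MemoryHigh= and MemoryMax= unit directives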


Thank you for the pointers. That's a lot to learn, since I've never looked into cgroups before. I'll see if there is something better there than my current "fix" (periodically run `echo 1 > /proc/sys/vm/drop_caches`).


I remember that being the case for early versions of Android: people were surprised all their RAM was used, and of course you could find apps that "freed" the RAM, generally making things worse.

And the response was similar: all that "used" RAM can be reclaimed at any time should an app need some, but in the meantime, the system (which is Linux) might as well use it.

I think they "fixed" it in later versions. I don't know how, but I suspect they just changed the UI to stop people from complaining and downloading counterproductive apps.

As usual in these situations: unless you really know what you are doing, let the system do its job. Some of the best engineers, with good knowledge of the internals, have worked on it; you won't do better by looking at a single number and downloading random apps. For RAM in particular, because of the way virtual memory works, it is hard to get an idea of what is happening: there are caches, shared memory, mapped files, in-app allocators, etc.


Unused RAM is wasted RAM. Why people want to see GOBS of empty RAM boggles my mind.


I think the disconnect is not understanding how the RAM is used. If the average user looks and sees all of their RAM in use, they're going to think that there's no more room for the applications that they want to launch. They don't understand that what's cached will just get out of the way when the memory is actually needed. So they want to see free RAM, because that means it's free for their game or millions of tabs.


This made me chase red herrings when debugging OOM issues in production. I wish free would just remove the 'free' column and replace it with 'available'.


This is what we did in pretty much all of our monitoring some time ago. We ripped out most memory graphs except for "Total Memory" and "Available Memory", as well as memory pressure from the PSI metrics. And we placed alerts on available memory growing low, as well as pages being swapped in. Newer kernels opportunistically swap out idle pages, but that's fine as long as you never see the path from disk to memory (swap-in).

This has increased the quality of our memory monitoring by a lot.
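For reference, the raw signals involved live in standard procfs locations; a rough sketch (exact collection and thresholds will vary):

    # the kernel's own estimate of memory that can be handed out without swapping
    grep -E 'MemTotal|MemAvailable' /proc/meminfo
    # system-wide memory pressure (PSI); per-cgroup equivalents are the memory.pressure files
    cat /proc/pressure/memory
    # swap-in activity (pswpin) is the "disk to memory" path mentioned above
    grep -E '^pswp(in|out)' /proc/vmstat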


> If applications want more memory, they just take it back from the disk cache.

Q: If there is no swap configured, will a malloc() then take away clean page-cache pages? Or does that happen only on page-in?


In general, no: it happens when something is actually written to the page (which causes a page fault, and the kernel then has to materialize the page somehow, possibly by reclaiming clean page-cache pages). This works the same way regardless of how /proc/sys/vm/overcommit_memory is configured; the setting only affects how the kernel tracks how much memory it expects to need in the future. (Obviously, if we're talking about malloc() this is a slight over-simplification, as most malloc() implementations write some kind of book-keeping structure and thus dirty some of the allocated pages.)

Whether swap is available is more or less irrelevant for this behavior. The only thing swap changes is that the kernel is then able to “clean” dirty anonymous pages by writing them out to swap.


malloc will take away from the disk cache.

FWIW, without swap there isn't really any paging in or out (yes, mmapped files technically still can be paged, but they are basically a special-cased type of swap), so your question is hard to parse in this context. The disk cache is all about using unallocated memory, and an allocation will reduce it. Paging is irrelevant here.

Btw, you should always enable swap. Without it you force all unused but allocated memory to live in physical RAM. Why would you want to do this? There are absolutely no benchmarks that show better performance with no swap. In fact it's almost always the opposite. Add some swap. Enjoy the performance boost!
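For what it's worth, a minimal sketch of adding swap on a typical ext4 setup (size and path are just examples; btrfs and ZFS need extra care with swapfiles):

    fallocate -l 4G /swapfile
    chmod 600 /swapfile
    mkswap /swapfile
    swapon /swapfile
    # persist across reboots
    echo '/swapfile none swap defaults 0 0' >> /etc/fstab
    # optionally tune how eagerly anonymous pages are swapped (kernel default is 60)
    sysctl -w vm.swappiness=10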

https://haydenjames.io/linux-performance-almost-always-add-s...


I would say that for any modern unix implementation mmaped pages are quite significant, as all the read-only copies of libc code, other shared libraries and various mmaped data files (iconv tables, locales, terminfo, gettext catalogs…) are not exactly small.


Which is why disabling swap in the hopes of preventing the system grinding to a halt on out-of-memory doesn't work, and actually makes things worse.


Okay, if unused memory is wasted and there are truly no consequences for the "free" column reading zero, then why, on a busy system, do I get UI chugging and otherwise poor (bordering on unusable) performance under this condition, which is immediately resolved by forcing the caches to drop and freeing multiple gigabytes?

Whatever conceivable speedup there is from 12 GB of file cache as opposed to 11 is obliterated multiple times over by the time lost having to do this dance, or worse, recovering after the OOM killer wipes out my X session or browser.


> recovering after the OOM killer wipes out my X session or browser.

Perhaps you can share more details of what you're doing to force the cache to drop and what the side effects are exactly, because an OOM can't be caused by the file cache, since the total free memory available to applications remains the same. The whole point of the file cache is to use otherwise unallocated memory and give it up the moment it's needed. There should not be an OOM from this, short of an OS bug or an over-allocated virtualized system.


    echo 3 > /proc/sys/vm/drop_caches
Last time I ran into this was a couple of years ago on a stock Arch system. (Disabilities forced me back to Windows). Every time, the largest memory consumer was the web browser. Also every time, the system became nearly unresponsive due to swap thrashing (kswapd at the top of the CPU usage list, most of which was I/O wait).

Last time I complained about this problem, someone suggested installing zram, which did stop it from happening. However, this does not change the fact that there is some pathological failure case that contradicts the central thesis (not to mention smug tone) of this website and makes searching for solutions to the problem infuriating.
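For reference, roughly what a manual zram setup looks like (algorithm and size are just examples; most distros now ship zram-generator or a similar unit that does this for you):

    modprobe zram
    echo lz4 > /sys/block/zram0/comp_algorithm   # must be set before disksize
    echo 4G  > /sys/block/zram0/disksize
    mkswap /dev/zram0
    swapon -p 100 /dev/zram0                     # higher priority than any disk swap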


I find that task priorities in general are not strict enough under Linux. Try running a CPU-heavy but very low-priority task in the background, and it still manages to measurably affect the latency of the more important process. And this is just the CPU, not to mention disk utilization and network usage.

I was too lazy to find a proper solution, so I just used mlockall after allocating a massive heap and pinned the process to a core reserved only for this specific purpose.

I think cgroups have very flexible tools for reserving system-wide resources, but I haven't had the time to test them yet.
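A hedged sketch of the cgroup route via systemd-run (the weights and the command are illustrative; CPUWeight/IOWeight default to 100, and lower means a smaller share under contention):

    # run a batch job in its own scope with low CPU and IO weight,
    # plus classic nice/ionice on top
    systemd-run --scope -p CPUWeight=10 -p IOWeight=10 \
        -- nice -n 19 ionice -c 3 ./heavy-batch-job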


Another question is about containers and memory limits. Does the page cache count against my container's memory limit? And if so, when I hit that limit from doing many reads, does the page cache start taking from itself without the OOM killer getting involved?


I also want to know this, but in reverse.

I build older android (the OS) versions inside docker containers because they have dependencies on older glibc versions.

This is a memory-heavy multi-threaded process, and the OOM killer will kill build threads, making my build fail. However, there is plenty of available (but not free) memory on the Docker host, which is apparently not available inside the container. If I drop caches on the host periodically, the build generally succeeds.


And perhaps k8s is a specific category to consider here. I've read, and think I've experienced, that 'active' (as opposed to inactive) page cache does count towards the k8s memory limit.


1. Pages cached by applications are charged to their container for the purpose of memory resource limits.

2. IME the kernel takes the container's memory limit into account when determining whether to allocate a page for cache. Caching, by itself, won't cause the container to exceed a memory limit.
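A rough way to see that accounting from the host, assuming cgroup v2 (the cgroup path and the <pod> placeholder are illustrative):

    # total memory charged to the cgroup, and how much of it is page cache vs anonymous
    cat /sys/fs/cgroup/kubepods.slice/<pod>/memory.current
    grep -E '^(file|anon) ' /sys/fs/cgroup/kubepods.slice/<pod>/memory.stat
    # the limit the kernel reclaims against before considering an OOM kill
    cat /sys/fs/cgroup/kubepods.slice/<pod>/memory.max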


How old is this website? It's from a time when a typical computer only had 1.5 G of RAM.


The domain appears to have been registered 25 Apr 2009, and I remember seeing this quite a while ago. That would make sense for 1.5 G of RAM being typical. Glad it's still around :)


Oldest copy on the Wayback Machine is from May 2009 https://web.archive.org/web/20090513043445/https://www.linux...


I pity the fool who don't eat RAM.


I guess people around here are Too Young.

This is a reference to a legitimate piece of Internet history:

https://en.wikipedia.org/wiki/Ate_my_balls

> "Ate my balls" is one of the earliest examples of an internet meme. It was widely shared in the late 1990s when adherents created web pages to depict a particular celebrity, fictional character, or other subject's zeal for eating testicles. Often, the site would consist of a humorous fictitious story or comic featuring edited photos about the titular individual; the photo editing was often crude and featured the character next to comic-book style speech in a thought balloon.

> The fad was started in 1996 by Nehal Patel, a student at University of Illinois at Urbana-Champaign with a "Mr. T Ate My Balls" web page.


It’s the first thing that popped into my head.


It's weird that their 'used' column is accounting for 'buff/cache'

'man free' states

     used   Used or unavailable memory (calculated as total - available)


Why all the fuss??

When you can download as much RAM as you want, any time you want.

https://downloadmoreram.com/


My Linux RAM problems are: 1) on a default Ubuntu install, Xwayland ends up using 5 GB of RAM; 2) when running out of RAM, it seems to default to crashing back to the login screen.


You can always just download more RAM. https://www.downloadmoreram.com


Windows ate my hard disk (every year).


Is the website down? I was reading and suddenly started to get timeout.


htop shows this (it's the orange/yellow bar in RAM)


I recommend btop these days (https://github.com/aristocratos/btop)


I just tried btop; it has crazy colours (nothing like the screenshots) in my Mac Terminal app (Misterioso theme).


> Disk caching makes the system much faster and more responsive! There are no downsides, except for confusing newbies. It does not take memory away from applications in any way, ever!

No downsides, except for massive data loss when the system suddenly loses power or a drive crashes, and the massive theft of memory from host OSes (e.g., when using Windows Subsystem for Linux).


There is no data loss from disk caching.


Does disk caching not include write caching?


The cache being talked about is for recently/frequently accessed things, not stuff pending write.


Not in the context we're talking about


Are you sure? Dirty pages have to reside somewhere, so they are actually stuck in RAM until successfully written to disk. Linux will claim with a straight face that dd to my 8 GB pendrive finished successfully in a few seconds, so there may be non-trivial amounts of RAM involved here.

I don't know enough about Linux internals to know whether the writeback cache and the read cache are the same object in the kernel, but they feel similar.

Of course, the real response is that without a write cache (effectively adding fsync to every write) any modern Linux system would grind to an absolute halt and doing anything would be a challenge. So, contrary to the GP's post, it's not reasonable to complain about its existence.
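For the dd-to-a-pendrive case specifically, a rough sketch of making "finished" actually mean "on the device", and of watching the writeback backlog (the device name is a placeholder):

    # fsync the output before dd reports success
    dd if=backup.img of=/dev/sdX bs=4M conv=fsync status=progress
    # or flush all dirty pages afterwards
    sync
    # how much dirty/writeback data is still queued in RAM
    grep -E '^(Dirty|Writeback):' /proc/meminfo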


It's written to disk ASAP unless you configure it otherwise. The disk (or flash drive, in your case) has its own RAM buffer before actually flushing to physical storage (fsync, unless it lies, in which case there is probably an article about it here, cough Macs cough). So the kernel isn't lying to you, but your disk probably is.
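If you want to check whether the drive itself is buffering writes, a hedged sketch (device names are examples; turning the drive's cache off usually tanks performance):

    # SATA: report, then disable, the drive's volatile write cache
    hdparm -W /dev/sda
    hdparm -W 0 /dev/sda
    # NVMe: the volatile write cache is feature id 6
    nvme get-feature /dev/nvme0 -f 6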



