Hi everyone!
Long time no see, I hope you're safe and sound!
I'm currently experimenting with tight alignment and I observed some performance regressions. After investigation, they are not really surprising, but I want to raise the issue and open the discussion.
On MSFS2024 we have a lot of allocations; streaming is widely used and creates a lot of back and forth inside the pools. We therefore had to use defragmentation, as we can reach up to 2GB of useless VRAM without it. It has worked great since 2022!
Our usual figures are around 18K allocations for 700 blocks. I'll link the allocation dump in this issue.
With tight alignment enabled, it works quite well, but we end up with a lot of stutters due to defragmentation. Usually (without tight alignment), when we finish loading we end up with 1GB of useless memory, and with 10 moves per frame we get down to less than 100MB of useless memory after a few seconds. It is nearly invisible in our framerate. The worst pass can take 8ms on a thread, and only once, and voilà!
With tight alignment, we have the same process but with around [120;200] MB of useless memory; it tries to defragment but has a hard time converging. Each defragmentation takes multiple passes (whereas one pass is often enough without tight alignment) and each pass spends 45ms computing moves (still with the same limit of 10 allocations per defrag). After breaking and trying to understand what happens, it seems that the algorithm tries to move each allocation, finds no free space inside other blocks, finds no free space inside the same block, and so for each allocation we hit the worst path, failing every time and trying all allocations without success. It looks like this with verbose markers:
After understanding that no free space was found, I checked the allocator dump and saw a lot of small free regions (128, 1280, 624 ...), which largely explains why no allocation finds a fit.
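To illustrate why such small gaps are useless as move targets, here is a minimal sketch of a fit check. It is not the library's code: `FreeRegion`, `AlignUp`, `Fits` and `UsableFreeBytes` are hypothetical names, and the offsets, allocation sizes and alignments in the usage note are made-up values; only the gap sizes 128, 1280 and 624 come from the dump.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of a gap between two allocations inside a block.
struct FreeRegion {
    std::uint64_t offset; // start of the gap inside its block
    std::uint64_t size;   // length of the gap
};

// Rounds `value` up to the next multiple of `alignment` (must be a power of two).
inline std::uint64_t AlignUp(std::uint64_t value, std::uint64_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

// True if an allocation of `size` bytes with the given alignment can be
// placed somewhere inside `region`, accounting for alignment padding.
inline bool Fits(const FreeRegion& region, std::uint64_t size, std::uint64_t alignment) {
    const std::uint64_t start = AlignUp(region.offset, alignment);
    const std::uint64_t padding = start - region.offset;
    return padding <= region.size && size <= region.size - padding;
}

// Sums the size of every free region that could host at least one
// allocation of the given size and alignment.
std::uint64_t UsableFreeBytes(const std::vector<FreeRegion>& regions,
                              std::uint64_t size, std::uint64_t alignment) {
    std::uint64_t usable = 0;
    for (const FreeRegion& region : regions) {
        if (Fits(region, size, alignment)) {
            usable += region.size;
        }
    }
    return usable;
}
```

With three gaps of 128, 1280 and 624 (2032 in total) placed at the assumed offsets 256, 1024 and 8000, a request of size 2048 finds nothing usable, and a request of size 512 aligned to 256 can only use the 1280 gap, because alignment padding eats too much of the 624 one.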
In the end, the allocation algorithm seems good, the defragmentation algorithm (we use the balanced one) seems good; I just have ugly pools.
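To make the cost of that worst path concrete, here is a toy model of one pass. It is only a sketch under simplifying assumptions: the library's real search is structured differently, alignment is ignored, and `PassStats` and `SimulatePass` are hypothetical names. It only shows that a move limit bounds the work when moves succeed, and bounds nothing when every attempt fails.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct PassStats {
    std::size_t probes = 0; // candidate free regions examined
    std::size_t moves = 0;  // allocations that found a destination
};

// Toy model of one pass: each allocation is tested against the candidate
// free regions in order and moved into the first one large enough.
// The pass stops as soon as `maxMoves` moves have been found.
PassStats SimulatePass(const std::vector<std::uint64_t>& allocSizes,
                       std::vector<std::uint64_t> freeSizes, // copied: moves consume space
                       std::size_t maxMoves) {
    PassStats stats;
    for (std::uint64_t allocSize : allocSizes) {
        if (stats.moves == maxMoves) {
            break; // move budget reached, the pass ends early
        }
        for (std::uint64_t& freeSize : freeSizes) {
            ++stats.probes;
            if (allocSize <= freeSize) {
                freeSize -= allocSize; // the destination shrinks
                ++stats.moves;
                break;
            }
        }
    }
    return stats;
}
```

With 18000 allocations of an assumed size of 4096, a limit of 10 moves and 100 roomy free regions, the pass stops after 10 probes. With 100 free regions of 1280 each, no move ever succeeds, the limit is never reached, and the pass performs 1,800,000 probes for nothing.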
Do you think something is actionable here? Does tight alignment change the paradigm, so that the algorithm should group allocations by size or alignment? Should I forget about using tight alignment? Did I miss something obvious?
Would it be possible to enable tight alignment only on certain pools? Our main issue was BLAS/TLAS, which can be very small, but they are in a custom pool without defrag, so having tight alignment only on this pool could partly solve the issue on our end. It seems easily doable, so it is more a philosophical question than a feasibility one.
AllocationDump.json
Don't hesitate to ask if you have any questions. I hope mine are clear enough :D
Have a nice day all.
Cheers,
Robin.