To reduce FPGA usage, I suspect the best 'bang for the buck' is programming the ADV7513 to accept a format that the ARM framebuffer is happy with. The ADV7513 is pretty flexible and can do a fair bit of conversion for us, but the stock Verilog from ADI seems to do a lot of that work in the FPGA instead.
The FPGA is connected to the ADV7513 by a 16 bit bus, which stops us just outputting 32 bit RGB 4:4:4. However, 16 bit RGB 4:4:4 is available (input mode 0x5), and the ADV7513 colour space converter can convert that to YCbCr for free.
That would reduce the FPGA to mostly a DMA engine that moves a 1920x1080 16 bit framebuffer to the ADV7513, which has been programmed to output standard HDMI YCbCr.
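For a rough idea of what that DMA has to sustain, here is a small back-of-envelope sketch. The function names are made up for illustration, and the 60 Hz refresh rate in the usage note is my assumption (no refresh rate is stated above); blanking intervals and bus overhead are ignored.

```c
#include <stdint.h>

/* Bytes in one frame for a given resolution and pixel size. */
static uint64_t frame_bytes(uint32_t width, uint32_t height,
                            uint32_t bytes_per_pixel)
{
    return (uint64_t)width * height * bytes_per_pixel;
}

/* Sustained read rate the DMA needs to scan the framebuffer out.
 * refresh_hz is an assumption supplied by the caller. */
static uint64_t scanout_bytes_per_sec(uint32_t width, uint32_t height,
                                      uint32_t bytes_per_pixel,
                                      uint32_t refresh_hz)
{
    return frame_bytes(width, height, bytes_per_pixel) * refresh_hz;
}
```

With these, `frame_bytes(1920, 1080, 2)` gives 4,147,200 bytes per frame, and at an assumed 60 Hz `scanout_bytes_per_sec(1920, 1080, 2, 60)` gives 248,832,000 bytes/s, half of what a 32 bit framebuffer would need.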
Even if the app-level framebuffer is 32 bit, the ARM cores can convert it to 16 bit very quickly using NEON, if FPGA fabric is tight...
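As a rough sketch of that conversion (not tested on real hardware): this assumes the app framebuffer is little-endian XRGB8888 (0x00RRGGBB) and that the 16 bit format wanted is RGB565. Both are my assumptions, so check the pixel packing against the ADV7513 input format tables before relying on it. The function names are hypothetical. The NEON path handles 8 pixels per iteration and is only compiled when the compiler defines `__ARM_NEON`; the scalar loop handles the tail and serves as the fallback elsewhere.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Scalar reference: pack one XRGB8888 pixel (0x00RRGGBB) into RGB565. */
static inline uint16_t xrgb8888_to_rgb565(uint32_t p)
{
    uint32_t r = (p >> 16) & 0xFF;
    uint32_t g = (p >> 8) & 0xFF;
    uint32_t b = p & 0xFF;
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Convert n pixels from src (XRGB8888) to dst (RGB565). */
void convert_xrgb8888_to_rgb565(const uint32_t *src, uint16_t *dst, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    for (; i + 8 <= n; i += 8) {
        /* De-interleaving load: on little-endian, 0x00RRGGBB sits in
         * memory as B, G, R, X, so val[0]=B, val[1]=G, val[2]=R. */
        uint8x8x4_t px = vld4_u8((const uint8_t *)(src + i));
        /* Widen red into the top byte of each 16 bit lane. */
        uint16x8_t out = vshll_n_u8(px.val[2], 8);
        /* Shift-right-insert green below the top 5 red bits. */
        out = vsriq_n_u16(out, vshll_n_u8(px.val[1], 8), 5);
        /* Shift-right-insert blue below the 6 green bits. */
        out = vsriq_n_u16(out, vshll_n_u8(px.val[0], 8), 11);
        vst1q_u16(dst + i, out);
    }
#endif
    /* Scalar tail, and the whole job when NEON is not available. */
    for (; i < n; i++)
        dst[i] = xrgb8888_to_rgb565(src[i]);
}
```

Calling it once per scanline, e.g. `convert_xrgb8888_to_rgb565(line32, line16, 1920)`, keeps the working set small; a 1920 pixel line is a multiple of 8, so the NEON loop would cover it without touching the tail.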