PAX: optimize io read for multiple discrete columns in a group
When reading multiple discrete columns in a group, the code
reads columnar data block by block in synchronous mode. This
means that all I/O requests on the columnar data complete
serially, which is inefficient.
This commit uses io_uring to submit a batch of I/O requests so
that the OS can process them in parallel for better throughput.
libaio is another candidate, but it brought no improvement
in our benchmark tests (without O_DIRECT).
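
For illustration only, a minimal sketch of batched submission with
io_uring follows. It is not the code in this commit: the struct
col_read type and both function names are hypothetical, and it uses
the raw io_uring_setup/io_uring_enter syscalls to avoid a library
dependency (real code would more likely go through liburing). It
assumes Linux 5.6+ kernel headers for IORING_OP_READ and falls back
to serialized pread() calls when io_uring is unavailable.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/io_uring.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical descriptor for one column chunk; not a PAX type. */
struct col_read {
    off_t   offset;  /* file offset of the chunk */
    size_t  len;     /* bytes to read (assumed to fit in 32 bits) */
    void   *buf;     /* destination buffer */
    ssize_t result;  /* bytes read, or negative on error */
};

/* Serialized path: one blocking pread() per chunk. */
static void read_columns_sync(int fd, struct col_read *r, unsigned n) {
    for (unsigned i = 0; i < n; i++)
        r[i].result = pread(fd, r[i].buf, r[i].len, r[i].offset);
}

/* Queue n reads and submit them as one batch through io_uring.
 * Returns 1 if io_uring was used, 0 if it fell back to pread(). */
static int read_columns_batched(int fd, struct col_read *r, unsigned n) {
    struct io_uring_params p;
    memset(&p, 0, sizeof p);
    int ring = n ? (int)syscall(SYS_io_uring_setup, n, &p) : -1;
    if (ring < 0) { read_columns_sync(fd, r, n); return 0; }

    /* Map the submission ring, completion ring, and SQE array. */
    size_t sq_sz = p.sq_off.array + p.sq_entries * sizeof(uint32_t);
    size_t cq_sz = p.cq_off.cqes + p.cq_entries * sizeof(struct io_uring_cqe);
    size_t sqe_sz = p.sq_entries * sizeof(struct io_uring_sqe);
    int prot = PROT_READ | PROT_WRITE, fl = MAP_SHARED | MAP_POPULATE;
    char *sq = mmap(NULL, sq_sz, prot, fl, ring, IORING_OFF_SQ_RING);
    char *cq = mmap(NULL, cq_sz, prot, fl, ring, IORING_OFF_CQ_RING);
    struct io_uring_sqe *sqes = mmap(NULL, sqe_sz, prot, fl, ring, IORING_OFF_SQES);
    int used = 0;
    if (sq != MAP_FAILED && cq != MAP_FAILED && (void *)sqes != MAP_FAILED) {
        uint32_t *sq_tail = (uint32_t *)(sq + p.sq_off.tail);
        uint32_t *sq_arr = (uint32_t *)(sq + p.sq_off.array);
        uint32_t sq_mask = *(uint32_t *)(sq + p.sq_off.ring_mask);
        uint32_t *cq_head = (uint32_t *)(cq + p.cq_off.head);
        uint32_t *cq_tail = (uint32_t *)(cq + p.cq_off.tail);
        uint32_t cq_mask = *(uint32_t *)(cq + p.cq_off.ring_mask);
        struct io_uring_cqe *cqes = (struct io_uring_cqe *)(cq + p.cq_off.cqes);

        /* Fill one SQE per chunk; nothing reaches the kernel yet. */
        uint32_t tail = *sq_tail;
        for (unsigned i = 0; i < n; i++, tail++) {
            struct io_uring_sqe *sqe = &sqes[tail & sq_mask];
            memset(sqe, 0, sizeof *sqe);
            sqe->opcode = IORING_OP_READ;  /* needs Linux 5.6 or later */
            sqe->fd = fd;
            sqe->addr = (uint64_t)(uintptr_t)r[i].buf;
            sqe->len = (uint32_t)r[i].len;
            sqe->off = (uint64_t)r[i].offset;
            sqe->user_data = i;            /* maps the completion back to r[i] */
            sq_arr[tail & sq_mask] = tail & sq_mask;
            r[i].result = -1;              /* stays negative if never completed */
        }
        __atomic_store_n(sq_tail, tail, __ATOMIC_RELEASE);

        /* Normally one syscall submits the batch and waits for all of it. */
        unsigned to_submit = n, done = 0;
        while (done < n) {
            long ret = syscall(SYS_io_uring_enter, ring, to_submit, n - done,
                               IORING_ENTER_GETEVENTS, NULL, (size_t)0);
            if (ret < 0 && errno != EINTR) break;
            if (ret > 0) to_submit -= (unsigned)ret;
            uint32_t head = *cq_head;
            while (head != __atomic_load_n(cq_tail, __ATOMIC_ACQUIRE)) {
                struct io_uring_cqe *cqe = &cqes[head & cq_mask];
                r[cqe->user_data].result = cqe->res;
                head++; done++;
            }
            __atomic_store_n(cq_head, head, __ATOMIC_RELEASE);
        }
        used = 1;
    }
    if (sq != MAP_FAILED) munmap(sq, sq_sz);
    if (cq != MAP_FAILED) munmap(cq, cq_sz);
    if ((void *)sqes != MAP_FAILED) munmap(sqes, sqe_sz);
    close(ring);
    if (!used) read_columns_sync(fd, r, n);
    return used;
}
```

A caller would fill one struct col_read per column chunk, call
read_columns_batched(fd, reads, n), and then check each result
field. Completions may arrive in any order, which is why user_data
carries the request index. Short reads are possible and are left
to the caller in this sketch.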