Bug Description
What happened:
flink: 1.16
hudi: 0.13.1 with #12967
We use #12967 in our internal branch. Our records are ~400 KB on average, while the default write.memory.segment.page.size is 32 KB. During flush, the writer frequently throws the following exception and data fails to be written; if we set write.memory.segment.page.size to 500 KB, the exception no longer occurs.
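For reference, our workaround is to raise the page size above the average record size. A sketch of the sink DDL, assuming the option is set in the Flink SQL WITH clause like other Hudi write options (table name, columns, and path are placeholders; the '500kb' value format is what we used, as described above):

```sql
-- Illustrative Hudi sink; only the last option is the point here.
CREATE TABLE hudi_sink (
  id STRING,
  payload STRING,
  ts TIMESTAMP(3),
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'hudi',
  'path' = 'hdfs:///tmp/hudi_sink',
  'table.type' = 'COPY_ON_WRITE',
  -- workaround: make one page larger than the average record (~400 KB for us)
  'write.memory.segment.page.size' = '500kb'
);
```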
Caused by: java.lang.RuntimeException: java.lang.NegativeArraySizeException: -2063597517
at org.apache.hudi.io.storage.row.RowDataParquetWriteSupport.write(RowDataParquetWriteSupport.java:72)
at org.apache.hudi.io.storage.row.RowDataParquetWriteSupport.write(RowDataParquetWriteSupport.java:37)
at org.apache.hudi.jd.org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:138)
at org.apache.hudi.jd.org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:310)
at org.apache.hudi.io.storage.HoodieBaseParquetWriter.write(HoodieBaseParquetWriter.java:175)
at org.apache.hudi.io.storage.row.HoodieRowDataParquetWriter.writeRow(HoodieRowDataParquetWriter.java:45)
at org.apache.hudi.io.storage.row.LSMHoodieRowDataCreateHandle.writeRow(LSMHoodieRowDataCreateHandle.java:235)
... 12 more
Caused by: java.lang.NegativeArraySizeException: -2063597517
at org.apache.flink.table.data.binary.BinarySegmentUtils.getBytes(BinarySegmentUtils.java:296)
at org.apache.flink.table.data.binary.BinaryStringData.toBytes(BinaryStringData.java:112)
at org.apache.hudi.io.storage.row.parquet.ParquetRowDataWriter$StringWriter.write(ParquetRowDataWriter.java:266)
at org.apache.hudi.io.storage.row.parquet.ParquetRowDataWriter$ArrayWriter.doWrite(ParquetRowDataWriter.java:532)
at org.apache.hudi.io.storage.row.parquet.ParquetRowDataWriter$ArrayWriter.write(ParquetRowDataWriter.java:503)
at org.apache.hudi.io.storage.row.parquet.ParquetRowDataWriter.write(ParquetRowDataWriter.java:95)
at org.apache.hudi.io.storage.row.RowDataParquetWriteSupport.write(RowDataParquetWriteSupport.java:70)
... 18 more
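Our own hypothesis (not confirmed): a NegativeArraySizeException inside BinarySegmentUtils.getBytes suggests the offset-and-length word read from the binary row decodes to garbage, so the decoded length has its high bit set and the byte[] allocation fails. A minimal, self-contained sketch of that failure mode (all names here are illustrative, not from Hudi/Flink code):

```java
public class NegativeLengthSketch {
    public static void main(String[] args) {
        // Suppose the reader lands on the wrong offset and interprets these
        // four arbitrary payload bytes as a big-endian int length prefix.
        byte[] misread = {(byte) 0x85, 0x00, 0x12, 0x33};
        int len = ((misread[0] & 0xFF) << 24)
                | ((misread[1] & 0xFF) << 16)
                | ((misread[2] & 0xFF) << 8)
                |  (misread[3] & 0xFF);
        System.out.println(len); // high bit set -> negative value
        try {
            byte[] target = new byte[len]; // allocating with a negative size
        } catch (NegativeArraySizeException e) {
            System.out.println("caught: " + e);
        }
    }
}
```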
What you expected:
Data should still be written successfully when write.memory.segment.page.size is 32 KB.
Steps to reproduce:
We can't reproduce the exact same exception on master, but we can reproduce a similar one.
branch: master
flink: 1.18
UT: TestWriteCopyOnWrite#testInsertWithSmallBufferSize
env1: write.memory.segment.page.size = 32
Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.lang.IndexOutOfBoundsException
at org.apache.hudi.common.util.queue.SimpleExecutor.execute(SimpleExecutor.java:73)
at org.apache.hudi.execution.FlinkLazyInsertIterable.computeNext(FlinkLazyInsertIterable.java:65)
... 23 more
Caused by: org.apache.hudi.exception.HoodieException: java.lang.IndexOutOfBoundsException
at org.apache.hudi.io.BaseCreateHandle.doWrite(BaseCreateHandle.java:122)
at org.apache.hudi.io.HoodieWriteHandle.write(HoodieWriteHandle.java:240)
at org.apache.hudi.execution.ExplicitWriteHandler.consume(ExplicitWriteHandler.java:48)
at org.apache.hudi.execution.ExplicitWriteHandler.consume(ExplicitWriteHandler.java:34)
at org.apache.hudi.common.util.queue.SimpleExecutor.execute(SimpleExecutor.java:67)
... 24 more
Caused by: java.lang.RuntimeException: java.lang.IndexOutOfBoundsException
at org.apache.hudi.io.storage.row.RowDataParquetWriteSupport.write(RowDataParquetWriteSupport.java:71)
at org.apache.hudi.io.storage.row.RowDataParquetWriteSupport.write(RowDataParquetWriteSupport.java:37)
at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:138)
at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:310)
at org.apache.hudi.io.hadoop.HoodieBaseParquetWriter.write(HoodieBaseParquetWriter.java:149)
at org.apache.hudi.io.storage.row.HoodieRowDataParquetWriter.writeRow(HoodieRowDataParquetWriter.java:68)
at org.apache.hudi.io.storage.row.HoodieRowDataParquetWriter.writeRowWithMetaData(HoodieRowDataParquetWriter.java:76)
at org.apache.hudi.io.storage.row.HoodieRowDataFileWriter.writeWithMetadata(HoodieRowDataFileWriter.java:63)
at org.apache.hudi.io.BaseCreateHandle.writeRecordToFile(BaseCreateHandle.java:162)
at org.apache.hudi.io.BaseCreateHandle.doWrite(BaseCreateHandle.java:102)
... 28 more
Caused by: java.lang.IndexOutOfBoundsException
at org.apache.flink.core.memory.MemorySegment.getLong(MemorySegment.java:935)
at org.apache.flink.table.data.binary.BinaryRowData.getTimestamp(BinaryRowData.java:351)
at org.apache.flink.table.data.utils.JoinedRowData.getTimestamp(JoinedRowData.java:203)
at org.apache.hudi.client.model.AbstractHoodieRowData.getTimestamp(AbstractHoodieRowData.java:129)
at org.apache.hudi.io.storage.row.parquet.ParquetRowDataWriter$Timestamp64Writer.write(ParquetRowDataWriter.java:305)
at org.apache.hudi.io.storage.row.parquet.ParquetRowDataWriter.write(ParquetRowDataWriter.java:93)
at org.apache.hudi.io.storage.row.RowDataParquetWriteSupport.write(RowDataParquetWriteSupport.java:69)
... 37 more
env2: write.memory.segment.page.size = 32, and increase the DATA_SET_INSERT_DUPLICATES record size
org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time 20260408122418316
at org.apache.hudi.table.action.commit.FlinkWriteHelper.write(FlinkWriteHelper.java:81)
at org.apache.hudi.table.action.commit.FlinkUpsertCommitActionExecutor.execute(FlinkUpsertCommitActionExecutor.java:53)
at org.apache.hudi.table.HoodieFlinkCopyOnWriteTable.upsert(HoodieFlinkCopyOnWriteTable.java:113)
at org.apache.hudi.client.HoodieFlinkWriteClient.upsert(HoodieFlinkWriteClient.java:223)
at org.apache.hudi.sink.StreamWriteFunction.lambda$initWriteFunction$514ba0a6$2(StreamWriteFunction.java:215)
at org.apache.hudi.sink.StreamWriteFunction$WriteFunction.write(StreamWriteFunction.java:516)
at org.apache.hudi.sink.StreamWriteFunction.writeRecords(StreamWriteFunction.java:445)
at org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:381)
at org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:323)
at org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:184)
at org.apache.hudi.sink.utils.StreamWriteFunctionWrapper.invoke(StreamWriteFunctionWrapper.java:215)
at org.apache.hudi.sink.utils.TestWriteBase$TestHarness.consume(TestWriteBase.java:191)
at org.apache.hudi.sink.TestWriteCopyOnWrite.testInsertWithSmallBufferSize(TestWriteCopyOnWrite.java:540)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
Caused by: java.lang.RuntimeException: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.lang.IndexOutOfBoundsException: pos: 1734698613, length: 1936089412, index: 1734698597, offset: 0
at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:123)
at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
at org.apache.hudi.table.action.commit.BaseFlinkCommitActionExecutor.execute(BaseFlinkCommitActionExecutor.java:124)
at org.apache.hudi.table.action.commit.BaseFlinkCommitActionExecutor.execute(BaseFlinkCommitActionExecutor.java:103)
at org.apache.hudi.table.action.commit.BaseFlinkCommitActionExecutor.execute(BaseFlinkCommitActionExecutor.java:98)
at org.apache.hudi.table.action.commit.BaseFlinkCommitActionExecutor.execute(BaseFlinkCommitActionExecutor.java:65)
at org.apache.hudi.table.action.commit.FlinkWriteHelper.write(FlinkWriteHelper.java:74)
... 15 more
Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.lang.IndexOutOfBoundsException: pos: 1734698613, length: 1936089412, index: 1734698597, offset: 0
at org.apache.hudi.execution.FlinkLazyInsertIterable.computeNext(FlinkLazyInsertIterable.java:69)
at org.apache.hudi.execution.FlinkLazyInsertIterable.computeNext(FlinkLazyInsertIterable.java:44)
at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:121)
... 21 more
Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.lang.IndexOutOfBoundsException: pos: 1734698613, length: 1936089412, index: 1734698597, offset: 0
at org.apache.hudi.common.util.queue.SimpleExecutor.execute(SimpleExecutor.java:73)
at org.apache.hudi.execution.FlinkLazyInsertIterable.computeNext(FlinkLazyInsertIterable.java:65)
... 23 more
Caused by: org.apache.hudi.exception.HoodieException: java.lang.IndexOutOfBoundsException: pos: 1734698613, length: 1936089412, index: 1734698597, offset: 0
at org.apache.hudi.io.BaseCreateHandle.doWrite(BaseCreateHandle.java:122)
at org.apache.hudi.io.HoodieWriteHandle.write(HoodieWriteHandle.java:240)
at org.apache.hudi.execution.ExplicitWriteHandler.consume(ExplicitWriteHandler.java:48)
at org.apache.hudi.execution.ExplicitWriteHandler.consume(ExplicitWriteHandler.java:34)
at org.apache.hudi.common.util.queue.SimpleExecutor.execute(SimpleExecutor.java:67)
... 24 more
Caused by: java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: pos: 1734698613, length: 1936089412, index: 1734698597, offset: 0
at org.apache.hudi.io.storage.row.RowDataParquetWriteSupport.write(RowDataParquetWriteSupport.java:71)
at org.apache.hudi.io.storage.row.RowDataParquetWriteSupport.write(RowDataParquetWriteSupport.java:37)
at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:138)
at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:310)
at org.apache.hudi.io.hadoop.HoodieBaseParquetWriter.write(HoodieBaseParquetWriter.java:149)
at org.apache.hudi.io.storage.row.HoodieRowDataParquetWriter.writeRow(HoodieRowDataParquetWriter.java:68)
at org.apache.hudi.io.storage.row.HoodieRowDataParquetWriter.writeRowWithMetaData(HoodieRowDataParquetWriter.java:76)
at org.apache.hudi.io.storage.row.HoodieRowDataFileWriter.writeWithMetadata(HoodieRowDataFileWriter.java:63)
at org.apache.hudi.io.BaseCreateHandle.writeRecordToFile(BaseCreateHandle.java:162)
at org.apache.hudi.io.BaseCreateHandle.doWrite(BaseCreateHandle.java:102)
... 28 more
Caused by: java.lang.IndexOutOfBoundsException: pos: 1734698613, length: 1936089412, index: 1734698597, offset: 0
at org.apache.flink.core.memory.MemorySegment.get(MemorySegment.java:467)
at org.apache.flink.table.data.binary.BinarySegmentUtils.getBytes(BinarySegmentUtils.java:292)
at org.apache.flink.table.data.binary.BinaryStringData.toBytes(BinaryStringData.java:112)
at org.apache.hudi.io.storage.row.parquet.ParquetRowDataWriter$StringWriter.write(ParquetRowDataWriter.java:255)
at org.apache.hudi.io.storage.row.parquet.ParquetRowDataWriter.write(ParquetRowDataWriter.java:93)
at org.apache.hudi.io.storage.row.RowDataParquetWriteSupport.write(RowDataParquetWriteSupport.java:69)
... 37 more
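One more observation supporting the corruption hypothesis (hedged, our analysis only): the pos/length/index values in the env2 exception decode to printable ASCII when viewed as big-endian bytes, which hints that record payload bytes are being misread as the offset-and-length word. The check is easy to reproduce:

```java
import java.nio.charset.StandardCharsets;

public class DecodeLengths {
    // Render an int's four big-endian bytes as ASCII text.
    static String ascii(int v) {
        return new String(new byte[] {
            (byte) (v >>> 24), (byte) (v >>> 16), (byte) (v >>> 8), (byte) v
        }, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(ascii(1734698613)); // "gefu" -- reported pos
        System.out.println(ascii(1936089412)); // "sfaD" -- reported length
        System.out.println(ascii(1734698597)); // "gefe" -- reported index
    }
}
```

All three values are plain ASCII, which is unlikely for genuine offsets and lengths.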
Environment
Hudi version: 0.13.1/master
Query engine: Flink
Relevant configs: write.memory.segment.page.size
Logs and Stack Trace
No response