
Throw NegativeArraySizeException in Flink Segment #18479

@KnightChess

Description


Bug Description

What happened:
flink: 1.16
hudi: 0.13.1 with #12967
We use #12967 in our internal branch. Our records average about 400 KB, while the default write.memory.segment.page.size is 32 KB. During flush, the following exception is thrown frequently and data fails to be written. If we set write.memory.segment.page.size to 500kb, the exception no longer occurs.
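For intuition only (this is not Hudi or Flink code): the NegativeArraySizeException below is thrown when BinarySegmentUtils.getBytes allocates a byte array from a length it read out of a memory segment. A minimal ByteBuffer sketch, under the assumption that the offset arithmetic for records spanning fixed-size pages can go wrong, of how a misaligned read turns payload bytes into a garbage negative "length":

```java
import java.nio.ByteBuffer;

// Toy record layout: a 4-byte length prefix followed by a variable-length
// payload. If the reader's offset is wrong -- e.g. because a record larger
// than the page size spills across pages -- payload bytes get interpreted
// as the length field and can come out negative.
public class MisreadLength {
    // Builds the toy record and reads a 4-byte int at the given offset.
    static int readLengthAt(int offset) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putInt(5);                                   // real length prefix at offset 0
        buf.put(new byte[] {'h', 'e', 'l', 'l', 'o'});   // payload
        buf.putInt(9, 0xFFFF0000);                       // payload-region bytes with the high bit set
        return buf.getInt(offset);
    }

    public static void main(String[] args) {
        System.out.println("good length = " + readLengthAt(0));  // 5

        int badLen = readLengthAt(9);                    // misaligned read
        System.out.println("bad length = " + badLen);    // -65536

        try {
            byte[] ignored = new byte[badLen];           // throws, like getBytes in the trace
        } catch (NegativeArraySizeException e) {
            System.out.println("caught: " + e);
        }
    }
}
```

An off-by-a-few-bytes offset is enough: the 4-byte "length" read lands inside arbitrary payload, which is consistent with the garbage value (-2063597517) in the trace.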

Caused by: java.lang.RuntimeException: java.lang.NegativeArraySizeException: -2063597517
	at org.apache.hudi.io.storage.row.RowDataParquetWriteSupport.write(RowDataParquetWriteSupport.java:72)
	at org.apache.hudi.io.storage.row.RowDataParquetWriteSupport.write(RowDataParquetWriteSupport.java:37)
	at org.apache.hudi.jd.org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:138)
	at org.apache.hudi.jd.org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:310)
	at org.apache.hudi.io.storage.HoodieBaseParquetWriter.write(HoodieBaseParquetWriter.java:175)
	at org.apache.hudi.io.storage.row.HoodieRowDataParquetWriter.writeRow(HoodieRowDataParquetWriter.java:45)
	at org.apache.hudi.io.storage.row.LSMHoodieRowDataCreateHandle.writeRow(LSMHoodieRowDataCreateHandle.java:235)
	... 12 more
Caused by: java.lang.NegativeArraySizeException: -2063597517
	at org.apache.flink.table.data.binary.BinarySegmentUtils.getBytes(BinarySegmentUtils.java:296)
	at org.apache.flink.table.data.binary.BinaryStringData.toBytes(BinaryStringData.java:112)
	at org.apache.hudi.io.storage.row.parquet.ParquetRowDataWriter$StringWriter.write(ParquetRowDataWriter.java:266)
	at org.apache.hudi.io.storage.row.parquet.ParquetRowDataWriter$ArrayWriter.doWrite(ParquetRowDataWriter.java:532)
	at org.apache.hudi.io.storage.row.parquet.ParquetRowDataWriter$ArrayWriter.write(ParquetRowDataWriter.java:503)
	at org.apache.hudi.io.storage.row.parquet.ParquetRowDataWriter.write(ParquetRowDataWriter.java:95)
	at org.apache.hudi.io.storage.row.RowDataParquetWriteSupport.write(RowDataParquetWriteSupport.java:70)
	... 18 more

What you expected:
Writes should still succeed when write.memory.segment.page.size is 32 KB.

Steps to reproduce:
We cannot reproduce the exact same exception, but we can trigger a similar one:
branch: master
flink: 1.18
UT: TestWriteCopyOnWrite#testInsertWithSmallBufferSize
env1: write.memory.segment.page.size = 32

Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.lang.IndexOutOfBoundsException
	at org.apache.hudi.common.util.queue.SimpleExecutor.execute(SimpleExecutor.java:73)
	at org.apache.hudi.execution.FlinkLazyInsertIterable.computeNext(FlinkLazyInsertIterable.java:65)
	... 23 more
Caused by: org.apache.hudi.exception.HoodieException: java.lang.IndexOutOfBoundsException
	at org.apache.hudi.io.BaseCreateHandle.doWrite(BaseCreateHandle.java:122)
	at org.apache.hudi.io.HoodieWriteHandle.write(HoodieWriteHandle.java:240)
	at org.apache.hudi.execution.ExplicitWriteHandler.consume(ExplicitWriteHandler.java:48)
	at org.apache.hudi.execution.ExplicitWriteHandler.consume(ExplicitWriteHandler.java:34)
	at org.apache.hudi.common.util.queue.SimpleExecutor.execute(SimpleExecutor.java:67)
	... 24 more
Caused by: java.lang.RuntimeException: java.lang.IndexOutOfBoundsException
	at org.apache.hudi.io.storage.row.RowDataParquetWriteSupport.write(RowDataParquetWriteSupport.java:71)
	at org.apache.hudi.io.storage.row.RowDataParquetWriteSupport.write(RowDataParquetWriteSupport.java:37)
	at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:138)
	at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:310)
	at org.apache.hudi.io.hadoop.HoodieBaseParquetWriter.write(HoodieBaseParquetWriter.java:149)
	at org.apache.hudi.io.storage.row.HoodieRowDataParquetWriter.writeRow(HoodieRowDataParquetWriter.java:68)
	at org.apache.hudi.io.storage.row.HoodieRowDataParquetWriter.writeRowWithMetaData(HoodieRowDataParquetWriter.java:76)
	at org.apache.hudi.io.storage.row.HoodieRowDataFileWriter.writeWithMetadata(HoodieRowDataFileWriter.java:63)
	at org.apache.hudi.io.BaseCreateHandle.writeRecordToFile(BaseCreateHandle.java:162)
	at org.apache.hudi.io.BaseCreateHandle.doWrite(BaseCreateHandle.java:102)
	... 28 more
Caused by: java.lang.IndexOutOfBoundsException
	at org.apache.flink.core.memory.MemorySegment.getLong(MemorySegment.java:935)
	at org.apache.flink.table.data.binary.BinaryRowData.getTimestamp(BinaryRowData.java:351)
	at org.apache.flink.table.data.utils.JoinedRowData.getTimestamp(JoinedRowData.java:203)
	at org.apache.hudi.client.model.AbstractHoodieRowData.getTimestamp(AbstractHoodieRowData.java:129)
	at org.apache.hudi.io.storage.row.parquet.ParquetRowDataWriter$Timestamp64Writer.write(ParquetRowDataWriter.java:305)
	at org.apache.hudi.io.storage.row.parquet.ParquetRowDataWriter.write(ParquetRowDataWriter.java:93)
	at org.apache.hudi.io.storage.row.RowDataParquetWriteSupport.write(RowDataParquetWriteSupport.java:69)
	... 37 more
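A toy model of why a page-size mismatch surfaces as IndexOutOfBoundsException (assumed for illustration only; this is not Flink's MemorySegment or Hudi's actual code path): if a logical offset is mapped to (page, in-page) coordinates with a page size that differs from the one the data was laid out with, the read indexes past a page boundary.

```java
// Toy paged buffer: records live at logical offsets across fixed-size
// pages. Reader and writer must agree on the page size; if they do not --
// analogous to a record larger than write.memory.segment.page.size
// spilling across segments while offsets are computed as if it fit in
// one -- the read lands out of bounds.
public class PagedRead {
    static byte read(byte[][] pages, int pageSize, int logicalOffset) {
        int page = logicalOffset / pageSize;
        int inPage = logicalOffset % pageSize;
        return pages[page][inPage];   // throws if pageSize is wrong for this layout
    }

    public static void main(String[] args) {
        int writerPageSize = 32;
        byte[][] pages = {new byte[writerPageSize], new byte[writerPageSize]};
        pages[1][8] = 42;             // record byte at logical offset 40

        // Correct page size: offset 40 -> page 1, index 8.
        System.out.println(read(pages, 32, 40));          // 42

        // Wrong page size (reader assumes 64-byte pages): offset 40 ->
        // page 0, index 40 -- past the 32-byte page.
        try {
            read(pages, 64, 40);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("caught: " + e);
        }
    }
}
```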

env2: write.memory.segment.page.size = 32, and increase the DATA_SET_INSERT_DUPLICATES record size

org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time 20260408122418316

	at org.apache.hudi.table.action.commit.FlinkWriteHelper.write(FlinkWriteHelper.java:81)
	at org.apache.hudi.table.action.commit.FlinkUpsertCommitActionExecutor.execute(FlinkUpsertCommitActionExecutor.java:53)
	at org.apache.hudi.table.HoodieFlinkCopyOnWriteTable.upsert(HoodieFlinkCopyOnWriteTable.java:113)
	at org.apache.hudi.client.HoodieFlinkWriteClient.upsert(HoodieFlinkWriteClient.java:223)
	at org.apache.hudi.sink.StreamWriteFunction.lambda$initWriteFunction$514ba0a6$2(StreamWriteFunction.java:215)
	at org.apache.hudi.sink.StreamWriteFunction$WriteFunction.write(StreamWriteFunction.java:516)
	at org.apache.hudi.sink.StreamWriteFunction.writeRecords(StreamWriteFunction.java:445)
	at org.apache.hudi.sink.StreamWriteFunction.flushBucket(StreamWriteFunction.java:381)
	at org.apache.hudi.sink.StreamWriteFunction.bufferRecord(StreamWriteFunction.java:323)
	at org.apache.hudi.sink.StreamWriteFunction.processElement(StreamWriteFunction.java:184)
	at org.apache.hudi.sink.utils.StreamWriteFunctionWrapper.invoke(StreamWriteFunctionWrapper.java:215)
	at org.apache.hudi.sink.utils.TestWriteBase$TestHarness.consume(TestWriteBase.java:191)
	at org.apache.hudi.sink.TestWriteCopyOnWrite.testInsertWithSmallBufferSize(TestWriteCopyOnWrite.java:540)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
Caused by: java.lang.RuntimeException: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.lang.IndexOutOfBoundsException: pos: 1734698613, length: 1936089412, index: 1734698597, offset: 0
	at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:123)
	at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
	at org.apache.hudi.table.action.commit.BaseFlinkCommitActionExecutor.execute(BaseFlinkCommitActionExecutor.java:124)
	at org.apache.hudi.table.action.commit.BaseFlinkCommitActionExecutor.execute(BaseFlinkCommitActionExecutor.java:103)
	at org.apache.hudi.table.action.commit.BaseFlinkCommitActionExecutor.execute(BaseFlinkCommitActionExecutor.java:98)
	at org.apache.hudi.table.action.commit.BaseFlinkCommitActionExecutor.execute(BaseFlinkCommitActionExecutor.java:65)
	at org.apache.hudi.table.action.commit.FlinkWriteHelper.write(FlinkWriteHelper.java:74)
	... 15 more
Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.lang.IndexOutOfBoundsException: pos: 1734698613, length: 1936089412, index: 1734698597, offset: 0
	at org.apache.hudi.execution.FlinkLazyInsertIterable.computeNext(FlinkLazyInsertIterable.java:69)
	at org.apache.hudi.execution.FlinkLazyInsertIterable.computeNext(FlinkLazyInsertIterable.java:44)
	at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:121)
	... 21 more
Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.lang.IndexOutOfBoundsException: pos: 1734698613, length: 1936089412, index: 1734698597, offset: 0
	at org.apache.hudi.common.util.queue.SimpleExecutor.execute(SimpleExecutor.java:73)
	at org.apache.hudi.execution.FlinkLazyInsertIterable.computeNext(FlinkLazyInsertIterable.java:65)
	... 23 more
Caused by: org.apache.hudi.exception.HoodieException: java.lang.IndexOutOfBoundsException: pos: 1734698613, length: 1936089412, index: 1734698597, offset: 0
	at org.apache.hudi.io.BaseCreateHandle.doWrite(BaseCreateHandle.java:122)
	at org.apache.hudi.io.HoodieWriteHandle.write(HoodieWriteHandle.java:240)
	at org.apache.hudi.execution.ExplicitWriteHandler.consume(ExplicitWriteHandler.java:48)
	at org.apache.hudi.execution.ExplicitWriteHandler.consume(ExplicitWriteHandler.java:34)
	at org.apache.hudi.common.util.queue.SimpleExecutor.execute(SimpleExecutor.java:67)
	... 24 more
Caused by: java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: pos: 1734698613, length: 1936089412, index: 1734698597, offset: 0
	at org.apache.hudi.io.storage.row.RowDataParquetWriteSupport.write(RowDataParquetWriteSupport.java:71)
	at org.apache.hudi.io.storage.row.RowDataParquetWriteSupport.write(RowDataParquetWriteSupport.java:37)
	at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:138)
	at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:310)
	at org.apache.hudi.io.hadoop.HoodieBaseParquetWriter.write(HoodieBaseParquetWriter.java:149)
	at org.apache.hudi.io.storage.row.HoodieRowDataParquetWriter.writeRow(HoodieRowDataParquetWriter.java:68)
	at org.apache.hudi.io.storage.row.HoodieRowDataParquetWriter.writeRowWithMetaData(HoodieRowDataParquetWriter.java:76)
	at org.apache.hudi.io.storage.row.HoodieRowDataFileWriter.writeWithMetadata(HoodieRowDataFileWriter.java:63)
	at org.apache.hudi.io.BaseCreateHandle.writeRecordToFile(BaseCreateHandle.java:162)
	at org.apache.hudi.io.BaseCreateHandle.doWrite(BaseCreateHandle.java:102)
	... 28 more
Caused by: java.lang.IndexOutOfBoundsException: pos: 1734698613, length: 1936089412, index: 1734698597, offset: 0
	at org.apache.flink.core.memory.MemorySegment.get(MemorySegment.java:467)
	at org.apache.flink.table.data.binary.BinarySegmentUtils.getBytes(BinarySegmentUtils.java:292)
	at org.apache.flink.table.data.binary.BinaryStringData.toBytes(BinaryStringData.java:112)
	at org.apache.hudi.io.storage.row.parquet.ParquetRowDataWriter$StringWriter.write(ParquetRowDataWriter.java:255)
	at org.apache.hudi.io.storage.row.parquet.ParquetRowDataWriter.write(ParquetRowDataWriter.java:93)
	at org.apache.hudi.io.storage.row.RowDataParquetWriteSupport.write(RowDataParquetWriteSupport.java:69)
	... 37 more

Environment

Hudi version: 0.13.1/master
Query engine: Flink
Relevant configs: write.memory.segment.page.size
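A hedged sketch of how the relevant option could be passed as a Hudi Flink table option. The key is taken verbatim from this report; the value format ("500kb" as written above, vs. a plain byte count as in the UT's "= 32") and the surrounding options are assumptions to verify against the PR that introduced the config:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a Flink table options map carrying the workaround
// value from this report. Key from the report; value format assumed.
public class SegmentPageSizeConfig {
    public static Map<String, String> options() {
        Map<String, String> opts = new HashMap<>();
        opts.put("table.type", "COPY_ON_WRITE");
        opts.put("write.memory.segment.page.size", "500kb"); // workaround from the report
        return opts;
    }
}
```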

Logs and Stack Trace

No response


Labels

type:bug (Bug reports and fixes)
