The official repository for "Evaluating and Improving Explanation Coherence for Multimodal Emotion Recognition".
Replace every occurrence of `YOUR_PATH` with your local directory paths.
See configs/HumanOmni.yml for SFT (Python 3.10) and configs/r1-v.yml for GRPO (Python 3.11).
For the qwen3_caption_vllm environment, please refer to the Qwen3-Omni repository.
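A minimal setup sketch, assuming the two yml files are conda environment specs. The `HumanOmni` environment name is an assumption (use the `name:` field in the file); `r1-v` matches the `conda activate r1-v` call used for inference below.

```bash
# Create the SFT (Python 3.10) and GRPO (Python 3.11) environments.
conda env create -f configs/HumanOmni.yml   # assumed env name: HumanOmni
conda env create -f configs/r1-v.yml        # env name: r1-v
```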
| Model | HuggingFace |
|---|---|
| HumanOmni-0.5B | [StarJiaxing/HumanOmni-0.5B](https://huggingface.co/StarJiaxing/HumanOmni-0.5B) |
| HumanOmni-7B | [StarJiaxing/HumanOmni-7B](https://huggingface.co/StarJiaxing/HumanOmni-7B) |
| Qwen3-Omni-30B-A3B-Instruct | [Qwen/Qwen3-Omni-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Omni-30B-A3B-Instruct) |
| bert-base-uncased | [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased) |
| siglip-base-patch16-224 | [google/siglip-base-patch16-224](https://huggingface.co/google/siglip-base-patch16-224) |
| siglip-so400m-patch14-384 | [google/siglip-so400m-patch14-384](https://huggingface.co/google/siglip-so400m-patch14-384) |
| whisper-large-v3 | [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) |
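To fetch the checkpoints, a sketch using the `huggingface_hub` CLI; the `YOUR_PATH` target directories follow the placeholder convention above.

```bash
# Download checkpoints locally; repeat for each row of the table.
huggingface-cli download StarJiaxing/HumanOmni-7B --local-dir YOUR_PATH/HumanOmni-7B
huggingface-cli download openai/whisper-large-v3 --local-dir YOUR_PATH/whisper-large-v3
```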
Annotation format with a reasoning trace:

```json
{
"video": "VIDEO_PATH",
"conversations": [
{
"from": "human",
"value": "<video>\n<audio>\nAs an emotional recognition expert; throughout the video, which emotion conveyed by the characters is the most obvious to you? Output the thinking process in <think> </think> and final emotion in <answer> </answer> tags."
},
{
"from": "gpt",
"value": "<think>THINK_CONTENT</think>\n<answer>EMOTION_LABEL</answer>"
}
]
}
```

For data without a reasoning trace, the `gpt` value is the emotion label alone:

```json
{
"video": "VIDEO_PATH",
"conversations": [
{
"from": "human",
"value": "<video>\n<audio>\nAs an emotional recognition expert; throughout the video, which emotion conveyed by the characters is the most obvious to you? Output the thinking process in <think> </think> and final emotion in <answer> </answer> tags."
},
{
"from": "gpt",
"value": "EMOTION_LABEL"
}
]
}
```
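As a quick structural check on the annotation file, a minimal sketch: it assumes the records are stored as a single JSON array at a hypothetical path `data/train.json`.

```bash
# Count records that have a video path and a gpt turn (expects a JSON array).
jq '[.[] | select(.video != null and .conversations[1].from == "gpt")] | length' data/train.json
```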
Run SFT:

```bash
bash srun_sft_humanomni.sh
```

To use FG-CE, run `srun_fgce.sh` and fill in the resulting `POD_IP` in `srun_grpo_humanomni.sh`, then start GRPO:

```bash
bash srun_grpo_humanomni.sh
```

For batch inference:

```bash
conda activate r1-v
# Single-node, multi-GPU batch inference.
torchrun --nproc_per_node=$GPUS --nnodes=1 \
    --master_addr=localhost --master_port=12345 \
    inference_batch.py \
    --model_path $MODEL_PATH \
    --bert_path $BERT_PATH \
    --input_jsonl $INPUT_JSONL \
    --output_dir $OUTPUT_DIR
```