Direct Preference Optimisation

Overview

Direct Preference Optimisation (DPO) is a simplified alternative to reinforcement learning from human feedback (RLHF). It optimises a language model policy directly on preference data, without training a separate reward model.
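To make the idea concrete, here is a minimal sketch of the per-example DPO loss as it is commonly formulated: the negative log-sigmoid of a scaled difference between the policy's and a frozen reference model's log-probability ratios for the preferred and dispreferred responses. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative assumptions, not taken from this entry or from any particular library.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss from summed sequence log-probabilities.

    All names and the default beta are hypothetical, chosen for illustration.
    """
    # How much more (or less) likely each response is under the policy
    # than under the frozen reference model, in log space.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Implicit reward margin; beta controls how strongly the policy is
    # allowed to drift from the reference.
    margin = beta * (chosen_logratio - rejected_logratio)

    # loss = -log(sigmoid(margin)) = log(1 + exp(-margin)),
    # written in a numerically stable form for both signs of margin.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: margin is 0, loss is log(2).
baseline = dpo_loss(-1.0, -1.0, -1.0, -1.0)

# Policy favours the chosen response more than the reference does:
# the loss drops below the baseline.
improved = dpo_loss(-0.5, -2.0, -1.0, -1.0)
```

In this sketch no reward model appears anywhere: the preference signal enters only through which response is labelled chosen and which rejected, and the gradient of this loss with respect to the policy's log-probabilities is what would drive training.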

Cross-References

Natural Language Processing
