Aligning large language models (LLMs) with human expectations without human-annotated preference data is an important problem. In this paper, we propose a method to evaluate response preference by using the output probabilities of response pairs under contrastive prompt pairs, which achieves better performance on LLaMA2-7B and LLaMA2-13B compared to RLAIF. Based on this, we propose an automatic alignment method, Direct Large Model Alignment (DLMA). First, we use contrastive prompt pairs to automatically generate preference data. Then, we evaluate the generated preference data using contrastive prompt pairs and calculate a self-rewarding score. Finally, we use the DPO algorithm to effectively align LLMs by incorporating this self-rewarding score. In experiments, our DLMA method surpasses the RLHF method without relying on human-annotated preference data.
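As a rough illustration of the scoring idea described above, the sketch below contrasts a response's log-probability under a "positive" prompt (e.g. asking for a helpful, harmless answer) with its log-probability under a "negative" prompt, and prefers the response with the larger gap. This is a minimal assumption-laden sketch, not the paper's exact formula: the function names and the use of a simple log-probability difference are illustrative choices, and a real implementation would obtain per-token log-probabilities from an actual LLM.

```python
# Hedged sketch of contrastive-prompt self-rewarding (illustrative only).
# Assumption: the self-rewarding score is the response's log-probability
# under the positive contrastive prompt minus that under the negative one.

def sequence_logprob(token_logprobs):
    """Sum per-token log-probabilities into a sequence log-probability."""
    return sum(token_logprobs)

def self_reward(logps_under_pos_prompt, logps_under_neg_prompt):
    """Self-rewarding score: how much more likely the response is under
    the positive contrastive prompt than under the negative one."""
    return (sequence_logprob(logps_under_pos_prompt)
            - sequence_logprob(logps_under_neg_prompt))

def label_preference(resp_a_logps, resp_b_logps):
    """Return ('chosen', 'rejected') labels for responses a and b,
    preferring the response with the higher self-rewarding score."""
    score_a = self_reward(*resp_a_logps)
    score_b = self_reward(*resp_b_logps)
    return ("a", "b") if score_a >= score_b else ("b", "a")

# Toy per-token log-probabilities (hypothetical numbers):
# response a looks much more likely under the positive prompt,
# response b is indifferent between the two prompts.
resp_a = ([-1.0, -2.0], [-3.0, -4.0])   # self-reward = -3 - (-7) = 4
resp_b = ([-2.0, -2.0], [-2.0, -2.0])   # self-reward = 0
chosen, rejected = label_preference(resp_a, resp_b)
print(chosen, rejected)
```

Preference pairs labeled this way could then be fed to DPO, with the score's magnitude optionally weighting the loss, as the abstract indicates.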