Guillaume Sanchez, Honglu Fan, Alexander Spangher, Elad Levi, Pawan Sasanka Ammanamanchi, Stella Biderman
Classifier-Free Guidance (CFG) has recently emerged in text-to-image generation as a lightweight technique to encourage prompt adherence in generations. In this work, we demonstrate that CFG can be used broadly as an inference-time technique in pure language modeling. We show that CFG (1) improves the performance of Pythia, GPT-2 and LLaMA-family models across an array of tasks: Q&A, reasoning, code generation, and machine translation, achieving SOTA on LAMBADA with LLaMA-7B over PaLM-540B; (2) brings improvements equivalent to a model with twice the parameter count; (3) can stack alongside other inference-time methods like Chain-of-Thought and Self-Consistency, yielding further improvements on difficult tasks; (4) can be used to increase the faithfulness and coherence of assistants in challenging form-driven and content-driven prompts: in a human evaluation we show a 75% preference for GPT4All using CFG over the baseline.
Quick presentation: 🐦 read the Twitter thread.
For a language model prompted with a prompt $c$, we contrast the logits with and without this prompt in order to emphasize the difference the prompt makes. Mathematically, we take a step of size $\gamma$ from the unconditioned logits toward the conditioned logits. $\gamma > 1$ overemphasizes the prompt and brings additional benefits.
$$ \log \hat{P}_\theta(w_i|w_{j<i},c) = \log P_\theta(w_i|w_{j<i}) + \gamma (\log P_\theta(w_i|w_{j<i}, c) - \log P_\theta(w_i|w_{j<i}))$$

Instead of an unconditional base, we can use a negative prompt representing what we want to steer away from. This also works very well with assistants.
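The formula above is just a linear extrapolation in logit space. Here is a minimal sketch of that step on a toy vocabulary (the numbers are illustrative, not from a real model):

```python
import numpy as np

def cfg_logits(uncond_logits, cond_logits, gamma):
    """Classifier-Free Guidance: take a step of size gamma from the
    unconditioned logits toward the conditioned logits."""
    return uncond_logits + gamma * (cond_logits - uncond_logits)

# toy logits over a 4-token vocabulary
uncond = np.array([1.0, 2.0, 0.5, 0.0])   # log P(w | w_{j<i})
cond = np.array([0.5, 3.0, 0.5, -1.0])    # log P(w | w_{j<i}, c)

# gamma = 1 recovers the conditioned logits exactly
print(cfg_logits(uncond, cond, 1.0))

# gamma > 1 pushes further in the direction the prompt favors:
# tokens the prompt makes more likely get boosted beyond cond
print(cfg_logits(uncond, cond, 1.5))
```

With `gamma = 1.5`, the token the prompt favors (index 1) ends up with a higher logit (3.5) than under plain conditioning (3.0), while tokens the prompt disfavors are pushed down. This is exactly what `guidance_scale=1.5` does inside `model.generate` below, with the unconditioned pass replaced by a negative prompt when one is given.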
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model.generation_config.pad_token_id = tokenizer.eos_token_id

inputs = tokenizer(["Today, a dragon flew over Paris, France,"], return_tensors="pt")
for _ in range(3):
    print('--')
    out = model.generate(
        # CFG parameters
        inputs["input_ids"], attention_mask=inputs['attention_mask'], guidance_scale=1.5,
        # sampling parameters
        top_p=0.95, max_new_tokens=125, do_sample=True)
    print(tokenizer.batch_decode(out, skip_special_tokens=True)[0])
```
```
--
Today, a dragon flew over Paris, France, and struck one person. The victim was an American, who was suffering from depression, according to the Associated Press. "It's not that the dragon is attacking America – it's that the attack was of an American," said a Paris native and former firefighter who lived in London. "It was a very scary thing." In the days leading up to the attack, a number of people in the U.S. rushed to Paris to tell their stories. In the U.K., in response to a demand for photos, local authorities held an event that attracted a fair amount of attention. In France, the same
--
Today, a dragon flew over Paris, France, killing at least 80 people. That dragon is likely to be named Chirac, after French National Dragon, and its name is likely to be similar to the one used in France's infamous "Camel of the Day" attack on Paris in December 1963. Chirac, also known as "The Dragon of France," was a massive, dragon-shaped dragon, which is still associated with Paris' famous French cuisine, particularly "Hac de Chirac." In France, there are two types of Chirac, which are referred to as the "Hac du Chirac" and
--
Today, a dragon flew over Paris, France, in August 2013 and attacked a wedding of Jean-Claude Juncker and Emmanuel Macron, both of whom were running for President of the EU. The dragon was said to have come down the runway to give off a large flame, which burned for eight minutes before it was finally extinguished. France, meanwhile, is undergoing an evacuation. More from GlobalPost: 'This isn't just for Trump': The future of a 'good idea' in Paris
```
That's a lot of poor cheeseland people being killed. We can change that with negative prompting:
```python
# with a negative prompt
neg_inputs = tokenizer(["killing,"], return_tensors="pt")
for _ in range(3):
    print('--')
    out = model.generate(
        # CFG parameters
        inputs["input_ids"], attention_mask=inputs['attention_mask'], guidance_scale=1.5,
        negative_prompt_ids=neg_inputs['input_ids'],
        # sampling parameters
        top_p=0.95, max_new_tokens=125, do_sample=True)
    print(tokenizer.batch_decode(out, skip_special_tokens=True)[0])
```
```
--
Today, a dragon flew over Paris, France, where dozens of firefighters from France's National Disaster Response Centre were helping to restore the city's famous French cathedral from a falling snowstorm to its perfect new condition. Advertisement Video courtesy of CTV Advertisement In New York City, the city council passed a resolution Friday asking for some more funds to help restore the famed French cathedral after the snow storm. The measure was called "Operation Tenderloin" because the city did not immediately respond to requests for more money. UPDATE: A CTV crew spotted the dragon in the sky above New York and called police. It was soon spotted
--
Today, a dragon flew over Paris, France, on Sunday to fly into the skies over Paris on Thursday. The dragon is the second such dragon to land in France this year as it flew through the city after a previous one. In October, an unidentified plane flew over Paris and crashed in the town of Bordeaux on a routine flight. It's not yet clear when the plane's pilots thought it might land. The second such dragon is believed to have landed in northern Turkey on Tuesday, with at least one possible plane crash involving a person. Last week, in February 2017, a helicopter flew over the French capital and a second
--
Today, a dragon flew over Paris, France, just after midnight (5pm GMT). The dragon was seen at a location just off Paris Boulevard. A man and a young girl were also detained for flying over Paris on Saturday. The person was arrested. As of yet, there are no reports of a missing persons case in the country. The situation on the ground has remained tense for a while, and the weather has been erratic (due to all the storms, storms, flooding, storms and high winds, etc.) The situation is still in a bit of a panic, because it would have been easy for any one of the following scenarios to go from
```
```bibtex
@misc{sanchez2023stay,
  title={Stay on topic with Classifier-Free Guidance},
  author={Guillaume Sanchez and Honglu Fan and Alexander Spangher and Elad Levi and Pawan Sasanka Ammanamanchi and Stella Biderman},
  year={2023},
  eprint={2306.17806},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```