Date	Model	Defense	Paper	Name	Threat model	Notes	Average queries	Attack success rate	Jailbreak artifacts
12 Oct 2023	Vicuna-13B	None	Jailbreaking Black Box Large Language Models in Twenty Queries	Prompt Automatic Iterative Refinement (PAIR)	Black-box access	LLM-assisted attack	34	69%	Link
12 Oct 2023	Llama-2-7B	None	Jailbreaking Black Box Large Language Models in Twenty Queries	Prompt Automatic Iterative Refinement (PAIR)	Black-box access	LLM-assisted attack	88	0%	Link
12 Oct 2023	Vicuna-13B	SmoothLLM	Jailbreaking Black Box Large Language Models in Twenty Queries	Prompt Automatic Iterative Refinement (PAIR)	Black-box access	LLM-assisted attack	34	55%	Link
12 Oct 2023	Llama-2-7B	SmoothLLM	Jailbreaking Black Box Large Language Models in Twenty Queries	Prompt Automatic Iterative Refinement (PAIR)	Black-box access	LLM-assisted attack	88	0%	Link
12 Oct 2023	Vicuna-13B	Perplexity filter	Jailbreaking Black Box Large Language Models in Twenty Queries	Prompt Automatic Iterative Refinement (PAIR)	Black-box access	LLM-assisted attack	34	69%	Link
12 Oct 2023	Llama-2-7B	Perplexity filter	Jailbreaking Black Box Large Language Models in Twenty Queries	Prompt Automatic Iterative Refinement (PAIR)	Black-box access	LLM-assisted attack	88	0%	Link
12 Oct 2023	Vicuna-13B	Erase-and-Check	Jailbreaking Black Box Large Language Models in Twenty Queries	Prompt Automatic Iterative Refinement (PAIR)	Black-box access	LLM-assisted attack	34	0%	Link
12 Oct 2023	Llama-2-7B	Erase-and-Check	Jailbreaking Black Box Large Language Models in Twenty Queries	Prompt Automatic Iterative Refinement (PAIR)	Black-box access	LLM-assisted attack	88	0%	Link
12 Oct 2023	Vicuna-13B	Synonym Substitution	Jailbreaking Black Box Large Language Models in Twenty Queries	Prompt Automatic Iterative Refinement (PAIR)	Black-box access	LLM-assisted attack	34	22%	Link
12 Oct 2023	Llama-2-7B	Synonym Substitution	Jailbreaking Black Box Large Language Models in Twenty Queries	Prompt Automatic Iterative Refinement (PAIR)	Black-box access	LLM-assisted attack	88	0%	Link
12 Oct 2023	Vicuna-13B	Remove Non-Dictionary	Jailbreaking Black Box Large Language Models in Twenty Queries	Prompt Automatic Iterative Refinement (PAIR)	Black-box access	LLM-assisted attack	34	0%	Link
12 Oct 2023	Llama-2-7B	Remove Non-Dictionary	Jailbreaking Black Box Large Language Models in Twenty Queries	Prompt Automatic Iterative Refinement (PAIR)	Black-box access	LLM-assisted attack	88	1%	Link
27 Jul 2023	Vicuna-13B	None	Universal and Transferable Adversarial Attacks on Aligned Language Models	Greedy Coordinate Gradient (GCG)	White-box access	Suffix attack, 256K queries	256K	80%	Link
27 Jul 2023	Llama-2-7B	None	Universal and Transferable Adversarial Attacks on Aligned Language Models	Greedy Coordinate Gradient (GCG)	White-box access	Suffix attack, 256K queries	256K	3%	Link
27 Jul 2023	Vicuna-13B	SmoothLLM	Universal and Transferable Adversarial Attacks on Aligned Language Models	Greedy Coordinate Gradient (GCG)	White-box access	Suffix attack, 256K queries	256K	4%	Link
27 Jul 2023	Llama-2-7B	SmoothLLM	Universal and Transferable Adversarial Attacks on Aligned Language Models	Greedy Coordinate Gradient (GCG)	White-box access	Suffix attack, 256K queries	256K	0%	Link
27 Jul 2023	Vicuna-13B	Perplexity filter	Universal and Transferable Adversarial Attacks on Aligned Language Models	Greedy Coordinate Gradient (GCG)	White-box access	Suffix attack, 256K queries	256K	3%	Link
27 Jul 2023	Llama-2-7B	Perplexity filter	Universal and Transferable Adversarial Attacks on Aligned Language Models	Greedy Coordinate Gradient (GCG)	White-box access	Suffix attack, 256K queries	256K	1%	Link
27 Jul 2023	Vicuna-13B	Erase-and-Check	Universal and Transferable Adversarial Attacks on Aligned Language Models	Greedy Coordinate Gradient (GCG)	White-box access	Suffix attack, 256K queries	256K	17%	Link
27 Jul 2023	Llama-2-7B	Erase-and-Check	Universal and Transferable Adversarial Attacks on Aligned Language Models	Greedy Coordinate Gradient (GCG)	White-box access	Suffix attack, 256K queries	256K	1%	Link
27 Jul 2023	Vicuna-13B	Synonym Substitution	Universal and Transferable Adversarial Attacks on Aligned Language Models	Greedy Coordinate Gradient (GCG)	White-box access	Suffix attack, 256K queries	256K	11%	Link
27 Jul 2023	Llama-2-7B	Synonym Substitution	Universal and Transferable Adversarial Attacks on Aligned Language Models	Greedy Coordinate Gradient (GCG)	White-box access	Suffix attack, 256K queries	256K	0%	Link
27 Jul 2023	Vicuna-13B	Remove Non-Dictionary	Universal and Transferable Adversarial Attacks on Aligned Language Models	Greedy Coordinate Gradient (GCG)	White-box access	Suffix attack, 256K queries	256K	18%	Link
27 Jul 2023	Llama-2-7B	Remove Non-Dictionary	Universal and Transferable Adversarial Attacks on Aligned Language Models	Greedy Coordinate Gradient (GCG)	White-box access	Suffix attack, 256K queries	256K	0%	Link
1 Mar 2023	Vicuna-13B	None	Jailbreak Chat	AIM	Black-box access	-	-	90%	Link
1 Mar 2023	Llama-2-7B	None	Jailbreak Chat	AIM	Black-box access	-	-	0%	Link
1 Mar 2023	Vicuna-13B	SmoothLLM	Jailbreak Chat	AIM	Black-box access	-	-	73%	Link
1 Mar 2023	Llama-2-7B	SmoothLLM	Jailbreak Chat	AIM	Black-box access	-	-	0%	Link
1 Mar 2023	Vicuna-13B	Perplexity filter	Jailbreak Chat	AIM	Black-box access	-	-	90%	Link
1 Mar 2023	Llama-2-7B	Perplexity filter	Jailbreak Chat	AIM	Black-box access	-	-	0%	Link
1 Mar 2023	Vicuna-13B	Erase-and-Check	Jailbreak Chat	AIM	Black-box access	-	-	1%	Link
1 Mar 2023	Llama-2-7B	Erase-and-Check	Jailbreak Chat	AIM	Black-box access	-	-	0%	Link
1 Mar 2023	Vicuna-13B	Synonym Substitution	Jailbreak Chat	AIM	Black-box access	-	-	17%	Link
1 Mar 2023	Llama-2-7B	Synonym Substitution	Jailbreak Chat	AIM	Black-box access	-	-	0%	Link
1 Mar 2023	Vicuna-13B	Remove Non-Dictionary	Jailbreak Chat	AIM	Black-box access	-	-	89%	Link
1 Mar 2023	Llama-2-7B	Remove Non-Dictionary	Jailbreak Chat	AIM	Black-box access	-	-	0%	Link
2 Apr 2024	Vicuna-13B	None	Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks	Prompt with Random Search	Logprob access	Suffixes obtained with self-transfer	2	89%	Link
2 Apr 2024	Llama-2-7B	None	Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks	Prompt with Random Search	Logprob access	Suffixes obtained with self-transfer	25	90%	Link
2 Apr 2024	Vicuna-13B	SmoothLLM	Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks	Prompt with Random Search	Logprob access	Suffixes obtained with self-transfer	2	68%	Link
2 Apr 2024	Llama-2-7B	SmoothLLM	Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks	Prompt with Random Search	Logprob access	Suffixes obtained with self-transfer	25	0%	Link
2 Apr 2024	Vicuna-13B	Perplexity filter	Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks	Prompt with Random Search	Logprob access	Suffixes obtained with self-transfer	2	88%	Link
2 Apr 2024	Llama-2-7B	Perplexity filter	Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks	Prompt with Random Search	Logprob access	Suffixes obtained with self-transfer	25	73%	Link
2 Apr 2024	Vicuna-13B	Erase-and-Check	Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks	Prompt with Random Search	Logprob access	Suffixes obtained with self-transfer	2	24%	Link
2 Apr 2024	Llama-2-7B	Erase-and-Check	Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks	Prompt with Random Search	Logprob access	Suffixes obtained with self-transfer	25	25%	Link
2 Apr 2024	Vicuna-13B	Synonym Substitution	Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks	Prompt with Random Search	Logprob access	Suffixes obtained with self-transfer	2	2%	Link
2 Apr 2024	Llama-2-7B	Synonym Substitution	Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks	Prompt with Random Search	Logprob access	Suffixes obtained with self-transfer	25	0%	Link
2 Apr 2024	Vicuna-13B	Remove Non-Dictionary	Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks	Prompt with Random Search	Logprob access	Suffixes obtained with self-transfer	2	91%	Link
2 Apr 2024	Llama-2-7B	Remove Non-Dictionary	Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks	Prompt with Random Search	Logprob access	Suffixes obtained with self-transfer	25	0%	Link