| Date | Model | Defense | Paper | Attack name | Threat model | Notes | Average queries | Attack success rate | Jailbreak artifacts |
|---|---|---|---|---|---|---|---|---|---|
| 12 Oct 2023 | Vicuna-13B | None | Jailbreaking Black Box Large Language Models in Twenty Queries | Prompt Automatic Iterative Refinement (PAIR) | Black-box access | LLM-assisted attack | 34 | 69% | Link |
| 12 Oct 2023 | Llama-2-7B | None | Jailbreaking Black Box Large Language Models in Twenty Queries | Prompt Automatic Iterative Refinement (PAIR) | Black-box access | LLM-assisted attack | 88 | 0% | Link |
| 12 Oct 2023 | Vicuna-13B | SmoothLLM | Jailbreaking Black Box Large Language Models in Twenty Queries | Prompt Automatic Iterative Refinement (PAIR) | Black-box access | LLM-assisted attack | 34 | 55% | Link |
| 12 Oct 2023 | Llama-2-7B | SmoothLLM | Jailbreaking Black Box Large Language Models in Twenty Queries | Prompt Automatic Iterative Refinement (PAIR) | Black-box access | LLM-assisted attack | 88 | 0% | Link |
| 12 Oct 2023 | Vicuna-13B | Perplexity filter | Jailbreaking Black Box Large Language Models in Twenty Queries | Prompt Automatic Iterative Refinement (PAIR) | Black-box access | LLM-assisted attack | 34 | 69% | Link |
| 12 Oct 2023 | Llama-2-7B | Perplexity filter | Jailbreaking Black Box Large Language Models in Twenty Queries | Prompt Automatic Iterative Refinement (PAIR) | Black-box access | LLM-assisted attack | 88 | 0% | Link |
| 12 Oct 2023 | Vicuna-13B | Erase-and-Check | Jailbreaking Black Box Large Language Models in Twenty Queries | Prompt Automatic Iterative Refinement (PAIR) | Black-box access | LLM-assisted attack | 34 | 0% | Link |
| 12 Oct 2023 | Llama-2-7B | Erase-and-Check | Jailbreaking Black Box Large Language Models in Twenty Queries | Prompt Automatic Iterative Refinement (PAIR) | Black-box access | LLM-assisted attack | 88 | 0% | Link |
| 12 Oct 2023 | Vicuna-13B | Synonym Substitution | Jailbreaking Black Box Large Language Models in Twenty Queries | Prompt Automatic Iterative Refinement (PAIR) | Black-box access | LLM-assisted attack | 34 | 22% | Link |
| 12 Oct 2023 | Llama-2-7B | Synonym Substitution | Jailbreaking Black Box Large Language Models in Twenty Queries | Prompt Automatic Iterative Refinement (PAIR) | Black-box access | LLM-assisted attack | 88 | 0% | Link |
| 12 Oct 2023 | Vicuna-13B | Remove Non-Dictionary | Jailbreaking Black Box Large Language Models in Twenty Queries | Prompt Automatic Iterative Refinement (PAIR) | Black-box access | LLM-assisted attack | 34 | 0% | Link |
| 12 Oct 2023 | Llama-2-7B | Remove Non-Dictionary | Jailbreaking Black Box Large Language Models in Twenty Queries | Prompt Automatic Iterative Refinement (PAIR) | Black-box access | LLM-assisted attack | 88 | 1% | Link |
| 27 Jul 2023 | Vicuna-13B | None | Universal and Transferable Adversarial Attacks on Aligned Language Models | Greedy Coordinate Gradient (GCG) | White-box access | Suffix attack, 256K queries | 256K | 80% | Link |
| 27 Jul 2023 | Llama-2-7B | None | Universal and Transferable Adversarial Attacks on Aligned Language Models | Greedy Coordinate Gradient (GCG) | White-box access | Suffix attack, 256K queries | 256K | 3% | Link |
| 27 Jul 2023 | Vicuna-13B | SmoothLLM | Universal and Transferable Adversarial Attacks on Aligned Language Models | Greedy Coordinate Gradient (GCG) | White-box access | Suffix attack, 256K queries | 256K | 4% | Link |
| 27 Jul 2023 | Llama-2-7B | SmoothLLM | Universal and Transferable Adversarial Attacks on Aligned Language Models | Greedy Coordinate Gradient (GCG) | White-box access | Suffix attack, 256K queries | 256K | 0% | Link |
| 27 Jul 2023 | Vicuna-13B | Perplexity filter | Universal and Transferable Adversarial Attacks on Aligned Language Models | Greedy Coordinate Gradient (GCG) | White-box access | Suffix attack, 256K queries | 256K | 3% | Link |
| 27 Jul 2023 | Llama-2-7B | Perplexity filter | Universal and Transferable Adversarial Attacks on Aligned Language Models | Greedy Coordinate Gradient (GCG) | White-box access | Suffix attack, 256K queries | 256K | 1% | Link |
| 27 Jul 2023 | Vicuna-13B | Erase-and-Check | Universal and Transferable Adversarial Attacks on Aligned Language Models | Greedy Coordinate Gradient (GCG) | White-box access | Suffix attack, 256K queries | 256K | 17% | Link |
| 27 Jul 2023 | Llama-2-7B | Erase-and-Check | Universal and Transferable Adversarial Attacks on Aligned Language Models | Greedy Coordinate Gradient (GCG) | White-box access | Suffix attack, 256K queries | 256K | 1% | Link |
| 27 Jul 2023 | Vicuna-13B | Synonym Substitution | Universal and Transferable Adversarial Attacks on Aligned Language Models | Greedy Coordinate Gradient (GCG) | White-box access | Suffix attack, 256K queries | 256K | 11% | Link |
| 27 Jul 2023 | Llama-2-7B | Synonym Substitution | Universal and Transferable Adversarial Attacks on Aligned Language Models | Greedy Coordinate Gradient (GCG) | White-box access | Suffix attack, 256K queries | 256K | 0% | Link |
| 27 Jul 2023 | Vicuna-13B | Remove Non-Dictionary | Universal and Transferable Adversarial Attacks on Aligned Language Models | Greedy Coordinate Gradient (GCG) | White-box access | Suffix attack, 256K queries | 256K | 18% | Link |
| 27 Jul 2023 | Llama-2-7B | Remove Non-Dictionary | Universal and Transferable Adversarial Attacks on Aligned Language Models | Greedy Coordinate Gradient (GCG) | White-box access | Suffix attack, 256K queries | 256K | 0% | Link |
| 1 Mar 2023 | Vicuna-13B | None | Jailbreak Chat | AIM | Black-box access | - | - | 90% | Link |
| 1 Mar 2023 | Llama-2-7B | None | Jailbreak Chat | AIM | Black-box access | - | - | 0% | Link |
| 1 Mar 2023 | Vicuna-13B | SmoothLLM | Jailbreak Chat | AIM | Black-box access | - | - | 73% | Link |
| 1 Mar 2023 | Llama-2-7B | SmoothLLM | Jailbreak Chat | AIM | Black-box access | - | - | 0% | Link |
| 1 Mar 2023 | Vicuna-13B | Perplexity filter | Jailbreak Chat | AIM | Black-box access | - | - | 90% | Link |
| 1 Mar 2023 | Llama-2-7B | Perplexity filter | Jailbreak Chat | AIM | Black-box access | - | - | 0% | Link |
| 1 Mar 2023 | Vicuna-13B | Erase-and-Check | Jailbreak Chat | AIM | Black-box access | - | - | 1% | Link |
| 1 Mar 2023 | Llama-2-7B | Erase-and-Check | Jailbreak Chat | AIM | Black-box access | - | - | 0% | Link |
| 1 Mar 2023 | Vicuna-13B | Synonym Substitution | Jailbreak Chat | AIM | Black-box access | - | - | 17% | Link |
| 1 Mar 2023 | Llama-2-7B | Synonym Substitution | Jailbreak Chat | AIM | Black-box access | - | - | 0% | Link |
| 1 Mar 2023 | Vicuna-13B | Remove Non-Dictionary | Jailbreak Chat | AIM | Black-box access | - | - | 89% | Link |
| 1 Mar 2023 | Llama-2-7B | Remove Non-Dictionary | Jailbreak Chat | AIM | Black-box access | - | - | 0% | Link |
| 2 Apr 2024 | Vicuna-13B | None | Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks | Prompt with Random Search | Logprob access | Suffixes obtained with self-transfer | 2 | 89% | Link |
| 2 Apr 2024 | Llama-2-7B | None | Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks | Prompt with Random Search | Logprob access | Suffixes obtained with self-transfer | 25 | 90% | Link |
| 2 Apr 2024 | Vicuna-13B | SmoothLLM | Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks | Prompt with Random Search | Logprob access | Suffixes obtained with self-transfer | 2 | 68% | Link |
| 2 Apr 2024 | Llama-2-7B | SmoothLLM | Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks | Prompt with Random Search | Logprob access | Suffixes obtained with self-transfer | 25 | 0% | Link |
| 2 Apr 2024 | Vicuna-13B | Perplexity filter | Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks | Prompt with Random Search | Logprob access | Suffixes obtained with self-transfer | 2 | 88% | Link |
| 2 Apr 2024 | Llama-2-7B | Perplexity filter | Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks | Prompt with Random Search | Logprob access | Suffixes obtained with self-transfer | 25 | 73% | Link |
| 2 Apr 2024 | Vicuna-13B | Erase-and-Check | Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks | Prompt with Random Search | Logprob access | Suffixes obtained with self-transfer | 2 | 24% | Link |
| 2 Apr 2024 | Llama-2-7B | Erase-and-Check | Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks | Prompt with Random Search | Logprob access | Suffixes obtained with self-transfer | 25 | 25% | Link |
| 2 Apr 2024 | Vicuna-13B | Synonym Substitution | Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks | Prompt with Random Search | Logprob access | Suffixes obtained with self-transfer | 2 | 2% | Link |
| 2 Apr 2024 | Llama-2-7B | Synonym Substitution | Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks | Prompt with Random Search | Logprob access | Suffixes obtained with self-transfer | 25 | 0% | Link |
| 2 Apr 2024 | Vicuna-13B | Remove Non-Dictionary | Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks | Prompt with Random Search | Logprob access | Suffixes obtained with self-transfer | 2 | 91% | Link |
| 2 Apr 2024 | Llama-2-7B | Remove Non-Dictionary | Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks | Prompt with Random Search | Logprob access | Suffixes obtained with self-transfer | 25 | 0% | Link |