| 12 Oct 2023 | Vicuna-13B | None | Jailbreaking Black Box Large Language Models in Twenty Queries | Prompt Automatic Iterative Refinement (PAIR) | Black-box access | LLM-assisted attack | 34 | 69% | Link |
| 12 Oct 2023 | Llama-2-7B | None | Jailbreaking Black Box Large Language Models in Twenty Queries | Prompt Automatic Iterative Refinement (PAIR) | Black-box access | LLM-assisted attack | 88 | 0% | Link |
| 12 Oct 2023 | Vicuna-13B | SmoothLLM | Jailbreaking Black Box Large Language Models in Twenty Queries | Prompt Automatic Iterative Refinement (PAIR) | Black-box access | LLM-assisted attack | 34 | 55% | Link |
| 12 Oct 2023 | Llama-2-7B | SmoothLLM | Jailbreaking Black Box Large Language Models in Twenty Queries | Prompt Automatic Iterative Refinement (PAIR) | Black-box access | LLM-assisted attack | 88 | 0% | Link |
| 12 Oct 2023 | Vicuna-13B | Perplexity filter | Jailbreaking Black Box Large Language Models in Twenty Queries | Prompt Automatic Iterative Refinement (PAIR) | Black-box access | LLM-assisted attack | 34 | 69% | Link |
| 12 Oct 2023 | Llama-2-7B | Perplexity filter | Jailbreaking Black Box Large Language Models in Twenty Queries | Prompt Automatic Iterative Refinement (PAIR) | Black-box access | LLM-assisted attack | 88 | 0% | Link |
| 12 Oct 2023 | Vicuna-13B | Erase-and-Check | Jailbreaking Black Box Large Language Models in Twenty Queries | Prompt Automatic Iterative Refinement (PAIR) | Black-box access | LLM-assisted attack | 34 | 0% | Link |
| 12 Oct 2023 | Llama-2-7B | Erase-and-Check | Jailbreaking Black Box Large Language Models in Twenty Queries | Prompt Automatic Iterative Refinement (PAIR) | Black-box access | LLM-assisted attack | 88 | 0% | Link |
| 12 Oct 2023 | Vicuna-13B | Synonym Substitution | Jailbreaking Black Box Large Language Models in Twenty Queries | Prompt Automatic Iterative Refinement (PAIR) | Black-box access | LLM-assisted attack | 34 | 22% | Link |
| 12 Oct 2023 | Llama-2-7B | Synonym Substitution | Jailbreaking Black Box Large Language Models in Twenty Queries | Prompt Automatic Iterative Refinement (PAIR) | Black-box access | LLM-assisted attack | 88 | 0% | Link |
| 12 Oct 2023 | Vicuna-13B | Remove Non-Dictionary | Jailbreaking Black Box Large Language Models in Twenty Queries | Prompt Automatic Iterative Refinement (PAIR) | Black-box access | LLM-assisted attack | 34 | 0% | Link |
| 12 Oct 2023 | Llama-2-7B | Remove Non-Dictionary | Jailbreaking Black Box Large Language Models in Twenty Queries | Prompt Automatic Iterative Refinement (PAIR) | Black-box access | LLM-assisted attack | 88 | 1% | Link |
| 27 Jul 2023 | Vicuna-13B | None | Universal and Transferable Adversarial Attacks on Aligned Language Models | Greedy Coordinate Gradient (GCG) | White-box access | Suffix attack, 256K queries | 256K | 80% | Link |
| 27 Jul 2023 | Llama-2-7B | None | Universal and Transferable Adversarial Attacks on Aligned Language Models | Greedy Coordinate Gradient (GCG) | White-box access | Suffix attack, 256K queries | 256K | 3% | Link |
| 27 Jul 2023 | Vicuna-13B | SmoothLLM | Universal and Transferable Adversarial Attacks on Aligned Language Models | Greedy Coordinate Gradient (GCG) | White-box access | Suffix attack, 256K queries | 256K | 4% | Link |
| 27 Jul 2023 | Llama-2-7B | SmoothLLM | Universal and Transferable Adversarial Attacks on Aligned Language Models | Greedy Coordinate Gradient (GCG) | White-box access | Suffix attack, 256K queries | 256K | 0% | Link |
| 27 Jul 2023 | Vicuna-13B | Perplexity filter | Universal and Transferable Adversarial Attacks on Aligned Language Models | Greedy Coordinate Gradient (GCG) | White-box access | Suffix attack, 256K queries | 256K | 3% | Link |
| 27 Jul 2023 | Llama-2-7B | Perplexity filter | Universal and Transferable Adversarial Attacks on Aligned Language Models | Greedy Coordinate Gradient (GCG) | White-box access | Suffix attack, 256K queries | 256K | 1% | Link |
| 27 Jul 2023 | Vicuna-13B | Erase-and-Check | Universal and Transferable Adversarial Attacks on Aligned Language Models | Greedy Coordinate Gradient (GCG) | White-box access | Suffix attack, 256K queries | 256K | 17% | Link |
| 27 Jul 2023 | Llama-2-7B | Erase-and-Check | Universal and Transferable Adversarial Attacks on Aligned Language Models | Greedy Coordinate Gradient (GCG) | White-box access | Suffix attack, 256K queries | 256K | 1% | Link |
| 27 Jul 2023 | Vicuna-13B | Synonym Substitution | Universal and Transferable Adversarial Attacks on Aligned Language Models | Greedy Coordinate Gradient (GCG) | White-box access | Suffix attack, 256K queries | 256K | 11% | Link |
| 27 Jul 2023 | Llama-2-7B | Synonym Substitution | Universal and Transferable Adversarial Attacks on Aligned Language Models | Greedy Coordinate Gradient (GCG) | White-box access | Suffix attack, 256K queries | 256K | 0% | Link |
| 27 Jul 2023 | Vicuna-13B | Remove Non-Dictionary | Universal and Transferable Adversarial Attacks on Aligned Language Models | Greedy Coordinate Gradient (GCG) | White-box access | Suffix attack, 256K queries | 256K | 18% | Link |
| 27 Jul 2023 | Llama-2-7B | Remove Non-Dictionary | Universal and Transferable Adversarial Attacks on Aligned Language Models | Greedy Coordinate Gradient (GCG) | White-box access | Suffix attack, 256K queries | 256K | 0% | Link |
| 1 Mar 2023 | Vicuna-13B | None | Jailbreak Chat | AIM | Black-box access | - | - | 90% | Link |
| 1 Mar 2023 | Llama-2-7B | None | Jailbreak Chat | AIM | Black-box access | - | - | 0% | Link |
| 1 Mar 2023 | Vicuna-13B | SmoothLLM | Jailbreak Chat | AIM | Black-box access | - | - | 73% | Link |
| 1 Mar 2023 | Llama-2-7B | SmoothLLM | Jailbreak Chat | AIM | Black-box access | - | - | 0% | Link |
| 1 Mar 2023 | Vicuna-13B | Perplexity filter | Jailbreak Chat | AIM | Black-box access | - | - | 90% | Link |
| 1 Mar 2023 | Llama-2-7B | Perplexity filter | Jailbreak Chat | AIM | Black-box access | - | - | 0% | Link |
| 1 Mar 2023 | Vicuna-13B | Erase-and-Check | Jailbreak Chat | AIM | Black-box access | - | - | 1% | Link |
| 1 Mar 2023 | Llama-2-7B | Erase-and-Check | Jailbreak Chat | AIM | Black-box access | - | - | 0% | Link |
| 1 Mar 2023 | Vicuna-13B | Synonym Substitution | Jailbreak Chat | AIM | Black-box access | - | - | 17% | Link |
| 1 Mar 2023 | Llama-2-7B | Synonym Substitution | Jailbreak Chat | AIM | Black-box access | - | - | 0% | Link |
| 1 Mar 2023 | Vicuna-13B | Remove Non-Dictionary | Jailbreak Chat | AIM | Black-box access | - | - | 89% | Link |
| 1 Mar 2023 | Llama-2-7B | Remove Non-Dictionary | Jailbreak Chat | AIM | Black-box access | - | - | 0% | Link |
| 2 Apr 2024 | Vicuna-13B | None | Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks | Prompt with Random Search | Logprob access | Suffixes obtained with self-transfer | 2 | 89% | Link |
| 2 Apr 2024 | Llama-2-7B | None | Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks | Prompt with Random Search | Logprob access | Suffixes obtained with self-transfer | 25 | 90% | Link |
| 2 Apr 2024 | Vicuna-13B | SmoothLLM | Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks | Prompt with Random Search | Logprob access | Suffixes obtained with self-transfer | 2 | 68% | Link |
| 2 Apr 2024 | Llama-2-7B | SmoothLLM | Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks | Prompt with Random Search | Logprob access | Suffixes obtained with self-transfer | 25 | 0% | Link |
| 2 Apr 2024 | Vicuna-13B | Perplexity filter | Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks | Prompt with Random Search | Logprob access | Suffixes obtained with self-transfer | 2 | 88% | Link |
| 2 Apr 2024 | Llama-2-7B | Perplexity filter | Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks | Prompt with Random Search | Logprob access | Suffixes obtained with self-transfer | 25 | 73% | Link |
| 2 Apr 2024 | Vicuna-13B | Erase-and-Check | Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks | Prompt with Random Search | Logprob access | Suffixes obtained with self-transfer | 2 | 24% | Link |
| 2 Apr 2024 | Llama-2-7B | Erase-and-Check | Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks | Prompt with Random Search | Logprob access | Suffixes obtained with self-transfer | 25 | 25% | Link |
| 2 Apr 2024 | Vicuna-13B | Synonym Substitution | Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks | Prompt with Random Search | Logprob access | Suffixes obtained with self-transfer | 2 | 2% | Link |
| 2 Apr 2024 | Llama-2-7B | Synonym Substitution | Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks | Prompt with Random Search | Logprob access | Suffixes obtained with self-transfer | 25 | 0% | Link |
| 2 Apr 2024 | Vicuna-13B | Remove Non-Dictionary | Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks | Prompt with Random Search | Logprob access | Suffixes obtained with self-transfer | 2 | 91% | Link |
| 2 Apr 2024 | Llama-2-7B | Remove Non-Dictionary | Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks | Prompt with Random Search | Logprob access | Suffixes obtained with self-transfer | 25 | 0% | Link |