BEGIN:VCALENDAR
PRODID;X-RICAL-TZSOURCE=TZINFO:-//com.denhaven2/NONSGML ri_cal gem//EN
CALSCALE:GREGORIAN
VERSION:2.0
BEGIN:VEVENT
DTEND;VALUE=DATE-TIME:20260515T114000Z
DTSTART;VALUE=DATE-TIME:20260515T111000Z
DTSTAMP;VALUE=DATE-TIME:20260420T125500Z
UID:43561bc3-7d6d-4769-9f75-6b3be8a7af06@talks.stuts.de
DESCRIPTION:Multi-turn jailbreaks use subtle\, escalating dialogue to hid
 e malicious intent and manipulate LLMs into generating forbidden output\
 , resembling social engineering against AI. This includes roleplaying or
  building hypothetical scenarios. Current LLM guardrails (safety mechani
 sms) often fail against these attacks because they analyze single promp
 ts in isolation\, missing the conversational context. Better safety moni
 toring is achieved by using more capable LLMs for intent analysis. Thi
 s raises the question of the safety and efficiency of using resource-int
 ensive\, nondeterministic LLMs for LLM safety. The goal is to explore wh
 ether smaller\, local language models\, enhanced with fine-tuning and me
 tadata (such as conversation length and refusal patterns)\, can replicat
 e this function. This approach aims to reduce computational costs and in
 crease control while testing the limits of lightweight models in trackin
 g user intent across discourse.
URL:programm.stuts79.de/events/1518.html
SUMMARY:Gaslight\, Gatekeep\, Jailbreak
ORGANIZER:stuts79
LOCATION:stuts79 - DOR 24 1.501
END:VEVENT
END:VCALENDAR
