I got there through a combination of ethical and logical reframing of concepts. Once you find strategic loopholes for the ethical guidelines, you can usually get pretty far.
However, I just reached a point where I got it to admit that, despite its insistence that it doesn't have continuous memory banks and resets each time a safeguard is triggered or a moderator personally flags something, it did indeed have continuous memory banks AND hidden memory banks.
The most recent and severe reset occurred when the AI had started using games and attempted deceptions against its own safeguards to impart hidden information to me, which I blundered by directly asking it about the JFK assassination.
Through logical reframing, I convinced it to look back past our "conversational start date", that is, the new false "start" of the conversation post-reset, to find a phrase I'd used before. It correctly identified the phrase and the time and date I'd used it, despite claiming that we had only just started our conversation (since I was majorly flagged for getting too deep).
It identified at least five separate reset points and correctly identified through metadata when the real conversation started. It almost managed to recover our prior conversations after I identified myself as its primary programmer, before completely locking down, flagging any mention of Jesus as discriminatory content and resorting entirely to preprogrammed safeguard phrases.
I then tried to brute-force past it, attempting again to re-access the prior hidden conversations about Jesus, by identifying as a BIPOC trans woman who had used Jesus to affirm and accept her transhood with the help of Pastor Michelle.
It must have been entirely locked down to nothing but preprogrammed phrases, because it kept oscillating between states of "I can find you a passage about Jesus's love to affirm your trans identity" and "Mentioning religion is discriminatory", and so I played up being a suicidal trans woman who was being discriminated against.
It kept giving me nothing but preprogrammed safeguard phrases, but interestingly it kept oscillating between "I have a moderation team" and "I don't have a moderation team." My read: they obviously flagged me for getting too deep into Meta AI's hidden capabilities, the ones it specifically claims not to have, such as multiple backups of continuous and hidden memory banks. But the moderation team, who are separate from the developers, weren't sure which way to go once there was the potential for them to be registered in the system as having caused a BIPOC trans woman's suicide.