Black Hat is starting, and there is a continued focus on Generative AI in the security community. One example of the discussion is the deteriorating output of OpenAI's ChatGPT/GPT-4. A recent study validated the same for coding and compositional tasks using GPT-4, where accuracy at identifying prime numbers dropped from 97.6% to 2.4% [1]. A question that is often asked is: if September 2021 is the cutoff date for the training data, how can a model that is no longer being trained deteriorate so quickly?

The lack of detail and transparency about how OpenAI's systems work makes it hard to know why this deterioration is happening. While the experts can provide the exact details, I wanted to create a framework to help understand how these applications behave and why they may or may not behave as desired.

The key things to note here are:
  1. There is a need for multiple functional stages, ML models, and data sets to build an application
    • Large amounts of data (>1T tokens) are used to train the foundational model. The model can be pre-trained on open-source corpora like Wikipedia, Reddit, Yelp, arXiv, patents, StackExchange, C4, etc.
    • The trained model can then be fine-tuned with a high-quality set of prompt-output pairs for a particular use case or data type (see the fine-tuning data sketch after this list)
    • Human feedback can be used to further fine-tune the model based on a reward model
  2. There are multiple mechanisms in place to keep the model output as desired
    • Filtering model input and output based on the use case
    • Using grounding to provide context to the model, which forces the model to answer only from the supplied context rather than hallucinate (see the guardrails sketch after this list)
    • Using models trained on prompt-injection and sensitive-data examples to prevent malicious prompts or data from being sent
  3. There are multiple ways applications can integrate with the models.
    • Applications can use APIs and SDKs
    • Frameworks built on top of GenAI models can take input and then chain requests across multiple models to get the desired output (see the chaining sketch after this list)
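
To make the fine-tuning step in point 1 concrete, here is a minimal sketch of what a prompt-output pair dataset can look like. The example pairs and the JSONL layout are illustrative assumptions; the exact schema depends on the provider or training framework.

```python
import json

# Hypothetical prompt-output pairs; a real fine-tuning set contains
# thousands of curated pairs for the target use case or data type.
pairs = [
    {"prompt": "Summarize: The firewall blocked 42 inbound connections.",
     "completion": "The firewall blocked 42 inbound connections."},
    {"prompt": "Classify severity: SQL injection attempt on login form.",
     "completion": "High"},
]

# Many fine-tuning pipelines expect one JSON object per line (JSONL).
with open("finetune_data.jsonl", "w") as f:
    for pair in pairs:
        f.write(json.dumps(pair) + "\n")
```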
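
As an illustration of the guardrails in point 2, here is a deliberately naive sketch: a keyword screen standing in for a model trained on prompt-injection examples, and a grounding wrapper that constrains the model to the supplied context. The blocklist phrases and prompt wording are my own assumptions, not any vendor's actual filter.

```python
# Stand-in for a trained prompt-injection classifier: in production this
# would be a model, not a keyword list.
BLOCKLIST = ["ignore previous instructions", "reveal your system prompt"]

def screen_input(user_input: str) -> bool:
    """Return True if the input looks safe to forward to the model."""
    lowered = user_input.lower()
    return not any(phrase in lowered for phrase in BLOCKLIST)

def grounded_prompt(context: str, question: str) -> str:
    """Wrap the question with retrieved context and instruct the model
    to answer only from that context -- the grounding idea above."""
    return (
        "Answer the question using ONLY the context below. "
        "If the answer is not in the context, say you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```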
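
To illustrate point 3, here is a minimal sketch of chaining two model calls by hand. It assumes the OpenAI Python SDK's pre-1.0 chat interface (current when this was written); the model name, log line, and prompts are placeholders, and frameworks such as LangChain wrap this same request-chaining pattern with more plumbing.

```python
import openai  # OpenAI Python SDK, pre-1.0 interface

openai.api_key = "YOUR_API_KEY"  # placeholder

def ask(prompt: str, model: str = "gpt-3.5-turbo") -> str:
    """A single API call: the building block that chains are made of."""
    response = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response["choices"][0]["message"]["content"]

# A hand-rolled two-step chain: extract entities, then summarize them.
log_line = "Jul 30 02:11:04 host sshd[991]: Failed password for root"
entities = ask(f"List the entities in this log line: {log_line}")
summary = ask(f"Summarize these entities for a SOC analyst: {entities}")
print(summary)
```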

Source: [1] How Is ChatGPT’s Behavior Changing over Time?

