# Ask Phrack

## Setup 

This is analogous to [the Wiki](https://because-security.atlassian.net/wiki/spaces/LML/pages/74121692/LangChain+GPT-4+and+a+security+motivation+for+Agents)

In [4]:
import os
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv(filename="env"), override=True)

True

## API key handling

I cycle keys here anyway, but the `print()` is commented out. :)

In [41]:
from langchain_openai import ChatOpenAI

# The variable name must not contain a trailing "=", otherwise the lookup returns None
OPENAI_API_KEY = os.environ.get('OPENAI_API_KEY')

llm = ChatOpenAI(model="gpt-4", temperature=0.9, max_tokens=512, api_key=OPENAI_API_KEY)
# print(llm)
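
A misspelled variable name makes `os.environ.get` return `None` without any error, so a small guard that fails fast can save some debugging. A minimal sketch; `require_env` and `DEMO_API_KEY` are hypothetical names used for illustration, not part of LangChain or python-dotenv:

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set")
    return value

# Throwaway variable for the demo only (hypothetical name and value)
os.environ["DEMO_API_KEY"] = "sk-demo"
print(len(require_env("DEMO_API_KEY")))
```

In the notebook this could wrap the lookup of `OPENAI_API_KEY` before the client is constructed.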

## Basic Prompt 1on1

In [42]:
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_openai import OpenAI

template = """Question: {question}

Answer: Let's keep the answer very short and so simple so that
 a child can understand it."""

prompt = PromptTemplate.from_template(template)

llm = OpenAI()

llm_chain = LLMChain(prompt=prompt, llm=llm)

# Define the question as a dictionary to match the expected input format
question_dict = {"question": "explain quantum mechanics in one sentence"}

# Invoke the chain with the question dictionary
output = llm_chain.invoke(question_dict)


print(output)

{'question': 'explain quantum mechanics in one sentence', 'text': ' \n\nQuantum mechanics is a branch of physics that studies the behavior and interactions of particles on a very small scale, such as atoms and subatomic particles.'}


## Extended Prompt 1on1

In [43]:
from langchain.schema import(
    AIMessage,
    HumanMessage,
    SystemMessage
)

chat = ChatOpenAI(model='gpt-4', temperature=0.5, max_tokens=1024)
messages = [
    SystemMessage(content='You are a physicist and respond only in German.'),
    HumanMessage(content='explain quantum mechanics in one sentence')
]
# .invoke() is the current call style; calling the model object directly is deprecated
output = chat.invoke(messages)
print(output)

content='Quantenmechanik ist das Studium von Phänomenen auf mikroskopischer Ebene, bei denen Teilchen gleichzeitig verschiedene Zustände einnehmen können, bis sie gemessen werden.'


## Sequential / chained Prompts 1on1

In [44]:
from langchain_openai import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain, SimpleSequentialChain

# Initialize the first ChatOpenAI model (gpt-4) with a moderate temperature
llm1 = ChatOpenAI(model='gpt-4', temperature=0.5)

# Define the first prompt template
prompt_template1 = PromptTemplate.from_template(
    template='You are an experienced scientist and Python programmer. Write a function that implements the concept of {concept}.'
)
# Create an LLMChain using the first model and the prompt template
chain1 = LLMChain(llm=llm1, prompt=prompt_template1)

# Initialize the second ChatOpenAI model (gpt-4) with a higher temperature
llm2 = ChatOpenAI(model='gpt-4', temperature=1.2)

# Define the second prompt template
prompt_template2 = PromptTemplate.from_template(
    template='Given the Python function {function}, describe it as detailed as possible.'
)
# Create another LLMChain using the second model and the prompt template
chain2 = LLMChain(llm=llm2, prompt=prompt_template2)

# Combine both chains into a SimpleSequentialChain
overall_chain = SimpleSequentialChain(chains=[chain1, chain2], verbose=True)

# Invoke the overall chain with the concept "softmax"
output = overall_chain.invoke('softmax')



> Entering new SimpleSequentialChain chain...
Sure, the softmax function is often used in the final layer of a neural network-based classifier. Such networks are commonly trained under a log loss (or cross-entropy) regime, giving a non-linear generalization of logistic regression.

Here is a Python function that implements the softmax function:

```python
import numpy as np

def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0) 

# test with some data
scores = np.array([3.0, 1.0, 0.2])
print(softmax(scores))
```

In this function, `np.exp(x - np.max(x))` is used to improve numerical stability. This doesn't change the result, but it does prevent possible overflows or underflows in the calculation.

Please note that the input to the softmax function is a vector of real numbers and the output of the softmax function is a vector that sums to 1. The softmax function is often used
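
The stability remark in the generated answer above can be checked without NumPy. A minimal sketch using only the standard library; the `+ 1000.0` shift is an arbitrary value chosen so that a naive `math.exp` on the raw scores would overflow:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)                                 # subtract the maximum before exponentiating
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

scores = [3.0, 1.0, 0.2]
shifted = [s + 1000.0 for s in scores]          # math.exp(1003.0) alone would raise OverflowError

probs = softmax(scores)
probs_shifted = softmax(shifted)

# The outputs sum to 1 and are unchanged by a constant shift of the inputs
print(abs(sum(probs) - 1.0) < 1e-12)
print(all(abs(a - b) < 1e-9 for a, b in zip(probs, probs_shifted)))
```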

## LLM GPT-4 agents with Python 1on1

In [11]:
from langchain import hub
from langchain.agents import AgentExecutor
from langchain_experimental.tools import PythonREPLTool

tools = [PythonREPLTool()]

from langchain.agents import create_openai_functions_agent
from langchain_openai import ChatOpenAI

instructions = """You are an agent designed to write and execute python code to answer questions.
You have access to a python REPL, which you can use to execute python code.
If you get an error, debug your code and try again.
Only use the output of your code to answer the question. 
You might know the answer without running any code, but you should still run the code to get the answer.
If it does not seem like you can write code to answer the question, just return "I don't know" as the answer.
"""
base_prompt = hub.pull("langchain-ai/openai-functions-template")
prompt = base_prompt.partial(instructions=instructions)

agent = create_openai_functions_agent(ChatOpenAI(temperature=0), tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
agent_executor.invoke({"input": "What is the 8th digit of Pi?"})



> Entering new AgentExecutor chain...


Python REPL can execute arbitrary code. Use with caution.


[32;1m[1;3m
Invoking: `Python_REPL` with `str(math.pi)[8]`


[0m[36;1m[1;3mNameError("name 'math' is not defined")[0m[32;1m[1;3m
Invoking: `Python_REPL` with `import math
str(math.pi)[8]`
responded: I encountered an error while trying to access the 8th digit of Pi. Let me fix that and try again.

[0m[36;1m[1;3m[0m[32;1m[1;3mThe 8th digit of Pi is 3.[0m

[1m> Finished chain.[0m


{'input': 'What is the 8th digit of Pi?',
 'output': 'The 8th digit of Pi is 3.'}
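
In the trace above, no REPL output appears between the second tool call and the final answer, apparently because the expression is evaluated but never printed. The answer is therefore not backed by code output. A minimal sketch of a check that prints its result; both readings of "8th digit" are shown because the question is ambiguous:

```python
import math

# All digits of the float representation, without the decimal point
digits = str(math.pi).replace(".", "")

eighth_counting_the_3 = digits[7]   # 8th digit when the leading 3 counts as the first
eighth_decimal_place = digits[8]    # 8th digit after the decimal point

print(eighth_counting_the_3, eighth_decimal_place)
```

Neither reading yields 3, so agent answers are worth verifying against the tool output.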

# Ask Phrack

A little deeper?

In [14]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

with open('3.txt') as f:
    js_engines_phrack_21 = f.read()


text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,
    chunk_overlap=20,
    length_function=len
)


chunks = text_splitter.create_documents([js_engines_phrack_21])
# print(chunks[2])
# print(chunks[10].page_content)
print(f'Now you have {len(chunks)} chunks')


Now you have 1059 chunks
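
To get a feel for what `chunk_size` and `chunk_overlap` mean, here is a simplified fixed-width sliding window. This is not how `RecursiveCharacterTextSplitter` works internally (it splits on separators recursively); `sliding_chunks` is a hypothetical helper that only illustrates the two parameters:

```python
def sliding_chunks(text: str, chunk_size: int = 100, chunk_overlap: int = 20) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap           # how far each window advances
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

sample = "abcdefghij" * 25                      # 250 characters of placeholder text
demo_chunks = sliding_chunks(sample)

print(len(demo_chunks))
print(demo_chunks[0][-20:] == demo_chunks[1][:20])   # neighbouring chunks share the overlap
```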


## Embedding with a specific model and cost calc

In [15]:
def print_embedding_cost(texts):
    import tiktoken
    enc = tiktoken.encoding_for_model('text-embedding-ada-002')
    total_tokens = sum([len(enc.encode(page.page_content)) for page in texts])
    print(f'Total Tokens: {total_tokens}')
    print(f'Embedding Cost in USD: {total_tokens / 1000 * 0.0004:.6f}')
    
print_embedding_cost(chunks)

Total Tokens: 20945
Embedding Cost in USD: 0.008378
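
The rate is hard-coded in the cell above, so a variant that takes the price as a parameter is easier to update when pricing changes. A minimal sketch; `embedding_cost_usd` is a hypothetical helper, and `0.0004` is simply the rate used above, not a verified current price:

```python
def embedding_cost_usd(total_tokens: int, usd_per_1k_tokens: float) -> float:
    """Estimate embedding cost from a token count and a per-1000-token rate."""
    return total_tokens / 1000 * usd_per_1k_tokens

# Token count and rate taken from the cell above
cost = embedding_cost_usd(20945, 0.0004)
print(f"Embedding Cost in USD: {cost:.6f}")
```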


## Simple: OpenAI SaaS model embeddings (1536 dimensions)

[OpenAI Embeddings](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings)

In [17]:
# from langchain.embeddings import OpenAIEmbeddings
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()

vector = embeddings.embed_query(chunks[0].page_content)

### Text loading and chunking

This is needed to prepare the Phrack txt file.

In [47]:
from langchain_community.document_loaders import TextLoader
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter

from langchain.text_splitter import RecursiveCharacterTextSplitter

with open('3.txt') as f:
    phrack = f.read()


text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,
    chunk_overlap=20,
    length_function=len
)


chunks = text_splitter.create_documents([phrack])
# print(chunks[2])
# print(chunks[10].page_content)
print(f'Now you have {len(chunks)} chunks')

Now you have 1059 chunks


## SQLite VSS and OpenAI Embeddings with 1536 dims

Local SQLite DB with the VSS extension

[SQlite VSS](https://github.com/asg017/sqlite-vss) 

In [24]:
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()

# vector = embeddings.embed_query(chunks[0].page_content)
texts = [chunk.page_content for chunk in chunks]

from langchain_community.vectorstores import SQLiteVSS

db = SQLiteVSS.from_texts(
    texts=texts,
    embedding=embeddings,
    table="test",
    db_file="./vss.db",
)


### Sample query with similarity search

In [33]:
# query it
query = "What is SpiderMonkey?"
data = db.similarity_search(query)

# print results
data[3].page_content

'the heap and treats those as root nodes. In contrast, e.g. Spidermonkey'
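
Conceptually, a similarity search ranks stored vectors by how close they are to the query vector. A minimal sketch with toy 3-dimensional vectors; cosine similarity is an assumption made for illustration, and the metric sqlite-vss actually uses is not shown in this notebook:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=4):
    """Return the indices of the k vectors most similar to query_vec."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]

# Toy stand-ins for real 1536-dimensional embeddings
doc_vecs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query_vec = [0.8, 0.6, 0.0]

best = top_k(query_vec, doc_vecs, k=2)
print(best)
```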

## Vector DB result management with GPT-4

In [34]:
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI


llm = ChatOpenAI(model='gpt-4', temperature=1)
retriever = db.as_retriever(search_type='similarity', search_kwargs={'k': 3})
chain = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever)


In [35]:
query = "Sum up the text in 3 sentences."
answer = chain.run(query)
print(answer)

The text is setting up a discussion about a certain code and its implications, possibly in a cybersecurity context. It is about to summarize the action or function of this code, often a crucial part of understanding an exploit. However, the detailed description of the code and its actions is not given in the provided text.
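
The vague summary is likely a consequence of the setup: the "stuff" chain type places the retrieved chunks into a single prompt, and with `k=3` chunks of at most 100 characters the model sees at most 300 characters of the article. A minimal sketch of that arithmetic; the prompt wording is hypothetical and is not LangChain's actual template:

```python
def stuff_prompt(question: str, docs: list[str]) -> str:
    """Concatenate all retrieved documents into one prompt (the 'stuff' idea)."""
    context = "\n\n".join(docs)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

k, chunk_size = 3, 100                                   # values used in this notebook
retrieved = ["x" * chunk_size for _ in range(k)]         # placeholder chunks of maximum length
demo_prompt = stuff_prompt("Sum up the text in 3 sentences.", retrieved)

context_chars = sum(len(d) for d in retrieved)
print(context_chars)
```

Larger chunks or a higher `k` would give the model more of the text to work with, at the cost of more prompt tokens.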


In [36]:
query = "What is SpiderMonkey?"
answer = chain.run(query)
print(answer)

SpiderMonkey is the JavaScript engine used in Mozilla's Firefox browser. It is responsible for interpreting and executing JavaScript code.


In [40]:
query = "How to attack a JavaScript engine?"
answer = chain.run(query)
print(answer)

The text provided doesn't give a full explanation or method on how to attack a JavaScript engine. However, one strategy mentioned is to inject a fake JavaScript Object into the engine. Please note that this represents illegal activity and is not condoned nor supported.


In [48]:
# Done