LangChain Tutorial: Deep Dive into the Basic Components of LangChain
To understand how LangChain works, we first need to dive into the foundational building blocks of LangChain, called Components: Schema, Models, Prompts, Indexes, Memory, Chains and Agents.
Let's first load up the environment settings.
import dotenv
import os
dotenv.load_dotenv()
openai_api_key = os.getenv("OPENAI_API_KEY")
Schema
Schema is the most rudimentary way for users to interact with LLMs. There are mainly three types of Schema: Text, Documents and Chat Messages.
The primary interface to interact with LLMs is Text. Some may refer to it as "text in, text out".
text = "What is the weather like today in Paris?"
LLMs can also understand and process unstructured data, referred to as a Document in LangChain, which by definition typically contains page_content and metadata.
from langchain.schema import Document
doc = Document(
page_content = "Sample document",
metadata = {'time_stamp': 1685092927}
)
To interact with LLMs in a more fine-grained way, some models provide access to the underlying API through what are called Chat Messages, which break down into three roles:
- SystemMessage: This message sets the context and behaviour of the AI so that it can provide guided responses without the user being aware.
- HumanMessage: This is the human input to the AI. You can also refer to it as a prompt; however, I'll cover that later.
- AIMessage: This is the answer you get from the AI.
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage, SystemMessage, AIMessage
chat_model = ChatOpenAI(temperature = 0, model = "gpt-3.5-turbo", openai_api_key = openai_api_key)
chat_model(
[
SystemMessage(content="You are a helpful chat bot that will answer questions from user. If you don't know the answer, just say that you don't know. Do not make things up."),
HumanMessage(content="I like beaches, recommend places I should go for my holiday?")
]
)
AIMessage(content="Sure, I'd be happy to help! There are many beautiful beaches around the world, but here are a few suggestions:\n\n1. Maldives - Known for its crystal-clear waters and white sandy beaches, the Maldives is a popular destination for beach lovers.\n\n2. Bali, Indonesia - Bali is famous for its stunning beaches, including Kuta Beach, Seminyak Beach, and Nusa Dua Beach.\n\n3. Phuket, Thailand - Phuket is home to some of the most beautiful beaches in Thailand, such as Patong Beach, Kata Beach, and Karon Beach.\n\n4. Cancun, Mexico - Cancun is a popular destination for beach lovers, with its turquoise waters and white sandy beaches.\n\n5. Gold Coast, Australia - The Gold Coast is known for its beautiful beaches, such as Surfers Paradise Beach, Burleigh Heads Beach, and Coolangatta Beach.\n\nI hope this helps!", additional_kwargs={}, example=False)
Of course, a typical conversation does not usually end after one round; you'd expect to carry it on with the existing context. This is how you implement that with chat messages.
chat_model(
[
SystemMessage(content="You are a helpful chat bot that will answer questions from user. If you don't know the answer, just say that you don't know. Do not make things up."),
HumanMessage(content="I like beaches, where should I go for my holiday?"),
AIMessage(content="You should go to Maldives"),
HumanMessage(content="What else should I do when I'm there?")
]
)
AIMessage(content='There are many things you can do in Maldives, such as:\n\n1. Snorkeling and diving to explore the beautiful coral reefs and marine life.\n2. Relaxing on the white sandy beaches and enjoying the crystal clear waters.\n3. Taking a sunset cruise to enjoy the stunning views of the sunset over the Indian Ocean.\n4. Visiting local islands to experience the Maldivian culture and cuisine.\n5. Going on a fishing trip to catch your own dinner.\n6. Trying out water sports such as jet skiing, parasailing, and windsurfing.\n7. Indulging in spa treatments and massages to rejuvenate your body and mind.\n8. Taking a seaplane or helicopter tour to see the Maldives from above.\n9. Going on a dolphin or whale watching tour to see these magnificent creatures in their natural habitat.\n10. Enjoying a romantic dinner on the beach under the stars.', additional_kwargs={}, example=False)
Just bear in mind that as the thread goes on, you are likely to hit the LLM's token limit; its behaviour then becomes less predictable, and it may go off on a tangent when answering your questions.
Models
When talking about LLMs (Large Language Models), we usually mean a wide range of models developed and trained by different institutions; you can find a list of LLMs here, or see how each LLM has evolved over the years. Within an LLM family, different models specialise in different kinds of tasks and price points.
In the context of LangChain, models are split into the following types: LLMs, Chat Models and Text Embedding Models.
Large Language Models (LLMs) are the type of models that take a text string as input and return a text string as output, again "text in, text out".
from langchain.llms import OpenAI
llm = OpenAI(openai_api_key = openai_api_key)
llm("Tell me a joke")
'\n\nQ: What did the fish say when it hit the wall?\nA: Dam!'
The second type of model is the Chat Model, which takes a list of chat messages as input and returns a chat message.
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage, SystemMessage, AIMessage
chat_model = ChatOpenAI(openai_api_key = openai_api_key)
chat_model(
[
SystemMessage(content="You are a helpful travel assistant specialized in travel planning for users. If you don't know the answer, just say that you don't know. Do not make things up."),
HumanMessage(content="I like to go to Maldives, what should I do when I get there?")
]
)
AIMessage(content="There are many things to do in the Maldives, depending on your interests. Some popular activities include:\n\n1. Snorkeling or diving to see the colorful coral reefs and marine life\n2. Relaxing on the white sandy beaches and soaking up the sun\n3. Taking a sunset cruise or a dolphin watching tour\n4. Exploring the local culture and visiting the fishing villages\n5. Trying out water sports like jet skiing, windsurfing, or kayaking\n\nYou can also indulge in various spa treatments, enjoy fine dining experiences, and take part in cultural activities like local music and dance performances. It's best to plan ahead and book activities and tours in advance to make the most of your time in the Maldives.", additional_kwargs={}, example=False)
The third type of model is the Text Embedding Model, which takes text as input and returns a vector representation of it. This sounds abstract, but it is a very powerful concept; you can head straight to the next part of this tutorial to check out its capabilities. To give you a quick summary: the vector values are used to measure how closely related two pieces of text are, and that can be used to extract the results that come closest to answering a question.
from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(openai_api_key = openai_api_key)
text = "this is a line of text"
text_embeddings = embeddings.embed_query(text)
print(f"Sample text embeddings result: {text_embeddings[:10]}")
Sample text embeddings result: [-0.011748258024454117, 0.007892419584095478, -0.023871390148997307, -0.015503684990108013, -0.005690039601176977, 0.018261680379509926, -0.010677192360162735, -0.011741564609110355, -0.014097910374403, -0.024580972269177437]
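Before moving on, here is a minimal sketch of how such vectors can be compared using cosine similarity. It reuses the embeddings object from above; the example sentences are made up, and I'm assuming numpy is installed.
import numpy as np
def cosine_similarity(a, b):
    # 1.0 means the vectors point the same way; values near 0 mean unrelated text
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
emb_beach = np.array(embeddings.embed_query("I love relaxing on sunny beaches"))
emb_island = np.array(embeddings.embed_query("Tropical islands are my favourite holiday"))
emb_tax = np.array(embeddings.embed_query("Quarterly tax filing deadlines"))
print(cosine_similarity(emb_beach, emb_island))  # expect a relatively high score
print(cosine_similarity(emb_beach, emb_tax))     # expect a noticeably lower score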
Prompts
A prompt refers to the input to the LLM. A prompt can simply be a string of text that gets passed into the LLM, or it can be constructed using a PromptTemplate to make it more extensible.
By now, you will have seen numerous examples like this, which pass the prompt value straight into the LLM.
from langchain.llms import OpenAI
llm = OpenAI(openai_api_key = openai_api_key)
promptValue = "Today is Friday, what day is it tomorrow?"
llm(promptValue)
'\n\nTomorrow is Saturday.'
Now let's take a look at how we can build a more structured object using PromptTemplate.
from langchain.llms import OpenAI
from langchain import PromptTemplate
llm = OpenAI(openai_api_key = openai_api_key)
template = """I like to travel to {location}, what should I do when I get there?"""
promptTemplate = PromptTemplate(
input_variables = ["location"],
template = template
)
prompt = promptTemplate.format(location = "Paris")
print(f"Prompt: {prompt}")
answer = llm(prompt)
print(f"Answer: {answer}")
Prompt: I like to travel to Paris, what should I do when I get there?
Answer: 1. Visit the Eiffel Tower. 2. Take a Seine River cruise. 3. Visit the Louvre. 4. Take a stroll through the Jardin des Tuileries. 5. Explore the Latin Quarter. 6. Check out the Catacombs of Paris. 7. Visit the Notre Dame Cathedral. 8. Enjoy a picnic in the Luxembourg Gardens. 9. Shop along the Champs-Élysées. 10. Taste the local cuisine.
Before diving into Example Selectors, let's step back and look at a number of prompting techniques (illustrated with a short sketch after this list):
- Zero-shot prompting: The model is given no examples of the task at all. Instead, the prompt provides a general description of the task and asks the model to generate a response.
- One-shot prompting: The model is given a single example of the task in the prompt, typically a short passage that demonstrates the task and provides the correct answer.
- Few-shot prompting: The model is given a small number of examples of the task in the prompt, typically between two and five.
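To make the distinction concrete, here is a minimal sketch of all three styles as plain prompt strings, reusing the llm from the previous example. The classification task is made up for illustration.
# Hypothetical prompts illustrating the three techniques
zero_shot = "Classify this item: kiwi"
one_shot = """Input: apple
Output: fruit
Input: kiwi
Output:"""
few_shot = """Input: apple
Output: fruit
Input: monkey
Output: mammal
Input: tree
Output: plant
Input: kiwi
Output:"""
print(llm(few_shot))  # the in-context examples steer the model toward the expected format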
Example Selectors are how you apply the few-shot prompting technique to get better and more accurate answers.
from langchain.llms import OpenAI
from langchain.prompts import FewShotPromptTemplate, PromptTemplate
from langchain.prompts.example_selector import SemanticSimilarityExampleSelector
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
llm = OpenAI(openai_api_key = openai_api_key)
examples = [
{"input": "apple", "output": "fruit"},
{"input": "orange", "output": "fruit"},
{"input": "monkey", "output": "mammal"},
{"input": "beetle", "output": "insect"},
{"input": "tree", "output": "plant"},
]
example_prompt = PromptTemplate(
input_variables=["input", "output"],
template="Input: {input}\nOutput: {output}",
)
example_selector = SemanticSimilarityExampleSelector.from_examples(
examples,
OpenAIEmbeddings(openai_api_key = openai_api_key),
FAISS,
k=2
)
similar_prompt = FewShotPromptTemplate(
example_selector = example_selector,
example_prompt = example_prompt,
prefix="Give the the classification of items",
suffix="Input: {item}\nOutput:",
input_variables=["item"],
)
input = "kiwi"
print(similar_prompt.format(item = input))
Give the classification of items
Input: apple
Output: fruit
Input: orange
Output: fruit
Input: kiwi
Output:
llm(similar_prompt.format(item = input))
' fruit'
When users enter prompts, they expect some form of output in return; however, how the answer is presented is largely at the LLM's discretion. To get more structured output, say in the form of JSON, you can include an Output Parser as part of the prompt input to give the LLM direction.
from langchain.llms import OpenAI
from langchain.output_parsers import StructuredOutputParser, ResponseSchema
from langchain.prompts import ChatPromptTemplate, HumanMessagePromptTemplate
llm = OpenAI(openai_api_key = openai_api_key)
response_schemas = [
ResponseSchema(name = "unformatted output", description = "output without any formatting"),
ResponseSchema(name = "formatted output", description = "output being formatted by output parser")
]
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)
format_instructions = output_parser.get_format_instructions()
print(format_instructions)
The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":
```json
{
	"unformatted output": string  // output without any formatting
	"formatted output": string  // output being formatted by output parser
}
```
template = """
Reformat the user input using the instructions below and correct the spelling.
{format_instructions}
Input:
{input}
Output:
"""
promptTemplate = PromptTemplate(
input_variables = ["input"],
partial_variables = {"format_instructions": format_instructions},
template = template
)
prompt = promptTemplate.format(input = " why did the chiken cros the raod? Toget to the other side.")
print(prompt)
Reformat the user input using the instructions below and correct the spelling.
The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":
```json
{
	"unformatted output": string  // output without any formatting
	"formatted output": string  // output being formatted by output parser
}
```
Input:
 why did the chiken cros the raod? Toget to the other side.
Output:
response = llm(prompt)
print(response)
```json
{
	"unformatted output": "why did the chiken cros the raod? Toget to the other side.",
	"formatted output": "Why did the chicken cross the road? To get to the other side."
}
```
output_parser.parse(response)
{'unformatted output': 'why did the chiken cros the raod? Toget to the other side.', 'formatted output': 'Why did the chicken cross the road? To get to the other side.'}
Indexes
A word such as "apple" is more than just a string; it carries multi-dimensional information, such as smell, taste, shape, colour etc. This is essential to how LLMs understand and interact with information. The way to represent the rich information a word contains is called a [vector](https://en.wikipedia.org/wiki/Vector_(mathematics_and_physics)). Vector values capture the statistical relationships between words in the dataset.
Indexes are efficient ways to search through vector values so that LLMs can generate content grounded in the datasets you give them.
Don't worry if it still sounds a bit abstract; let's walk through a sequence of steps to familiarise you with what is involved and how it works: Document Loaders, Text Splitters, VectorStores and Retrievers.
Document Loaders, as it says on the tin, are the mechanisms to transform a range of content from various sources into the Document format introduced earlier.
The data source can be:
- Different types of static content, including text, html, json, pdf, powerpoint, images, etc.
- Public dataset or service, like Wikipedia, Hacker News, YouTube transcripts. etc.
- Proprietary dataset or service, such as AWS S3, Confluence, Git, etc.
The list goes on. You get the idea: there is probably a Document Loader for the type of content you have in mind.
Let's see how it works in action by loading up the play Hamlet in PDF format.
from langchain.document_loaders import PyMuPDFLoader
loader = PyMuPDFLoader("hamlet.pdf")
doc = loader.load()
print(f"number of pages in the doc: {len(doc)}")
total_chars = sum([len(page.page_content) for page in doc])
print(f"number of characters in the doc: {total_chars}")
number of pages in the doc: 142
number of characters in the doc: 179843
Most LLMs are constrained by the number of tokens you can pass in, and it's worth emphasising that a token does not equal a word; you can experiment with how tokenisation works using this. But what does that have to do with splitting text? The short answer: you are likely to be dealing with long pieces of text, which can easily surpass the LLM's token limit. And there's more to it: how do you keep semantically related text together so that splitting does not interrupt or change the meaning of the text?
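As a quick local illustration of the token/word distinction, here is a minimal sketch using the tiktoken library (an assumption on my part: tiktoken is installed alongside the OpenAI tooling).
import tiktoken
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
tokens = encoding.encode("Tokenisation splits text into sub-word units")
# A single word can map to several tokens, so token counts exceed word counts
print(f"{len(tokens)} tokens: {tokens}")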
Text Splitters introduce the concepts of chunk size and chunk overlap. Chunk size acts like a sliding window that determines the length of a particular chunk of content; it is measured in characters, and this value needs to stay below the max tokens allowed by the LLM. Chunk overlap is how many characters the current chunk shares with the previous one.
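To see both parameters at work on a small scale, here is a quick sketch with tiny values; the sample sentence is arbitrary, and the exact chunk boundaries depend on the separators the splitter finds.
from langchain.text_splitter import RecursiveCharacterTextSplitter
demo_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 20,    # maximum characters per chunk
    chunk_overlap = 5,  # characters a chunk may share with the previous one
)
for chunk in demo_splitter.split_text("To be, or not to be, that is the question."):
    print(repr(chunk))
Adjacent chunks can repeat a few characters, which helps preserve context across boundaries.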
Even though this text splitter may cover the majority of use cases, you may want to use more specific types of splitters in some cases.
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
chunk_size = 1000,
chunk_overlap = 200,
)
docs = text_splitter.split_documents(doc)
print(f"number of documents: {len(docs)}")
print (f"sample of split file: \n{docs[10].page_content}")
number of documents: 270
sample of split file:
HORATIO That can I; At least, the whisper goes so. Our last king, Whose image even but now appear'd to us, Was, as you know, by Fortinbras of Norway, Thereto prick'd on by a most emulate pride, Dared to the combat; in which our valiant Hamlet-- For so this side of our known world esteem'd him-- Did slay this Fortinbras; who by a seal'd compact, Well ratified by law and heraldry, Did forfeit, with his life, all those his lands Which he stood seized of, to the conqueror: Against the which, a moiety competent Was gaged by our king; which had return'd To the inheritance of Fortinbras, Had he been vanquisher; as, by the same covenant, And carriage of the article design'd, His fell to Hamlet. Now, sir, young Fortinbras, Of unimproved mettle hot and full, Hath in the skirts of Norway here and there Shark'd up a list of lawless resolutes, For food and diet, to some enterprise That hath a stomach in't; which is no other-- As it doth well appear unto our state--
A vectorstore is created and optimised to host the vector values produced via embeddings and to perform similarity search. Don't be put off by the fancy words: we are still talking about searching for a result in a database. The only difference is that the source is a blob of text, the query is now natural language, and the vectorstore and embeddings are there to bridge that gap.
Some popular vectorstore implementations include the local choices FAISS and Chroma, and the SaaS options Pinecone and Weaviate.
embeddings = OpenAIEmbeddings(openai_api_key = openai_api_key)
vector_store = FAISS.from_documents(docs, embeddings)
query = "Where does the play take place?"
top_3_matches = vector_store.similarity_search(query = query, k = 3)
print(top_3_matches)
[Document(page_content='SCENE Denmark.', metadata={'source': 'hamlet.pdf', 'file_path': 'hamlet.pdf', 'page': 3, 'total_pages': 142, 'format': 'PDF 1.3', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'creator': '', 'producer': 'FOP 0.19.0-CVS', 'creationDate': '', 'modDate': '', 'trapped': ''}), Document(page_content="Act IV\nScene 1\nA room in the castle.\nEnter KING CLAUDIUS, QUEEN GERTRUDE, ROSENCRANTZ, and GUILDENSTERN\nKING CLAUDIUS\nThere's matter in these sighs, these profound heaves:\nYou must translate: 'tis fit we understand them.\nWhere is your son?\nQUEEN GERTRUDE\nBestow this place on us a little while.\nExeunt ROSENCRANTZ and GUILDENSTERN\nAh, my good lord, what have I seen to-night!\nKING CLAUDIUS\nWhat, Gertrude? How does Hamlet?\nQUEEN GERTRUDE\nMad as the sea and wind, when both contend\nWhich is the mightier: in his lawless fit,\nBehind the arras hearing something stir,\nWhips out his rapier, cries, 'A rat, a rat!'\nAnd, in this brainish apprehension, kills\nThe unseen good old man.\nKING CLAUDIUS\nO heavy deed!\nIt had been so with us, had we been there:\nHis liberty is full of threats to all;\nTo you yourself, to us, to every one.\nAlas, how shall this bloody deed be answer'd?\nIt will be laid to us, whose providence\nShould have kept short, restrain'd and out of haunt,", metadata={'source': 'hamlet.pdf', 'file_path': 'hamlet.pdf', 'page': 91, 'total_pages': 142, 'format': 'PDF 1.3', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'creator': '', 'producer': 'FOP 0.19.0-CVS', 'creationDate': '', 'modDate': '', 'trapped': ''}), Document(page_content="You from the Polack wars, and you from England,\nAre here arrived give order that these bodies\nHigh on a stage be placed to the view;\nAnd let me speak to the yet unknowing world\nHow these things came about: so shall you hear\nOf carnal, bloody, and unnatural acts,\nOf accidental judgments, casual slaughters,\nOf deaths put on by cunning and forced cause,\nAnd, in this upshot, purposes mistook\nFall'n on the inventors' reads: all this can I\nTruly deliver.\nPRINCE FORTINBRAS\nLet us haste to hear it,\nAnd call the noblest to the audience.\nFor me, with sorrow I embrace my fortune:\nI have some rights of memory in this kingdom,\nWhich now to claim my vantage doth invite me.\nHORATIO\nHAMLET - Act V\n141", metadata={'source': 'hamlet.pdf', 'file_path': 'hamlet.pdf', 'page': 140, 'total_pages': 142, 'format': 'PDF 1.3', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'creator': '', 'producer': 'FOP 0.19.0-CVS', 'creationDate': '', 'modDate': '', 'trapped': ''})]
Instead of calling similarity_search as above, you can also use a generic interface called a Retriever that makes it easy to combine documents with LLMs.
retriever = vector_store.as_retriever()
relevant_documents = retriever.get_relevant_documents("Where does the play take place?")
print(relevant_documents)
[Document(page_content='SCENE Denmark.', metadata={'source': 'hamlet.pdf', 'file_path': 'hamlet.pdf', 'page': 3, 'total_pages': 142, 'format': 'PDF 1.3', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'creator': '', 'producer': 'FOP 0.19.0-CVS', 'creationDate': '', 'modDate': '', 'trapped': ''}), Document(page_content="Act IV\nScene 1\nA room in the castle.\nEnter KING CLAUDIUS, QUEEN GERTRUDE, ROSENCRANTZ, and GUILDENSTERN\nKING CLAUDIUS\nThere's matter in these sighs, these profound heaves:\nYou must translate: 'tis fit we understand them.\nWhere is your son?\nQUEEN GERTRUDE\nBestow this place on us a little while.\nExeunt ROSENCRANTZ and GUILDENSTERN\nAh, my good lord, what have I seen to-night!\nKING CLAUDIUS\nWhat, Gertrude? How does Hamlet?\nQUEEN GERTRUDE\nMad as the sea and wind, when both contend\nWhich is the mightier: in his lawless fit,\nBehind the arras hearing something stir,\nWhips out his rapier, cries, 'A rat, a rat!'\nAnd, in this brainish apprehension, kills\nThe unseen good old man.\nKING CLAUDIUS\nO heavy deed!\nIt had been so with us, had we been there:\nHis liberty is full of threats to all;\nTo you yourself, to us, to every one.\nAlas, how shall this bloody deed be answer'd?\nIt will be laid to us, whose providence\nShould have kept short, restrain'd and out of haunt,", metadata={'source': 'hamlet.pdf', 'file_path': 'hamlet.pdf', 'page': 91, 'total_pages': 142, 'format': 'PDF 1.3', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'creator': '', 'producer': 'FOP 0.19.0-CVS', 'creationDate': '', 'modDate': '', 'trapped': ''}), Document(page_content="You from the Polack wars, and you from England,\nAre here arrived give order that these bodies\nHigh on a stage be placed to the view;\nAnd let me speak to the yet unknowing world\nHow these things came about: so shall you hear\nOf carnal, bloody, and unnatural acts,\nOf accidental judgments, casual slaughters,\nOf deaths put on by cunning and forced cause,\nAnd, in this upshot, purposes mistook\nFall'n on the inventors' reads: all this can I\nTruly deliver.\nPRINCE FORTINBRAS\nLet us haste to hear it,\nAnd call the noblest to the audience.\nFor me, with sorrow I embrace my fortune:\nI have some rights of memory in this kingdom,\nWhich now to claim my vantage doth invite me.\nHORATIO\nHAMLET - Act V\n141", metadata={'source': 'hamlet.pdf', 'file_path': 'hamlet.pdf', 'page': 140, 'total_pages': 142, 'format': 'PDF 1.3', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'creator': '', 'producer': 'FOP 0.19.0-CVS', 'creationDate': '', 'modDate': '', 'trapped': ''}), Document(page_content="there. Be the players ready?\nROSENCRANTZ\nAy, my lord; they stay upon your patience.\nQUEEN GERTRUDE\nCome hither, my dear Hamlet, sit by me.\nHAMLET\nNo, good mother, here's metal more attractive.\nLORD POLONIUS\nTo KING CLAUDIUS\nO, ho! do you mark that?\nHAMLET - Act III\n70", metadata={'source': 'hamlet.pdf', 'file_path': 'hamlet.pdf', 'page': 69, 'total_pages': 142, 'format': 'PDF 1.3', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'creator': '', 'producer': 'FOP 0.19.0-CVS', 'creationDate': '', 'modDate': '', 'trapped': ''})]
Nice work making it this far; in retrospect, this is one of the most disorienting sections. Moving ahead, I will show you how to use the results generated from the steps in Indexes with a very important concept called Chains.
Chains
At the heart of LangChain is the concept of Chains. A chain combines a sequence of modular components in a particular way to accomplish a common goal. We could be here all day if I were to go through every single type of chain; instead, I will continue the train of thought from Indexes by showcasing a couple of chains. You can look up the guide for a more comprehensive list of functionalities.
From the previous section, the retriever returned the top results for the question "Where does the play take place?". However, it hasn't really given us an answer yet. Let's look at how we can do that with a chain called RetrievalQA.
from langchain.chains import RetrievalQA
llm = OpenAI(temperature = 0, openai_api_key = openai_api_key)
qa = RetrievalQA.from_chain_type(llm = llm, chain_type = "refine", retriever = retriever)
query = "Where does the play take place?"
qa.run(query)
'\n\nThe play takes place in Denmark, specifically in a room in the castle in Act IV, Scene 1, as well as in the kingdom in Act V, Scene 1. In Act III, Scene 1, the characters are in the castle, in the presence of King Claudius.'
Sweet, there is our answer. The play indeed takes place in Denmark.
Furthermore, we can chain a number of chains together to complete a more complex task. Now that we have some selected parts of the play, let's try to summarise those documents to create a synopsis, then produce a review piece from the synopsis, just like a professional play critic.
First, let's create a summarize chain and observe what it does.
from langchain.llms import OpenAI
from langchain.chains.summarize import load_summarize_chain
llm = OpenAI(temperature = 0.7, openai_api_key = openai_api_key)
summarize_chain = load_summarize_chain(llm, chain_type = "map_reduce", verbose = True)
summarize_chain.run(relevant_documents)
> Entering new MapReduceDocumentsChain chain...
> Entering new LLMChain chain...
Prompt after formatting:
Write a concise summary of the following:
"SCENE Denmark."
CONCISE SUMMARY:
Prompt after formatting:
Write a concise summary of the following:
"Act IV Scene 1 A room in the castle. Enter KING CLAUDIUS, QUEEN GERTRUDE, ROSENCRANTZ, and GUILDENSTERN KING CLAUDIUS There's matter in these sighs, these profound heaves: You must translate: 'tis fit we understand them. Where is your son? QUEEN GERTRUDE Bestow this place on us a little while. Exeunt ROSENCRANTZ and GUILDENSTERN Ah, my good lord, what have I seen to-night! KING CLAUDIUS What, Gertrude? How does Hamlet? QUEEN GERTRUDE Mad as the sea and wind, when both contend Which is the mightier: in his lawless fit, Behind the arras hearing something stir, Whips out his rapier, cries, 'A rat, a rat!' And, in this brainish apprehension, kills The unseen good old man. KING CLAUDIUS O heavy deed! It had been so with us, had we been there: His liberty is full of threats to all; To you yourself, to us, to every one. Alas, how shall this bloody deed be answer'd? It will be laid to us, whose providence Should have kept short, restrain'd and out of haunt,"
CONCISE SUMMARY:
Prompt after formatting:
Write a concise summary of the following:
"You from the Polack wars, and you from England, Are here arrived give order that these bodies High on a stage be placed to the view; And let me speak to the yet unknowing world How these things came about: so shall you hear Of carnal, bloody, and unnatural acts, Of accidental judgments, casual slaughters, Of deaths put on by cunning and forced cause, And, in this upshot, purposes mistook Fall'n on the inventors' reads: all this can I Truly deliver. PRINCE FORTINBRAS Let us haste to hear it, And call the noblest to the audience. For me, with sorrow I embrace my fortune: I have some rights of memory in this kingdom, Which now to claim my vantage doth invite me. HORATIO HAMLET - Act V 141"
CONCISE SUMMARY:
Prompt after formatting:
Write a concise summary of the following:
"there. Be the players ready? ROSENCRANTZ Ay, my lord; they stay upon your patience. QUEEN GERTRUDE Come hither, my dear Hamlet, sit by me. HAMLET No, good mother, here's metal more attractive. LORD POLONIUS To KING CLAUDIUS O, ho! do you mark that? HAMLET - Act III 70"
CONCISE SUMMARY:
> Finished chain.
> Entering new StuffDocumentsChain chain...
> Entering new LLMChain chain...
Prompt after formatting:
Write a concise summary of the following:
" This scene takes place in Denmark. King Claudius and Queen Gertrude question Rosencrantz and Guildenstern about Hamlet's whereabouts. Once they're alone, Queen Gertrude reveals that Hamlet is mad and killed an unseen old man in his fit of rage. King Claudius is horrified by the news, and worries that the deed will be laid to them for not restraining Hamlet. Prince Fortinbras orders that the bodies of those killed in the Polack and English wars be put on display for the world to see, so that he can explain how these things happened and how the inventors' purposes were mistaken. He invites the noblest to the audience to hear the story, and claims his rights of memory in the kingdom. Hamlet and the other characters discuss if the players are ready. Gertrude invites Hamlet to sit with her, but he declines, saying there is something more attractive. Lord Polonius remarks on this."
CONCISE SUMMARY:
> Finished chain.
> Finished chain.
> Finished chain.
" King Claudius and Queen Gertrude question Rosencrantz and Guildenstern about Hamlet's whereabouts, and Queen Gertrude reveals that Hamlet is mad and killed an old man. Prince Fortinbras orders the bodies of those killed in the Polack and English wars to be put on display and invites the noblest to the audience. Hamlet and the other characters discuss if the players are ready, and Gertrude invites him to sit with her but he declines."
You can see from the trace that summarize_chain starts a number of chains to perform the sub-tasks: MapReduceDocumentsChain and StuffDocumentsChain. Each of these creates a subsequent LLMChain, which performs the "text in, text out" operation to finish the sub-task before returning the result to the parent chain.
This is nice, and we can add more to it. In the following example, you will see how a number of chains are connected by SimpleSequentialChain.
from langchain.llms import OpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
llm = OpenAI(temperature = 0.7, openai_api_key = openai_api_key)
Firstly, we will define a summarize_chain. It looks slightly different from the previous definition because it will take input from SimpleSequentialChain with a custom prompt, and output the result as synopsis to be consumed by the next chain.
template = """Write a concise summary of the following:
{text}
SUMMARY:"""
prompt = PromptTemplate(template = template, input_variables = ["text"])
summarize_chain = load_summarize_chain(llm, chain_type = "stuff", prompt = prompt, output_key = "synopsis")
Secondly, we create a simple LLMChain that will carry out the task of creating a review from the synopsis generated by the summarize_chain.
template = """You are a play critic from the New York Times. Given the synopsis of play, it is your job to write a review for that play.
Play Synopsis:
{synopsis}
Review from a New York Times play critic of the above play:"""
prompt = PromptTemplate(input_variables = ["synopsis"], template = template)
review_chain = LLMChain(llm = llm, prompt = prompt, output_key = "review")
Lastly, we provide both summarize_chain and review_chain to SimpleSequentialChain, which will delegate work to each chain in turn.
from langchain.chains import SimpleSequentialChain
overall_chain = SimpleSequentialChain(chains = [summarize_chain, review_chain], verbose = True)
overall_chain.run(relevant_documents)
> Entering new SimpleSequentialChain chain...
King Claudius, Queen Gertrude, Rosencrantz and Guildenstern are in a room in the castle. Queen Gertrude expresses her shock at the recent events and wonders how Hamlet is, to which King Claudius replies. He expresses his fear of the consequences of Hamlet's actions and decides to tell the world what happened to avoid blame. Prince Fortinbras arrives and expresses his claim to the kingdom. They all prepare for the play, and Hamlet refuses to sit by his mother. Lord Polonius wonders what Hamlet means by his words.
The latest production of "Hamlet" is a captivating, gripping experience that will leave theatergoers enthralled throughout its entirety. Refreshingly original and thought-provoking, this play is a must-see for anyone with an appreciation for classic theater. This adaptation of the classic Shakespearean tragedy managed to breathe new life into the centuries-old work and bring it to a modern audience. The actors were able to deliver a powerful performance that was affecting and raw, as they navigated the complicated emotions and relationships between the characters. The set was simple yet effective, creating an atmosphere of tension and drama that was crucial to the story. The costumes were also tasteful and well-chosen, evoking the time period of the play without being too elaborate. Ultimately, this production of "Hamlet" is an absolute success. It is sure to leave a lasting impact on its viewers and will remain an unforgettable experience for years to come.
> Finished chain.
'\n\nThe latest production of "Hamlet" is a captivating, gripping experience that will leave theatergoers enthralled throughout its entirety. Refreshingly original and thought-provoking, this play is a must-see for anyone with an appreciation for classic theater. \n\nThis adaptation of the classic Shakespearean tragedy managed to breathe new life into the centuries-old work and bring it to a modern audience. The actors were able to deliver a powerful performance that was affecting and raw, as they navigated the complicated emotions and relationships between the characters. \n\nThe set was simple yet effective, creating an atmosphere of tension and drama that was crucial to the story. The costumes were also tasteful and well-chosen, evoking the time period of the play without being too elaborate. \n\nUltimately, this production of "Hamlet" is an absolute success. It is sure to leave a lasting impact on its viewers and will remain an unforgettable experience for years to come.'
If you ignore the "creativity" in the review, it's a really cool application of LLMs. By using chains, we can actually provide short-term or even long-term memory to LLMs.
Memory
In essence, memory keeps track of state information between user input and LLM output, and it works in many ways. The simplest way is to create a ConversationBufferMemory and add messages to it; alternatively, it's easier to use ConversationBufferMemory in combination with a chain.
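Here is a minimal sketch of the first approach; the example messages are made up.
from langchain.memory import ConversationBufferMemory
buffer = ConversationBufferMemory()
# Record one round of conversation, then read the accumulated state back
buffer.save_context({"input": "Hi, I'm planning a beach holiday"}, {"output": "Great! Any region in mind?"})
print(buffer.load_memory_variables({}))
And here is the chain-based approach in action: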
from langchain.llms import OpenAI
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
llm = OpenAI(temperature = 0.7, openai_api_key = openai_api_key)
memory = ConversationBufferMemory(memory_key = "chat_history", return_messages = True)
qa = ConversationalRetrievalChain.from_llm(llm, vector_store.as_retriever(), memory = memory)
qa.run("Who is Ophelia?")
" Ophelia is a character in William Shakespeare's play Hamlet. She is the daughter of Polonius and the potential wife of Prince Hamlet."
qa.run("How has she died?")
' Ophelia drowned in a river.'
From this example, you can see that LLMs can use memory to infer the previous context and give a more targeted answer. Let's also take a look at what's in memory.
memory.load_memory_variables({})
{'chat_history': [HumanMessage(content='Who is Ophelia?', additional_kwargs={}, example=False), AIMessage(content=" Ophelia is a character in William Shakespeare's play Hamlet. She is a young noblewoman of Denmark, the daughter of Polonius, sister of Laertes, and potential wife of Prince Hamlet.", additional_kwargs={}, example=False), HumanMessage(content='How has she died?', additional_kwargs={}, example=False), AIMessage(content=' Ophelia died by drowning.', additional_kwargs={}, example=False)]}
Agents
Agents are one of the most powerful concepts in LangChain. They basically free users from explicitly specifying the sequence of actions needed to complete a task, delegating the decision-making process to agents that use a set of tools in their toolkit. Tools are specific programs or scripts that can be used to perform a specific task, such as a Google search or making web requests; a toolkit, on the other hand, is a collection of tools designed to be used together. A minimal sketch of a single tool follows.
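To make "tool" concrete, here is a hedged sketch of defining a custom tool; the word-count function is a made-up illustration, not something used in the example below.
from langchain.agents import Tool
def count_words(text: str) -> str:
    # A trivial, deterministic task an agent could delegate to a tool
    return f"{len(text.split())} words"
word_count_tool = Tool(
    name = "word_counter",
    func = count_words,
    description = "Useful for counting the number of words in a piece of text."
)
The description matters: it is what the agent reads when deciding which tool to call.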
On top of that, the Agent Executor is at the centre of planning and executing. It is responsible for orchestrating the agent and the tools, determining which tools to call and in what order.
Let's take a look at how these concepts glue together with an example.
from langchain import OpenAI
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
llm = OpenAI(temperature = 0, openai_api_key = openai_api_key)
We create a vector store that stores the play Hamlet.
from langchain.document_loaders import PyMuPDFLoader
loader = PyMuPDFLoader("hamlet.pdf")
doc = loader.load()
text_splitter = RecursiveCharacterTextSplitter(
chunk_size = 1000,
chunk_overlap = 200,
)
docs = text_splitter.split_documents(doc)
embeddings = OpenAIEmbeddings(openai_api_key = openai_api_key)
hamlet = Chroma.from_documents(docs, embeddings, collection_name = "hamlet")
Using embedded DuckDB without persistence: data will be transient
Similarly, we create a vector store to store the play King Lear.
loader = PyMuPDFLoader("king_lear.pdf")
doc = loader.load()
text_splitter = RecursiveCharacterTextSplitter(
chunk_size = 1000,
chunk_overlap = 200,
)
docs = text_splitter.split_documents(doc)
embeddings = OpenAIEmbeddings(openai_api_key = openai_api_key)
king_lear = Chroma.from_documents(docs, embeddings, collection_name = "king_lear")
Using embedded DuckDB without persistence: data will be transient
Now, we will load both plays into the agent executor, where the decision of which book to look up for a given question is made.
from langchain.agents.agent_toolkits import VectorStoreInfo, VectorStoreRouterToolkit, create_vectorstore_router_agent
hamlet_vectorstore_info = VectorStoreInfo(
name = "hamlet",
description = "Information about Hamlet",
vectorstore = hamlet
)
king_lear_vectorstore_info = VectorStoreInfo(
name = "king lear",
description = "Information about King Lear",
vectorstore = king_lear
)
toolkit = VectorStoreRouterToolkit(
vectorstores = [hamlet_vectorstore_info, king_lear_vectorstore_info],
llm = llm
)
agent_executor = create_vectorstore_router_agent(
llm = llm,
toolkit = toolkit,
verbose = True
)
The question gives a clear hint that the agent executor should look into Hamlet for answers.
agent_executor.run("Who is Prince Hamlet?")
> Entering new AgentExecutor chain...
I need to find information about Hamlet
Action: hamlet
Action Input: Who is Prince Hamlet?
Observation: Prince Hamlet is the protagonist of William Shakespeare's play The Tragedy of Hamlet, Prince of Denmark. He is the son of the late King Hamlet and the nephew of the current King Claudius.
Thought: I now know the final answer
Final Answer: Prince Hamlet is the protagonist of William Shakespeare's play The Tragedy of Hamlet, Prince of Denmark. He is the son of the late King Hamlet and the nephew of the current King Claudius.
> Finished chain.
"Prince Hamlet is the protagonist of William Shakespeare's play The Tragedy of Hamlet, Prince of Denmark. He is the son of the late King Hamlet and the nephew of the current King Claudius."
The following question is more subtle: Edmund is a character in King Lear, so the agent executor will need to determine which book to use before providing an answer.
agent_executor.run("Who is Edmund?")
> Entering new AgentExecutor chain...
I need to find out who Edmund is.
Action: king lear
Action Input: Who is Edmund?
Observation: Edmund is a character in the play King Lear by William Shakespeare. He is the illegitimate son of the Earl of Gloucester.
Thought: I now know the final answer
Final Answer: Edmund is a character in the play King Lear by William Shakespeare. He is the illegitimate son of the Earl of Gloucester.
> Finished chain.
'Edmund is a character in the play King Lear by William Shakespeare. He is the illegitimate son of the Earl of Gloucester.'
There are also many other types of agents and use cases for the agent executor.
Final Thoughts
Congratulations! We have reached the end of this exercise. To recap, this tutorial introduced the basic components that make up LangChain. We experimented with schema, models and prompts, and studied how to make use of indexes, chains and memory to build functioning products that solve context issues with LLMs and extend their capabilities. Finally, we arrived at agents, where we explored semi-automated workloads, and I hope that inspires you to build!
If you are keen to learn more, go straight to Use Cases, where I will walk you through the use cases of LangChain with detailed examples to showcase its amazing capabilities.
Check out other learnings and resources I have shared on my GitHub.
If you'd like to share your questions and feedback, or are keen to get involved in building, you can reach out to me directly on Twitter and LinkedIn!