Practical LLM 2 - Summarize a PDF file
In the second installment of the Practical LLM series (see Part 1), we will discuss how to use a Large Language Model to summarize a PDF document. In the previous article, we discussed the different layers needed to build a production-ready LLM-based application. In this simplified version of our app, we will use only the following three layers to demonstrate this functionality.
- Gen AI Model: As in our previous application, we will use the Anthropic Claude base model available through Bedrock as our LLM. Using a different base model or service would not be too different; we might only need to adjust how we construct the prompt and access the API exposed by the AI service.
- Backend Service: We will follow the same pattern as in the first application and use a Lambda function as the backend service. The service will extract the content from the PDF file, construct an appropriate prompt to send to the AI service, and then relay the text response it receives.
- Frontend App: The frontend application will allow the user to upload a PDF file to the service and display the summary text returned by the service.
Putting It All Together
To put everything together, we will create a basic working application with the interaction between the layers as depicted below.
Our frontend code is quite simple: it allows the user to pick a file and POSTs it to the service. The relevant code is shown below:
<form className="mt-6" onSubmit={onSubmit}>
  <div className="mb-6">
    <input
      type="file"
      name="file"
      onChange={(e) => setFile(e.target.files?.[0])}
    />
  </div>
</form>
const onSubmit = async (e: React.FormEvent<HTMLFormElement>) => {
  e.preventDefault()
  if (!file) return
  setIsLoading(true)
  try {
    const data = new FormData()
    data.set('file', file)
    const res = await fetch(serviceUrl, {
      method: 'POST',
      body: data,
    })
    // Handle a non-2xx response as an error
    if (!res.ok) throw new Error(await res.text())
    const resJ = await res.json()
    setSummary(resJ.completion)
  } catch (e: any) {
    // Surface the error; a production app would show it to the user
    console.error(e)
  } finally {
    setIsLoading(false)
  }
}
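The `file`, `setIsLoading`, and `setSummary` values above come from React `useState` hooks in the component (not shown here). The multipart body that the handler builds can be sketched in isolation; Node 18+ ships `FormData` and `Blob` as globals, and the PDF bytes below are just a placeholder:

```javascript
// Build the same kind of multipart payload the form submit sends.
// The PDF content here is a placeholder, not a real document.
const body = new FormData()
body.set('file', new Blob(['%PDF-1.4 placeholder'], { type: 'application/pdf' }), 'doc.pdf')

// The service later reads the part back by its field name
const part = body.get('file')
```

On the server side, `lambda-multipart-parser` unpacks this same field name (`file`) from the event body.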
Full source code for the UI is available at https://github.com/sekharkafle/pdfui.
The backend code parses the uploaded file and extracts the text from the PDF. The text is then augmented with a specific prompt, which is sent as input to the Gen AI base model. The model responds with a text completion that the service relays to the client. The source code is given below:
const parser = require('lambda-multipart-parser')
const pdf = require('pdf-parse')
const { BedrockRuntimeClient, InvokeModelCommand } = require("@aws-sdk/client-bedrock-runtime");

exports.handler = async function(event, context) {
  const result = await parser.parse(event);
  if (result.files?.length) {
    // Extract the raw text from the uploaded PDF
    const d = await pdf(result.files[0].content);
    const txt = d.text;
    const prompt = `Human: You are an expert assistant with expertise in summarizing and pulling
out important sections of a text. The following text is from a PDF document.
Follow these steps: read the text, summarize the text, and identify the main ideas.
In your response include the summary and bullet points for the main ideas.
Do not respond with more than 5 sentences.\n<TEXT>${txt}</TEXT>\n\nAssistant:`;
    const client = new BedrockRuntimeClient({
      serviceId: 'bedrock',
      region: 'us-east-1',
    });
    const input = {
      modelId: 'anthropic.claude-v2',
      contentType: "application/json",
      accept: "application/json",
      body: JSON.stringify({
        prompt: prompt,
        max_tokens_to_sample: 2000,
        temperature: 0.5,
        top_k: 250,
        top_p: 1,
        stop_sequences: ['\n\nHuman:'],
        anthropic_version: 'bedrock-2023-05-31'
      }),
    };
    const command = new InvokeModelCommand(input);
    const response = await client.send(command);
    // The response body is a Uint8Array of a stringified JSON blob,
    // so first decode the Uint8Array to a string, then parse the string.
    const res2 = new TextDecoder().decode(response.body);
    return JSON.parse(res2);
  }
  return 'error';
}
Full source code for the Lambda service is available at https://github.com/sekharkafle/pdflambda.
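The decode step at the end of the handler can be exercised in isolation. Below is a minimal sketch using Node's built-in `TextEncoder`/`TextDecoder`; the payload is a made-up stand-in for a Bedrock response body, not a real model output:

```javascript
// Simulate the Uint8Array body Bedrock returns (the payload is a stand-in)
const payload = new TextEncoder().encode(JSON.stringify({ completion: 'A short summary.' }))

// Same decode-then-parse step as in the handler above
const parsed = JSON.parse(new TextDecoder().decode(payload))
```

The `completion` field of this parsed object is what the frontend reads via `resJ.completion`.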
With both the backend and frontend code ready to go, we are all set to test the web app:
Prompt Engineering
Prompts are the inputs to a generative AI model, and sending better prompts will generate better results. There are various techniques to enhance prompts (see https://cloud.google.com/blog/products/application-development/five-best-practices-for-prompt-engineering). One such technique is to make the prompt very specific about the desired outcome.
To demonstrate this, let's start with this content from AWS as our input document.
In our Lambda code, we have defined the prompt with an instruction to summarize the PDF content in at most 5 sentences, as shown in the snippet below:
var prompt = `Human: You are an expert assistant with expertise in summarizing and pulling
out important sections of a text. The following text is from a PDF document.
Follow these steps: read the text, summarize the text, and identify the main ideas.
In your response include the summary and bullet points for the main ideas.
Do not respond with more than 5 sentences.\n<TEXT>${txt}</TEXT>\n\nAssistant:`;
An output from the AI model to the above prompt is shown below:
Here is a 5 sentence summary of the key points:
Amazon Bedrock is a managed service that provides access to foundation models
for building AI applications. It allows users to experiment with top models,
customize them with personal data via fine-tuning, and integrate them into
apps using AWS tools. Key features are experimenting with models via an API
or console, creating knowledge bases to augment model responses, and secure
deployment without managing infrastructure. Amazon Bedrock is serverless,
allows quick startup, and enables private customization of models. It is
available in certain AWS regions with tiered pricing models.
The main ideas are:
- Access to foundation models
- Customization via fine-tuning
- Integration with AWS services
- Serverless and quick start
- Private customization
Now, let's play with this prompt a bit and change it as shown below to generate a one-sentence summary.
Input Prompt:
Human: You are an expert assistant with expertise in summarizing and pulling
out important sections of a text. The following text is from a PDF document.
Respond with 1 sentence summary of the text.\n<TEXT>${pdfContent}</TEXT>\n\nAssistant:
Model output:
Amazon Bedrock is a managed service that provides access to foundation models for
building AI applications with capabilities like fine-tuning models, integrating
with enterprise systems, and responsible AI.
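Rather than editing the prompt string by hand for each experiment, the sentence limit can be parameterized. A minimal sketch; the helper name `buildSummaryPrompt` is our own for illustration, not part of the article's code:

```javascript
// Hypothetical helper: parameterize the sentence limit in the summary prompt
function buildSummaryPrompt(text, maxSentences) {
  return `Human: You are an expert assistant with expertise in summarizing and pulling ` +
    `out important sections of a text. The following text is from a PDF document. ` +
    `Respond with a summary of no more than ${maxSentences} sentence(s).` +
    `\n<TEXT>${text}</TEXT>\n\nAssistant:`
}

const prompt = buildSummaryPrompt('Amazon Bedrock is a managed service...', 1)
```

The same helper covers both the 5-sentence and 1-sentence experiments above by changing one argument.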
We have successfully demonstrated prompt engineering to get the desired text summary from the AI model.
Happy Summarizing!!!