
Using Abstractive Document Summarization | Google Cloud

Published by guanjie gan, 2022-06-01 03:30:43

Confidential material:
This page is confidential. Do not share or discuss until authorized to do so.

Using Abstractive Document Summarization

Alpha
This product is covered by the Pre-GA Offerings Terms (/terms/service-terms#1) of the Google Cloud Terms of Service. Pre-GA products might have limited support, and changes to pre-GA products might not be compatible with other pre-GA versions. For more information, see the launch stage descriptions (/products#product-launch-stages).

Create a Google Cloud project and enable the API.
Call the API, passing it the text to summarize.
Receive the response.

1. Create a Google Cloud project and enable the API

1. In the Google Cloud Console, on the project selector page, select or create a Google Cloud project (/resource-manager/docs/creating-managing-projects).

Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

Go to project selector (https://console.cloud.google.com/projectselector2/home/dashboard)

2. Make sure that billing is enabled for your Cloud project. Learn how to check if billing is enabled on a project (/billing/docs/how-to/verify-billing-enabled).

3. Enable the Cloud AI Workshop API.

Enable the API (https://console.cloud.google.com/flows/enableapi?apiid=aiworkshop.googleapis.com)

4. Create a service account:
a. In the Cloud Console, go to the Create service account page.

Go to Create service account (https://console.cloud.google.com/projectselector/iam-admin/serviceaccounts/create?supportedpurview=project)

b. Select your project.
c. In the Service account name field, enter a name. The Cloud Console fills in the Service account ID field based on this name.

In the Service account description field, enter a description. For example, Service account for quickstart.

d. Click Create and continue.
e. To provide access to your project, grant the following role(s) to your service account: Project > Owner.

In the Select a role list, select a role.

For additional roles, click Add another role and add each additional role.

Note: The Role field affects which resources your service account can access in your project. You can revoke these roles or grant additional roles later. In production environments, do not grant the Owner, Editor, or Viewer roles. Instead, grant a predefined role (/iam/docs/understanding-roles#predefined_roles) or a custom role (/iam/docs/understanding-custom-roles) that meets your needs.

f. Click Continue.
g. Click Done to finish creating the service account.

Do not close your browser window. You will use it in the next step.
5. Create a service account key:

a. In the Cloud Console, click the email address for the service account that you created.
b. Click Keys.
c. Click Add key, then click Create new key.
d. Click Create. A JSON key file is downloaded to your computer.
e. Click Close.

6. Set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of the JSON file that contains your service account key.
This variable only applies to your current shell session, so if you open a new session, set the variable again.

Example: Linux or macOS

export GOOGLE_APPLICATION_CREDENTIALS="KEY_PATH"

Replace KEY_PATH with the path of the JSON file that contains your service account key.

For example:

export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/service-account-file.json"

Example: Windows

For PowerShell:

$env:GOOGLE_APPLICATION_CREDENTIALS="KEY_PATH"
Replace KEY_PATH with the path of the JSON file that contains your service account key.

For example:

$env:GOOGLE_APPLICATION_CREDENTIALS="C:\Users\username\Downloads\service-account-file.json"

For command prompt:

set GOOGLE_APPLICATION_CREDENTIALS=KEY_PATH
Replace KEY_PATH with the path of the JSON file that contains your service account key.

7. Install (/sdk/docs/install) and initialize (/sdk/docs/initializing) the Google Cloud CLI.
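Because GOOGLE_APPLICATION_CREDENTIALS only lasts for the current shell session, a quick check before calling the API can save a confusing authentication error later. The Python sketch below uses only the standard library; load_service_account_key is a hypothetical helper, not part of any Google library. It reads the variable, opens the file it names, and checks that the file looks like the JSON service account key downloaded in step 5, that is, a JSON object whose "type" is "service_account" and that contains a "client_email".

```python
import json
import os
from pathlib import Path


def load_service_account_key(env_var: str = "GOOGLE_APPLICATION_CREDENTIALS") -> dict:
    """Read the key file named by env_var and return its parsed JSON.

    Raises RuntimeError with a readable message if the variable is unset,
    the path is not a file, or the file does not look like a service
    account key. (Hypothetical helper for illustration.)
    """
    key_path = os.environ.get(env_var)
    if not key_path:
        raise RuntimeError(f"{env_var} is not set in this shell session")

    path = Path(key_path).expanduser()
    if not path.is_file():
        raise RuntimeError(f"{env_var} points to {path}, which is not a file")

    with path.open(encoding="utf-8") as f:
        key = json.load(f)

    # Service account key files downloaded from the Cloud Console are JSON
    # objects with "type": "service_account" and the account's "client_email".
    if key.get("type") != "service_account" or "client_email" not in key:
        raise RuntimeError(f"{path} does not look like a service account key file")
    return key
```

For example, running print(load_service_account_key()["client_email"]) in the same session should print the email address of the service account you created in step 4.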

2. Prepare text

Extract the raw text you want to summarize from your document(s). Make sure the most important information appears within the first ~400 words, and preserve any newlines present in the document.
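This page gives the ~400-word figure only as guidance on where the key content should sit, not as a documented input limit (the example that follows uses 450 words). If you want to check or trim a document before sending it, the Python sketch below counts words and cuts the text after its first n words while keeping the original newlines intact. The helper names word_count and first_n_words, and the choice to treat any run of non-whitespace as a word, are assumptions for illustration.

```python
import re


def word_count(text: str) -> int:
    """Count whitespace-separated words (same splitting as str.split())."""
    return len(text.split())


def first_n_words(text: str, n: int) -> str:
    """Return the prefix of text that ends with its n-th word.

    Unlike " ".join(text.split()[:n]), this keeps the original spacing and
    newlines inside the prefix, so paragraph breaks survive.
    """
    if n <= 0:
        return ""
    for i, match in enumerate(re.finditer(r"\S+", text), start=1):
        if i == n:
            return text[: match.end()]
    return text  # fewer than n words: keep the whole text
```

For example, first_n_words(raw_text, 400) keeps the first 400 words of raw_text with its paragraph breaks unchanged.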

Example: summarizing the first 450 words of the news article at https://venturebeat.com/2019/12/23/google-brains-ai-achieves-state-of-the-art-text-summarization-performance

Raw text input:

Summarizing text is a task at which machine learning algorithms are improving, as evidenced by a recent paper published by Microsoft.
That's good news — automatic summarization systems promise to cut down on the amount of message-reading enterprise workers do,
which one survey estimates amounts to 2.6 hours each day.

Not to be outdone, a Google Brain and Imperial College London team built a system — Pre-training with Extracted Gap-sentences for
Abstractive Summarization Sequence-to-sequence, or Pegasus — that leverages Google's Transformers architecture combined with
pretraining objectives tailored for abstractive text generation. They say it achieves state-of-the-art results in 12 summarization tasks
spanning news, science, stories, instructions, emails, patents, and legislative bills, and that it shows "surprising" performance on
low-resource summarization, surpassing previous top results on six data sets with only 1,000 examples.

As the researchers point out, text summarization aims to generate accurate and concise summaries from input documents, in contrast to
extractive techniques. Rather than merely copy fragments from the input, abstractive summarization might produce novel words or cover
principal information such that the output remains linguistically fluent.

Transformers are a type of neural architecture introduced in a paper by researchers at Google Brain, Google's AI research division. As do
all deep neural networks, they contain functions (neurons) arranged in interconnected layers that transmit signals from input data and
slowly adjust the synaptic strength (weights) of each connection — that's how all AI models extract features and learn to make
predictions. But Transformers uniquely have attention. Every output element is connected to every input element, and the weightings
between them are calculated dynamically.

The team devised a training task in which whole, and putatively important, sentences within documents were masked. The AI had to fill in
the gaps by drawing on web and news articles, including those contained within a new corpus (HugeNews) the researchers compiled.

In experiments, the team selected their best-performing Pegasus model — one with 568 million parameters, or variables learned from
historical data — trained on either 750GB of text extracted from 350 million web pages (Common Crawl) or on HugeNews, which spans
1.5 billion articles totaling 3.8TB collected from news and news-like websites. (The researchers say that in the case of HugeNews, an
allowlist of domains ranging from high-quality news publishers to lower-quality sites was used to seed a web-crawling tool.)

Pegasus achieved high linguistic quality in terms of fluency and coherence, according to the researchers, and it didn't require
countermeasures to mitigate disfluencies. Moreover, in a low-resource setting with just 100 example articles, it generated summaries at a
quality comparable to a model that had been trained on a full data set ranging from 20,000 to 200,000 articles.

Raw text summary:

Google researchers have developed an artificial intelligence (AI) system that produces high-quality summaries from low-resource data
sets.

3. Choose a model

Choose a summarization model. The following models are supported:

single sentence news summarization: Single-sentence summary of a news article.
multi bullet news summarization: Multi-bullet summary of a news article.
post and short story summarization: Summary of informal short stories or posts.
email subject summarization: Predicts the subject line of a corporate email.
how-to instructions summarization: Summary of instructions from the online WikiHow website.
dialogue summarization: Summary of a dialogue.
email content summarization: Summary of the content of an email.
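The curl request in step 4 passes the model name verbatim, spaces included, in the "model_id" field. Assuming the other names in this list are passed the same way (only "single sentence news summarization" appears in an actual request on this page), a small client-side check can catch typos before a request is sent. SUPPORTED_MODEL_IDS and check_model_id are hypothetical names, and because this is an Alpha product the list may change.

```python
# Model names as listed above. Assumption: each one is sent verbatim in the
# request's "model_id" field, as the curl example does for the first one.
SUPPORTED_MODEL_IDS = frozenset({
    "single sentence news summarization",
    "multi bullet news summarization",
    "post and short story summarization",
    "email subject summarization",
    "how-to instructions summarization",
    "dialogue summarization",
    "email content summarization",
})


def check_model_id(model_id: str) -> str:
    """Return model_id unchanged if it is a listed model; otherwise raise ValueError."""
    if model_id not in SUPPORTED_MODEL_IDS:
        raise ValueError(
            f"unknown model_id {model_id!r}; expected one of {sorted(SUPPORTED_MODEL_IDS)}"
        )
    return model_id
```

The match is exact and case-sensitive, so "Dialogue Summarization" is rejected while "dialogue summarization" is accepted.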

4. Use the service by adapting the CLI example below or the Python examples in this Colab notebook (https://colab.sandbox.google.com/drive/1wtTIAbjetJqZ43_x7baX8KOLyep1Vjui).

Summarization request: generates natural-language summaries for various types of text documents, such as news articles, scientific publications, legal documents, and emails.

Request

curl -v \
  -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json" \
  -d '{ "summarization_request": { "document_text": "this is great!", "model_id": "single sentence news summarization" } }' \
  https://aiworkshop.googleapis.com/v1experimental/projects/ai-workshop-tif/locations/us-central1/models/TIF32989848266

Response

{
  "summarizationResponse": {
    "summaryText": "This is one of my all-time favourites!"
  }
}
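For reference, here is a Python sketch of the same request and response handling as the curl example, using only the standard library. It is not the code from the Colab notebook. It assumes gcloud is installed and on PATH with Application Default Credentials available (the curl example makes the same assumption), and it reuses the endpoint URL exactly as shown in the request; this page does not say whether that URL applies to other projects. The helper names build_request_body, parse_summary, and summarize are hypothetical.

```python
import json
import subprocess
import urllib.request

# Endpoint copied verbatim from the curl example.
ENDPOINT = (
    "https://aiworkshop.googleapis.com/v1experimental/projects/ai-workshop-tif/"
    "locations/us-central1/models/TIF32989848266"
)


def build_request_body(document_text: str, model_id: str) -> bytes:
    """Encode the same JSON body that the curl example sends with -d."""
    body = {
        "summarization_request": {
            "document_text": document_text,
            "model_id": model_id,
        }
    }
    return json.dumps(body).encode("utf-8")


def parse_summary(response_body: bytes) -> str:
    """Extract summaryText from a response shaped like the example response."""
    data = json.loads(response_body)
    return data["summarizationResponse"]["summaryText"]


def summarize(document_text: str, model_id: str, endpoint: str = ENDPOINT) -> str:
    """POST one summarization request and return the summary text."""
    # Same token source as the curl example (Application Default Credentials).
    token = subprocess.run(
        ["gcloud", "auth", "application-default", "print-access-token"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout.strip()
    request = urllib.request.Request(
        endpoint,
        data=build_request_body(document_text, model_id),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for non-2xx responses.
    with urllib.request.urlopen(request) as response:
        return parse_summary(response.read())
```

For example, summarize("this is great!", "single sentence news summarization") sends the same request as the curl command. Keeping body construction and response parsing in separate functions lets you test them without network access.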

Last updated 2022-04-14 UTC.

