|
--- |
|
license: apache-2.0 |
|
datasets: |
|
- SRDdev/Youtube-Scripts |
|
language: |
|
- en |
|
pipeline_tag: text-generation |
|
widget: |
|
- text: Introduction to Vertex AI Feature Store |
|
example_title: Example 1 |
|
- text: What are Kubeflow Components? |
|
exmaple_title: Example 2 |
|
tags: |
|
- Text-Generation |
|
--- |
|
|
|
# SCRIPTGPT |
|
|
|
ScriptGPT is a model pretrained on English text with a causal language modeling (CLM) objective. It follows the GPT-2 architecture, which was introduced in |
|
[this paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) |
|
and first released at [this page](https://openai.com/blog/better-language-models/). |
|
|
|
## Model description |
|
ScriptGPT is a language model trained on a dataset of 5,000 YouTube videos that explain artificial intelligence (AI) concepts. |
|
ScriptGPT is a causal language transformer whose architecture resembles GPT-2. |
|
As a causal language model, it assigns a probability to a sequence of words by predicting each word from the words that precede it. |
|
At each step it generates a probability distribution over the next word given the previous words, without incorporating future words. |
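
The causal objective described above can be illustrated with a toy example. This is a minimal sketch, not ScriptGPT itself: the four-word vocabulary and the `toy_logits` scoring function are hypothetical stand-ins for the real tokenizer and network. It shows a next-word distribution computed from the prefix only, and a sequence probability built up with the chain rule.

```python
import math

# Hypothetical four-word vocabulary; ScriptGPT's real vocabulary is far larger.
VOCAB = ["feature", "store", "model", "<eos>"]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_logits(prefix):
    """Stand-in for the network: scores depend only on the tokens in `prefix`."""
    last = prefix[-1] if prefix else None
    # After "feature" the toy model favours "store"; otherwise it has no preference.
    return [2.0 if (last == "feature" and tok == "store") else 0.0 for tok in VOCAB]


def next_token_distribution(prefix):
    """P(next word | prefix); words to the right of the prefix are never seen."""
    return dict(zip(VOCAB, softmax(toy_logits(prefix))))


def sequence_log_prob(tokens):
    """Chain rule: log P(w_1..w_n) = sum over t of log P(w_t | w_1..w_{t-1})."""
    return sum(
        math.log(next_token_distribution(tokens[:t])[tokens[t]])
        for t in range(len(tokens))
    )


dist = next_token_distribution(["feature"])
print(max(dist, key=dist.get))                   # most likely next word
print(sequence_log_prob(["feature", "store"]))   # log-probability of the pair
```

In the real model the scoring function is the transformer, and the masking of future positions happens inside its attention layers rather than by slicing a list.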
|
|
|
The goal of ScriptGPT is to generate scripts for AI videos that are coherent, informative, and engaging. |
|
This can be useful for content creators who are looking for inspiration or who want to automate the process of generating video scripts. |
|
To use ScriptGPT, users can provide a prompt or a starting sentence, and the model will generate a sequence of words that follow the context and style of the training data. |
|
|
|
The current model is the smallest one, with 124 million parameters (ScriptGPT). |
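
The 124 million figure can be checked with some arithmetic. The sketch below assumes ScriptGPT uses the hyperparameters of the published GPT-2 small configuration (vocabulary 50257, 1024 positions, width 768, 12 layers). This card does not state them, so treat the constants as an assumption rather than the model's confirmed configuration.

```python
# Assumed hyperparameters: the published GPT-2 small configuration.
VOCAB_SIZE = 50257
N_POSITIONS = 1024
N_EMBD = 768
N_LAYER = 12


def linear(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def layer_norm(width):
    """Scale and shift vectors of a layer normalisation."""
    return 2 * width


def block_params(d):
    """One transformer block: two layer norms, attention, and a 4x-wide MLP."""
    attention = linear(d, 3 * d) + linear(d, d)   # fused q/k/v projection + output projection
    mlp = linear(d, 4 * d) + linear(4 * d, d)
    return layer_norm(d) + attention + layer_norm(d) + mlp


def gpt2_param_count(vocab=VOCAB_SIZE, positions=N_POSITIONS, d=N_EMBD, layers=N_LAYER):
    """Total parameters; the output head shares the token embedding, so it adds none."""
    embeddings = vocab * d + positions * d
    return embeddings + layers * block_params(d) + layer_norm(d)


print(f"{gpt2_param_count():,}")  # 124,439,808
```

Under these assumptions the total is about 124 million, matching the size quoted above.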
|
|
|
More models are coming soon... |
|
|
|
## Intended uses |
|
The intended uses of ScriptGPT include generating scripts for videos that explain artificial intelligence concepts, providing inspiration for content creators, and |
|
automating the process of generating video scripts. |
|
|
|
|
|
## How to use |
|
You can use this model directly with a pipeline for text generation. |
|
|
|
1. __Load Model__ |
|
```python |
|
from transformers import AutoTokenizer, AutoModelForCausalLM |
|
|
|
tokenizer = AutoTokenizer.from_pretrained("SRDdev/Script_GPT") |
|
model = AutoModelForCausalLM.from_pretrained("SRDdev/Script_GPT") |
|
``` |
|
|
|
2. __Pipeline__ |
|
```python |
|
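from transformers import pipeline |
|
# Build a text-generation pipeline from the model and tokenizer loaded in step 1 |
|
generator = pipeline('text-generation', model=model, tokenizer=tokenizer) |
|
|
|
context = "Introduction to Vertex AI Feature Store"  # prompt the script should continue |
|
length_to_generate = 200  # maximum total length in tokens, prompt included |
|
|
|
# Sample one continuation and keep only its text |
|
script = generator(context, max_length=length_to_generate, do_sample=True)[0]['generated_text'] |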
|
``` |
|
<p style="opacity: 0.8">Prompts that are technical and related to AI tend to produce better output.</p> |
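
Sampling settings also affect output quality. The following is a hedged sketch of one way to wrap the pipeline with explicit sampling parameters. The helper names `build_generation_kwargs` and `generate_scripts` are hypothetical, and the default values are illustrative starting points, not settings recommended by the model author.

```python
def build_generation_kwargs(max_new_tokens=200, temperature=0.8, top_k=50,
                            top_p=0.95, num_scripts=1):
    """Collect sampling options accepted by the text-generation pipeline."""
    if temperature <= 0:
        raise ValueError("temperature must be positive when sampling")
    return {
        "max_new_tokens": max_new_tokens,      # tokens generated beyond the prompt
        "do_sample": True,                     # sample instead of greedy decoding
        "temperature": temperature,            # below 1.0 gives more conservative text
        "top_k": top_k,                        # keep only the k most likely tokens
        "top_p": top_p,                        # nucleus sampling threshold
        "num_return_sequences": num_scripts,   # number of candidate scripts
    }


def generate_scripts(prompt, seed=42, **overrides):
    """Generate one or more scripts for `prompt` (downloads the model on first use)."""
    # Imported here so the helper above works without transformers installed.
    from transformers import pipeline, set_seed

    set_seed(seed)  # makes sampling repeatable for a given seed
    generator = pipeline("text-generation", model="SRDdev/Script_GPT")
    outputs = generator(prompt, **build_generation_kwargs(**overrides))
    return [out["generated_text"] for out in outputs]
```

For example, `generate_scripts("What are Kubeflow Components?", num_scripts=2)` would return two candidate scripts to compare.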
|
|
|
## Limitations and bias |
|
> The model is trained on YouTube video scripts and works best on prompts in that style. It may also generate inaccurate or fabricated information, so users should cross-check its output. |
|
|
|
The dataset used is linked [here](https://www.kaggle.com/datasets/jfcaro/5000-transcripts-of-youtube-ai-related-videos). |
|
|
|
## Citations |
|
``` |
|
@model{ |
|
Name=Shreyas Dixit |
|
framework=PyTorch |
|
Year=Jan 2023 |
|
Pipeline=text-generation |
|
Github=https://github.com/SRDdev |
|
LinkedIn=https://www.linkedin.com/in/srddev |
|
} |
|
``` |