Update README.md
README.md
CHANGED
@@ -16,8 +16,6 @@ LLama8b-NNetNav-WA is a [LLama-3.1-8B](https://huggingface.co/meta-llama/Llama-3
 Most details about this model can be found in our paper: [NNetNav: Unsupervised Learning of Browser Agents Through Environment Interaction in the Wild](https://arxiv.org/abs/2410.02907).


-
-
 ## Table of Contents

 - [Model Card for Llama8b-NNetNav-WA](#model-card-for--model_id-)
@@ -40,37 +38,57 @@ Most details about this model along with details can be found in our paper: [NNe
 - [Model Card Contact](#model-card-contact)
 - [How to Get Started with the Model](#how-to-get-started-with-the-model)

-
 ## Model Details

-###
-
 <!-- Provide a longer summary of what this model is/does. -->


-



 ## Bias, Risks, and Limitations
-
 <!-- This section is meant to convey both technical and sociotechnical limitations. -->

 ## How to Get Started with the Model

-
-
-
-
```
 ## Training Details

 ### Training Data

-
-
-This model was trained on the [NNetnav-WA](https://huggingface.co/datasets/stanfordnlp/nnetnav-wa) dataset. It can be used directly with the open-instruct library.
-

 ### Training Procedure
@@ -110,10 +128,8 @@ This model was fine-tuned with [Open-Instruct](https://github.com/allenai/open-i
 ## Model Card Authors [optional]

 <!-- This section provides another layer of transparency and accountability. Whose views is this model card representing? How many voices were included in its construction? Etc. -->
-
 Shikhar Murty

 ## Model Card Contact

 smurty@cs.stanford.edu
-shikhar.murty@gmail.com
@@ -16,8 +16,6 @@ LLama8b-NNetNav-WA is a [LLama-3.1-8B](https://huggingface.co/meta-llama/Llama-3
 Most details about this model can be found in our paper: [NNetNav: Unsupervised Learning of Browser Agents Through Environment Interaction in the Wild](https://arxiv.org/abs/2410.02907).


 ## Table of Contents

 - [Model Card for Llama8b-NNetNav-WA](#model-card-for--model_id-)
@@ -40,37 +38,57 @@ Most details about this model along with details can be found in our paper: [NNe
 - [Model Card Contact](#model-card-contact)
 - [How to Get Started with the Model](#how-to-get-started-with-the-model)

 ## Model Details
+This model is intended to be used as a **web agent**: given an instruction such as "Upvote the post by user smurty123 on subreddit r/LocalLLaMA" and a URL such as "reddit.com", the model can perform the task by executing a sequence of actions.

+### Action Space
 <!-- Provide a longer summary of what this model is/does. -->
+The action space of the model is as follows:
+
```plaintext
+Page Operation Actions:
+`click [id]`: This action clicks on an element with a specific id on the webpage.
+`type [id] [content] [press_enter_after=0|1]`: Use this to type the content into the field with id. By default, the "Enter" key is pressed after typing unless press_enter_after is set to 0.
+`hover [id]`: Hover over an element with id.
+`press [key_comb]`: Simulates the pressing of a key combination on the keyboard (e.g., Ctrl+v).
+`scroll [down|up]`: Scroll the page up or down.
+
+Tab Management Actions:
+`new_tab`: Open a new, empty browser tab.
+`tab_focus [tab_index]`: Switch the browser's focus to a specific tab using its index.
+`close_tab`: Close the currently active tab.
+
+URL Navigation Actions:
+`goto [url]`: Navigate to a specific URL.
+`go_back`: Navigate to the previously viewed page.
+`go_forward`: Navigate to the next page (if a previous 'go_back' action was performed).
+
+Completion Action:
+`stop [answer]`: Issue this action when you believe the task is complete. If the objective is to find a text-based answer, provide the answer in the bracket. If you believe the task is impossible to complete, provide the answer as "N/A" in the bracket.
+
```
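The action list above can be turned into a small parser on the consumer side. The following is a minimal sketch, not code from the NNetNav repository: it assumes the raw action string (for example `click [42]`) has already been extracted from the model output, and the names `ARITY`, `Action`, `parse_action`, and `presses_enter` are hypothetical helpers introduced here for illustration.

```python
import re
from dataclasses import dataclass

# Action names copied from the action-space listing, mapped to the
# (min, max) number of bracketed arguments this sketch accepts.
# Treating the third `type` argument and the `stop` answer as optional
# is an assumption of this sketch.
ARITY = {
    "click": (1, 1), "type": (2, 3), "hover": (1, 1), "press": (1, 1),
    "scroll": (1, 1), "new_tab": (0, 0), "tab_focus": (1, 1),
    "close_tab": (0, 0), "goto": (1, 1), "go_back": (0, 0),
    "go_forward": (0, 0), "stop": (0, 1),
}


@dataclass(frozen=True)
class Action:
    name: str
    args: tuple


def parse_action(text: str) -> Action:
    """Parse one action string such as 'type [12] [hello] [0]'."""
    match = re.match(r"(\w+)\s*(.*)", text.strip(), flags=re.DOTALL)
    if match is None or match.group(1) not in ARITY:
        raise ValueError(f"unknown action: {text!r}")
    name, rest = match.groups()
    # Non-greedy match per bracket pair; content that itself contains
    # ']' is not handled by this simple pattern.
    args = tuple(re.findall(r"\[(.*?)\]", rest, flags=re.DOTALL))
    low, high = ARITY[name]
    if not low <= len(args) <= high:
        raise ValueError(f"{name} expects {low}..{high} arguments, got {len(args)}")
    if name == "scroll" and args[0] not in ("down", "up"):
        raise ValueError(f"scroll direction must be 'down' or 'up': {args[0]!r}")
    return Action(name, args)


def presses_enter(action: Action) -> bool:
    """Enter is pressed after typing unless the third argument is '0'."""
    if action.name != "type":
        raise ValueError("presses_enter only applies to 'type' actions")
    return not (len(action.args) == 3 and action.args[2] == "0")
```

For example, `parse_action("type [7] [hello world] [0]")` yields an `Action` named `type` with arguments `("7", "hello world", "0")`, and `presses_enter` returns `False` for it. The parser is deliberately lenient about trailing text after the last bracket; a stricter consumer may want to reject it.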

+## Results on Benchmarks

+This model achieves the following success rates (SR) on WebArena and WebVoyager:

+| Model                  | WebArena (SR) | WebVoyager (SR) |
+|------------------------|--------------:|---------------:|
+| **GPT-4**              | **14.1**      | **33.5**        |
+| **llama8b-nnetnav-wa** | **16.3**      | **28.1**        |


 ## Bias, Risks, and Limitations
 <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+TODO

 ## How to Get Started with the Model

+TODO

 ## Training Details

 ### Training Data

+This model was trained on the [NNetnav-WA](https://huggingface.co/datasets/stanfordnlp/nnetnav-wa) dataset, which consists entirely of synthetic demonstrations from self-hosted websites.

 ### Training Procedure
@@ -110,10 +128,8 @@ This model was fine-tuned with [Open-Instruct](https://github.com/allenai/open-i
 ## Model Card Authors [optional]

 <!-- This section provides another layer of transparency and accountability. Whose views is this model card representing? How many voices were included in its construction? Etc. -->
 Shikhar Murty

 ## Model Card Contact

 smurty@cs.stanford.edu