Commit 76d38ec by smurty (verified) · 1 parent: 77fe6c5

Update README.md

  1. README.md +33 -17
README.md CHANGED
@@ -16,8 +16,6 @@ LLama8b-NNetNav-WA is a [LLama-3.1-8B](https://huggingface.co/meta-llama/Llama-3
16
  Most details about this model can be found in our paper: [NNetNav: Unsupervised Learning of Browser Agents Through Environment Interaction in the Wild](https://arxiv.org/abs/2410.02907).
17
 
18
 
19
- ![show an example trajectory from NNetNav-WA](TODO)
20
-
21
  ## Table of Contents
22
 
23
  - [Model Card for Llama8b-NNetNav-WA](#model-card-for--model_id-)
@@ -40,37 +38,57 @@ Most details about this model along with details can be found in our paper: [NNe
40
  - [Model Card Contact](#model-card-contact)
41
  - [How to Get Started with the Model](#how-to-get-started-with-the-model)
42
 
43
-
44
  ## Model Details
 
45
 
46
- ### Model Description
47
-
48
  <!-- Provide a longer summary of what this model is/does. -->
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
49
 
 
50
 
51
- ## Uses
52
 
 
 
 
 
53
 
54
 
55
  ## Bias, Risks, and Limitations
56
-
57
  <!-- This section is meant to convey both technical and sociotechnical limitations. -->
 
58
 
59
  ## How to Get Started with the Model
60
 
61
- ```python
62
-
63
-
64
- ```
65
 
66
  ## Training Details
67
 
68
  ### Training Data
69
 
70
- <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
71
-
72
- This model was trained on the [NNetnav-WA](https://huggingface.co/datasets/stanfordnlp/nnetnav-wa) dataset. It can be used directly with the open-instruct library.
73
-
74
 
75
  ### Training Procedure
76
 
@@ -110,10 +128,8 @@ This model was fine-tuned with [Open-Instruct](https://github.com/allenai/open-i
110
  ## Model Card Authors [optional]
111
 
112
  <!-- This section provides another layer of transparency and accountability. Whose views is this model card representing? How many voices were included in its construction? Etc. -->
113
-
114
  Shikhar Murty
115
 
116
  ## Model Card Contact
117
 
118
  smurty@cs.stanford.edu
119
- shikhar.murty@gmail.com
 
16
  Most details about this model can be found in our paper: [NNetNav: Unsupervised Learning of Browser Agents Through Environment Interaction in the Wild](https://arxiv.org/abs/2410.02907).
17
 
18
 
 
 
19
  ## Table of Contents
20
 
21
  - [Model Card for Llama8b-NNetNav-WA](#model-card-for--model_id-)
 
38
  - [Model Card Contact](#model-card-contact)
39
  - [How to Get Started with the Model](#how-to-get-started-with-the-model)
40
 
 
41
  ## Model Details
42
+ This model is intended to be used as a **web agent**: given an instruction such as "Upvote the post by user smurty123 on subreddit r/LocalLLaMA" and a starting URL such as "reddit.com", the model performs the task by executing a sequence of actions.
43
 
44
+ ### Action Space
 
45
  <!-- Provide a longer summary of what this model is/does. -->
46
+ The action space of the model is as follows:
+ ```plaintext
+ Page Operation Actions:
+ `click [id]`: This action clicks on an element with a specific id on the webpage.
+ `type [id] [content] [press_enter_after=0|1]`: Use this to type the content into the field with id. By default, the "Enter" key is pressed after typing unless press_enter_after is set to 0.
+ `hover [id]`: Hover over an element with id.
+ `press [key_comb]`: Simulates the pressing of a key combination on the keyboard (e.g., Ctrl+v).
+ `scroll [down|up]`: Scroll the page up or down.
+
+ Tab Management Actions:
+ `new_tab`: Open a new, empty browser tab.
+ `tab_focus [tab_index]`: Switch the browser's focus to a specific tab using its index.
+ `close_tab`: Close the currently active tab.
+
+ URL Navigation Actions:
+ `goto [url]`: Navigate to a specific URL.
+ `go_back`: Navigate to the previously viewed page.
+ `go_forward`: Navigate to the next page (if a previous 'go_back' action was performed).
+
+ Completion Action:
+ `stop [answer]`: Issue this action when you believe the task is complete. If the objective is to find a text-based answer, provide the answer in the bracket. If you believe the task is impossible to complete, provide the answer as "N/A" in the bracket.
+ ```
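
The action strings listed above share a regular `name [arg] [arg]` shape, so model output can be validated before it is forwarded to a browser. The following is a minimal sketch of such a check, not the parser used by NNetNav. The names `Action`, `ARG_COUNTS`, and `parse_action` are hypothetical, and the sketch assumes that arguments never contain square brackets, that `type` may omit `press_enter_after` (defaulting to 1, as described above), and that `stop` may be issued without an answer.

```python
import re
from dataclasses import dataclass
from typing import List

# (min, max) number of bracketed arguments per action, taken from the
# action space above. The optional third `type` argument and the optional
# `stop` answer are assumptions of this sketch.
ARG_COUNTS = {
    "click": (1, 1), "type": (2, 3), "hover": (1, 1), "press": (1, 1),
    "scroll": (1, 1), "new_tab": (0, 0), "tab_focus": (1, 1),
    "close_tab": (0, 0), "goto": (1, 1), "go_back": (0, 0),
    "go_forward": (0, 0), "stop": (0, 1),
}

# An action name followed by zero or more "[...]" groups without nested brackets.
ACTION_RE = re.compile(r"^\s*([a-z_]+)((?:\s*\[[^\[\]]*\])*)\s*$")
ARG_RE = re.compile(r"\[([^\[\]]*)\]")


@dataclass
class Action:
    name: str
    args: List[str]


def parse_action(text: str) -> Action:
    """Parse one action string such as 'click [12]' into an Action."""
    match = ACTION_RE.match(text)
    if match is None:
        raise ValueError(f"malformed action: {text!r}")
    name, raw_args = match.groups()
    if name not in ARG_COUNTS:
        raise ValueError(f"unknown action: {name!r}")
    args = ARG_RE.findall(raw_args)
    low, high = ARG_COUNTS[name]
    if not low <= len(args) <= high:
        raise ValueError(f"{name} expects {low}-{high} arguments, got {len(args)}")
    if name == "scroll" and args[0] not in ("down", "up"):
        raise ValueError("scroll direction must be 'down' or 'up'")
    if name == "type":
        if len(args) == 2:
            args.append("1")  # Enter is pressed by default
        elif args[2] not in ("0", "1"):
            raise ValueError("press_enter_after must be 0 or 1")
    return Action(name, args)
```

For example, `parse_action("type [45] [hello world]")` yields an `Action` named `type` with arguments `["45", "hello world", "1"]`, while `parse_action("scroll [left]")` raises `ValueError`. Rejecting malformed output early keeps invalid actions from reaching the browser.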
68
 
69
+ ## Results on Benchmarks
70
 
71
+ This model achieves the following success rates (SR) on WebArena and WebVoyager:
72
 
+ | Model | WebArena (SR) | WebVoyager (SR) |
+ |------------------------|--------------:|---------------:|
+ | **GPT-4** | **14.1** | **33.5** |
+ | **llama8b-nnetnav-wa** | **16.3** | **28.1** |
77
 
78
 
79
  ## Bias, Risks, and Limitations
 
80
  <!-- This section is meant to convey both technical and sociotechnical limitations. -->
81
+ TODO
82
 
83
  ## How to Get Started with the Model
84
 
85
+ TODO
 
 
 
86
 
87
  ## Training Details
88
 
89
  ### Training Data
90
 
91
+ This model was trained on the [NNetnav-WA](https://huggingface.co/datasets/stanfordnlp/nnetnav-wa) dataset, which consists entirely of synthetic demonstrations collected on self-hosted websites.
 
 
 
92
 
93
  ### Training Procedure
94
 
 
128
  ## Model Card Authors [optional]
129
 
130
  <!-- This section provides another layer of transparency and accountability. Whose views is this model card representing? How many voices were included in its construction? Etc. -->
 
131
  Shikhar Murty
132
 
133
  ## Model Card Contact
134
 
135
  smurty@cs.stanford.edu