Could we run an XOR-converted model using Docker + huggingface/text-generation-inference?

#5
by pevogam - opened

I wanted to use a Docker command inspired by the one here, like

```
docker run --gpus "device=0" -p 8080:80 -v $PWD/data:/data ghcr.io/huggingface/text-generation-inference:sha-7a1ba58 --model-id OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5
```

but there is no `OpenAssistant/oasst-sft-7-llama-30b` repository on the Hub, so we have to use the locally converted model instead. Is it possible to simply symlink the model into the data folder and use a tag that maps to the folder name?
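A minimal sketch of what I had in mind, assuming the launcher accepts a local path as `--model-id` when that path exists inside the container (the path and link names below are hypothetical):

```shell
# Place (or symlink) the converted weights under the mounted volume,
# then point --model-id at the path as seen *inside* the container.
volume=$PWD/data
ln -s /path/to/oasst-sft-7-llama-30b "$volume/oasst-sft-7-llama-30b"

docker run --gpus "device=0" -p 8080:80 -v "$volume:/data" \
  ghcr.io/huggingface/text-generation-inference:sha-7a1ba58 \
  --model-id /data/oasst-sft-7-llama-30b
```

One caveat: a symlink whose target lies outside the mounted volume will not resolve inside the container, so copying the weights into the volume (or bind-mounting their real location) may be necessary.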



@olivierdehaene Have you tried anything like this?

I'm literally one step behind you at this very moment, was just reading the details of huggingface/text-generation-inference and thinking about what I needed to do to run it on the MPS device rather than CUDA.

I guess you saw the end of this page:
https://huggingface.co/spaces/huggingchat/chat-ui/blob/main/README.md where it talks about running local inference

I think it is possible to do this in a simpler way, without adapters and additional dependencies, and I have managed to do so with existing Pythia models just fine. Upon further inspection of the situation here, I tried

```
docker run --gpus "device=0" -p 8080:80 -v $PWD/data:/data ghcr.io/huggingface/text-generation-inference:sha-7a1ba58 --model-id OpenAssistant/oasst-sft-7-llama-30b
```

and got

Repository Not Found for url: https://huggingface.co/OpenAssistant/oasst-sft-7-llama-30b/resolve/main/config.json.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated.

In particular, I chose the folder name based on another part of the same error message,

OSError: OpenAssistant/oasst-sft-7-llama-30b is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'

which told me that the inference server would expect an `OpenAssistant/oasst-sft-7-llama-30b` folder or symlink in its data directory, the latter deduced from

volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

in https://github.com/huggingface/text-generation-inference#docker.

Alas, this is still not enough and perhaps we need a different reference hash or more tweaks.

If anyone has an idea, I would love to hear it. One issue I noticed is that oasst-sft-7-llama-30b and text-generation-inference require different versions of the `transformers` package. Most notably, text-generation-inference requires a `transformers` build that provides `parallel_layers` under its `bloom` model module. When I ran the inference server with the version required by oasst, the error was:

ModuleNotFoundError: No module named 'transformers.models.bloom.parallel_layers'
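One way to catch this mismatch before launching the server is a small pre-flight check (a hypothetical helper, not part of text-generation-inference) that verifies the installed `transformers` build actually exposes the submodule the server imports:

```python
import importlib.util


def has_module(dotted_name: str) -> bool:
    """Return True if every package along the dotted path can be found."""
    parts = dotted_name.split(".")
    for i in range(1, len(parts) + 1):
        if importlib.util.find_spec(".".join(parts[:i])) is None:
            return False
    return True


# e.g. has_module("transformers.models.bloom.parallel_layers")
print(has_module("importlib.util"))   # True on any recent CPython
print(has_module("no.such.module"))   # False
```

If the check fails, the installed `transformers` version is not the one the server was built against.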

And here's the error log from my attempt to run text-generation-inference locally, in a virtual environment rather than Docker. I built and installed it with `make install`, then ran:

```
text-generation-launcher --model-id ~/Workspace/oasst-sft-7-llama-30b-xor/oasst-sft-7-llama-30b
```

It was looking good, it found the model and converted it to safetensors, but then...

2023-05-04T04:44:17.786682Z  INFO text_generation_launcher: Args { model_id: "/Users/kronosprime/Workspace/oasst-sft-7-llama-30b-xor/oasst-sft-7-llama-30b", revision: None, sharded: None, num_shard: None, quantize: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1000, max_total_tokens: 1512, max_batch_size: None, waiting_served_ratio: 1.2, max_batch_total_tokens: 32000, max_waiting_tokens: 20, port: 3000, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: None, weights_cache_override: None, disable_custom_kernels: false, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, env: false }
2023-05-04T04:44:17.786799Z  INFO text_generation_launcher: Starting download process.
2023-05-04T04:44:19.031428Z  WARN download: text_generation_launcher: No safetensors weights found for model /Users/kronosprime/Workspace/oasst-sft-7-llama-30b-xor/oasst-sft-7-llama-30b at revision None. Converting PyTorch weights to safetensors.

2023-05-04T04:44:19.031609Z  INFO download: text_generation_launcher: Convert /Users/kronosprime/Workspace/oasst-sft-7-llama-30b-xor/oasst-sft-7-llama-30b/pytorch_model-00007-of-00007.bin to /Users/kronosprime/Workspace/oasst-sft-7-llama-30b-xor/oasst-sft-7-llama-30b/model-00007-of-00007.safetensors.

2023-05-04T04:44:19.031778Z  INFO download: text_generation_launcher: Convert /Users/kronosprime/Workspace/oasst-sft-7-llama-30b-xor/oasst-sft-7-llama-30b/pytorch_model-00006-of-00007.bin to /Users/kronosprime/Workspace/oasst-sft-7-llama-30b-xor/oasst-sft-7-llama-30b/model-00006-of-00007.safetensors.

2023-05-04T04:44:19.031871Z  INFO download: text_generation_launcher: Convert /Users/kronosprime/Workspace/oasst-sft-7-llama-30b-xor/oasst-sft-7-llama-30b/pytorch_model-00001-of-00007.bin to /Users/kronosprime/Workspace/oasst-sft-7-llama-30b-xor/oasst-sft-7-llama-30b/model-00001-of-00007.safetensors.

2023-05-04T04:44:19.031959Z  INFO download: text_generation_launcher: Convert /Users/kronosprime/Workspace/oasst-sft-7-llama-30b-xor/oasst-sft-7-llama-30b/pytorch_model-00003-of-00007.bin to /Users/kronosprime/Workspace/oasst-sft-7-llama-30b-xor/oasst-sft-7-llama-30b/model-00003-of-00007.safetensors.

2023-05-04T04:44:19.032117Z  INFO download: text_generation_launcher: Convert /Users/kronosprime/Workspace/oasst-sft-7-llama-30b-xor/oasst-sft-7-llama-30b/pytorch_model-00004-of-00007.bin to /Users/kronosprime/Workspace/oasst-sft-7-llama-30b-xor/oasst-sft-7-llama-30b/model-00004-of-00007.safetensors.

2023-05-04T04:45:14.801473Z  INFO download: text_generation_launcher: Convert /Users/kronosprime/Workspace/oasst-sft-7-llama-30b-xor/oasst-sft-7-llama-30b/pytorch_model-00005-of-00007.bin to /Users/kronosprime/Workspace/oasst-sft-7-llama-30b-xor/oasst-sft-7-llama-30b/model-00005-of-00007.safetensors.

2023-05-04T04:45:14.801596Z  INFO download: text_generation_launcher: Convert: [1/7] -- ETA: 0:05:30

2023-05-04T04:46:09.529107Z  INFO download: text_generation_launcher: Convert /Users/kronosprime/Workspace/oasst-sft-7-llama-30b-xor/oasst-sft-7-llama-30b/pytorch_model-00002-of-00007.bin to /Users/kronosprime/Workspace/oasst-sft-7-llama-30b-xor/oasst-sft-7-llama-30b/model-00002-of-00007.safetensors.

2023-05-04T04:46:09.529708Z  INFO download: text_generation_launcher: Convert: [2/7] -- ETA: 0:04:35

2023-05-04T04:46:09.706104Z  INFO download: text_generation_launcher: Convert: [3/7] -- ETA: 0:02:26.666668

2023-05-04T04:46:11.292773Z  INFO download: text_generation_launcher: Convert: [4/7] -- ETA: 0:01:24

2023-05-04T04:46:11.419401Z  INFO download: text_generation_launcher: Convert: [5/7] -- ETA: 0:00:44.800000

2023-05-04T04:46:11.762799Z  INFO download: text_generation_launcher: Convert: [6/7] -- ETA: 0:00:18.666667

2023-05-04T04:46:19.904766Z  INFO download: text_generation_launcher: Convert: [7/7] -- ETA: 0

2023-05-04T04:46:20.296744Z  INFO text_generation_launcher: Successfully downloaded weights.
2023-05-04T04:46:20.297360Z  INFO text_generation_launcher: Starting shard 0
2023-05-04T04:46:30.316717Z  INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-05-04T04:46:40.352174Z  INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-05-04T04:46:50.377670Z  INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-05-04T04:47:00.402327Z  INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-05-04T04:47:10.440521Z  INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-05-04T04:47:20.505151Z  INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-05-04T04:47:30.597273Z  INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-05-04T04:47:40.630386Z  INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-05-04T04:47:50.674882Z  INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-05-04T04:48:00.677545Z  INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-05-04T04:48:10.684952Z  INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-05-04T04:48:20.794592Z  INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-05-04T04:48:30.798020Z  INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-05-04T04:48:40.848794Z  INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-05-04T04:48:50.913512Z  INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-05-04T04:49:00.971170Z  INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-05-04T04:49:11.051961Z  INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-05-04T04:49:21.139972Z  INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-05-04T04:49:31.189893Z  INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-05-04T04:49:41.243413Z  INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-05-04T04:49:51.329588Z  INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-05-04T04:50:01.396801Z  INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-05-04T04:50:04.841977Z ERROR shard-manager: text_generation_launcher: Error when initializing model
Traceback (most recent call last):
  File "/Users/kronosprime/Workspace/oasst-sft-7-llama-30b-xor/xor_venv/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/Users/kronosprime/Workspace/oasst-sft-7-llama-30b-xor/xor_venv/lib/python3.9/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/Users/kronosprime/Workspace/oasst-sft-7-llama-30b-xor/xor_venv/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/kronosprime/Workspace/oasst-sft-7-llama-30b-xor/xor_venv/lib/python3.9/site-packages/typer/core.py", line 778, in main
    return _main(
  File "/Users/kronosprime/Workspace/oasst-sft-7-llama-30b-xor/xor_venv/lib/python3.9/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/Users/kronosprime/Workspace/oasst-sft-7-llama-30b-xor/xor_venv/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/kronosprime/Workspace/oasst-sft-7-llama-30b-xor/xor_venv/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/kronosprime/Workspace/oasst-sft-7-llama-30b-xor/xor_venv/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/kronosprime/Workspace/oasst-sft-7-llama-30b-xor/xor_venv/lib/python3.9/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/Users/kronosprime/Workspace/LLM/text-generation-inference/server/text_generation_server/cli.py", line 58, in serve
    server.serve(model_id, revision, sharded, quantize, uds_path)
  File "/Users/kronosprime/Workspace/LLM/text-generation-inference/server/text_generation_server/server.py", line 155, in serve
    asyncio.run(serve_inner(model_id, revision, sharded, quantize))
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/asyncio/base_events.py", line 629, in run_until_complete
    self.run_forever()
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/asyncio/base_events.py", line 596, in run_forever
    self._run_once()
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/asyncio/base_events.py", line 1890, in _run_once
    handle._run()
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
> File "/Users/kronosprime/Workspace/LLM/text-generation-inference/server/text_generation_server/server.py", line 124, in serve_inner
    model = get_model(model_id, revision, sharded, quantize)
  File "/Users/kronosprime/Workspace/LLM/text-generation-inference/server/text_generation_server/models/__init__.py", line 137, in get_model
    return llama_cls(model_id, revision, quantize=quantize)
  File "/Users/kronosprime/Workspace/LLM/text-generation-inference/server/text_generation_server/models/causal_lm.py", line 479, in __init__
    super(CausalLM, self).__init__(
  File "/Users/kronosprime/Workspace/LLM/text-generation-inference/server/text_generation_server/models/model.py", line 26, in __init__
    self.all_special_ids = set(tokenizer.all_special_ids)
  File "/Users/kronosprime/Workspace/oasst-sft-7-llama-30b-xor/xor_venv/lib/python3.9/site-packages/transformers-4.29.0.dev0-py3.9.egg/transformers/tokenization_utils_base.py", line 1299, in all_special_ids
    all_ids = self.convert_tokens_to_ids(all_toks)
  File "/Users/kronosprime/Workspace/oasst-sft-7-llama-30b-xor/xor_venv/lib/python3.9/site-packages/transformers-4.29.0.dev0-py3.9.egg/transformers/tokenization_utils_fast.py", line 254, in convert_tokens_to_ids
    ids.append(self._convert_token_to_id_with_added_voc(token))
  File "/Users/kronosprime/Workspace/oasst-sft-7-llama-30b-xor/xor_venv/lib/python3.9/site-packages/transformers-4.29.0.dev0-py3.9.egg/transformers/tokenization_utils_fast.py", line 260, in _convert_token_to_id_with_added_voc
    return self.unk_token_id
  File "/Users/kronosprime/Workspace/oasst-sft-7-llama-30b-xor/xor_venv/lib/python3.9/site-packages/transformers-4.29.0.dev0-py3.9.egg/transformers/tokenization_utils_base.py", line 1142, in unk_token_id
    return self.convert_tokens_to_ids(self.unk_token)
  File "/Users/kronosprime/Workspace/oasst-sft-7-llama-30b-xor/xor_venv/lib/python3.9/site-packages/transformers-4.29.0.dev0-py3.9.egg/transformers/tokenization_utils_fast.py", line 250, in convert_tokens_to_ids
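The traceback cuts off here, but the last frames bounce between `convert_tokens_to_ids` and `unk_token_id`, which looks like mutual recursion: the lookup falls back to the unk id, and resolving the unk id calls the lookup again. A hypothetical minimal sketch of that failure mode (the class and vocab below are invented for illustration; the real tokenizer code is in `transformers`):

```python
class BrokenTokenizer:
    """Toy tokenizer whose unk token is missing from its own vocabulary."""

    def __init__(self, vocab, unk_token="<unk>"):
        self.vocab = vocab          # token -> id mapping
        self.unk_token = unk_token  # assumed absent from vocab

    def convert_tokens_to_ids(self, token):
        if token in self.vocab:
            return self.vocab[token]
        return self.unk_token_id    # fall back to the unk id...

    @property
    def unk_token_id(self):
        # ...which is itself resolved via convert_tokens_to_ids
        return self.convert_tokens_to_ids(self.unk_token)


tok = BrokenTokenizer({"hello": 0})
try:
    tok.convert_tokens_to_ids("<s>")
except RecursionError:
    print("RecursionError: unk token missing from vocab")
```

If that diagnosis is right, the tokenizer files produced by the XOR conversion may be missing or mismatched (e.g. the unk token not present in the vocabulary), rather than the server itself being at fault.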
