
Not quite as good as I hoped.

#1
by DonaldSeibert - opened
  • it did not successfully answer the question “how many examples of w show up in this sentence.” It appeared to understand the question but got confused on the answer.
  • it also failed on the parity problem. It wrote some good Python code, but it thought ‘1111’ had an even number of ones.
  • it did get a historical question correct that a lot of models get wrong: I asked whether Ferdinand Porsche was a Nazi, and it answered correctly that he was. (I believe that because a lot of the information on official Porsche sites or fan sites is evasive on this topic, the models that ingest that data tend to be similarly evasive.)

but it thought ‘1111’ had an even number of ones

Can you please double check this one? This is indeed an even number (4).

Quite right, looks like I reversed the error. It first said there was an odd number of ones, and when I asked it to check, it changed its mind. Same with 1001.
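
For anyone who wants to check the expected answers, here is a minimal sketch of the parity test in Python. It is not the code the model produced; the function name `has_even_ones` is made up for illustration, and "parity" is assumed to mean whether the count of `1` characters in the string is even.

```python
def has_even_ones(bits: str) -> bool:
    """Return True when the bit string contains an even number of '1' characters."""
    return bits.count("1") % 2 == 0


# The two strings from this thread, plus one odd case for contrast.
for bits in ("1111", "1001", "1011"):
    label = "even" if has_even_ones(bits) else "odd"
    print(f"{bits}: {bits.count('1')} ones ({label})")
```

Both 1111 (four ones) and 1001 (two ones) come out even, so the model's first answer of "odd" was the wrong one in each case.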

it answered #1 fine for me:

A fun task!

Let's count the occurrences of the letter "w" in your sentence:

**Sentence:** "how many examples of 'w' show up in this sentence."

**Count:**

1. **how** - 1 "w"
2. **many** - no "w" (just a distraction with an "m")
3. **examples** - no "w"
4. **of 'w'** - 1 "w" (this one is even quoted!)
5. **show** - 1 "w"
6. **up** - no "w"
7. **in this sentence** - no "w"

**Total count of 'w' occurrences:** 3

So, there are **3 examples of 'w'** in your sentence. How's that for meta-counting?
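
The count in the reply above can be checked mechanically. A minimal sketch, assuming case-insensitive matching and the quoted form of the sentence used in that reply; the helper name `count_letter` is hypothetical.

```python
def count_letter(sentence: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a sentence."""
    return sentence.lower().count(letter.lower())


sentence = "how many examples of 'w' show up in this sentence."
print(count_letter(sentence, "w"))  # 3

# List the words that contain the letter, to compare against the model's breakdown.
print([word for word in sentence.split() if "w" in word.lower()])
```

The result is the same for the unquoted version of the sentence from the original post, since the quotes do not add or remove any "w".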

Sure, it sometimes works! But it didn't work when I tried it multiple times, so it's clearly not exactly reliable on a type of question that it advertised itself as capable of answering.

@DonaldSeibert, it looks like you got this model to load. Did you have to do anything special to do so? I have tried to bring it up on 2x H100 GPUs with 80 GB of VRAM each, but it keeps giving CUDA out-of-memory errors. (More details in https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruct/discussions/10)
Thanks in advance for your time.
