Complete Overhaul (without conflict)
- increased context length
- increased per-user GPU time (duration reduced from 120 s to 20 s, leaving users roughly 100 s of extra quota)
- changed theme
- added a 4k model
- made it one place for all Phi-3 medium models
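The duration change above can be sketched as follows, assuming this Space uses Hugging Face ZeroGPU's `spaces.GPU` decorator (the `generate` function body and wiring are hypothetical placeholders, not the Space's real code):

```python
# Sketch of the duration change, assuming a Hugging Face ZeroGPU Space.
# `spaces.GPU(duration=...)` is the real ZeroGPU API; generate() below is
# a hypothetical placeholder, guarded so the file runs without `spaces`.
import importlib.util

DURATION_SECONDS = 20  # previously 120; each call now books far less quota

if importlib.util.find_spec("spaces") is not None:
    import spaces

    @spaces.GPU(duration=DURATION_SECONDS)
    def generate(prompt: str) -> str:
        # Hypothetical placeholder for the Space's real generation code.
        return f"generated text for: {prompt}"
else:
    def generate(prompt: str) -> str:
        # Fallback so the sketch stays runnable outside a Space.
        return f"generated text for: {prompt}"
```

A shorter duration means each queued call reserves less of the user's ZeroGPU quota, which is why lowering it from 120 s to 20 s effectively gives users more runs.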
Thank you
If the duration is at 20, doesn't that mean they only have 20 seconds of GPU time?
RuntimeError: NVML_SUCCESS == r INTERNAL ASSERT FAILED at "../c10/cuda/CUDACachingAllocator.cpp":830, please report a bug to PyTorch.
Duration is both the minimum GPU time a user must have available and the maximum generation time for a single call.
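That dual role can be sketched minimally as follows (the 120 s and 20 s figures come from the changelog above; the quota value and function names are hypothetical):

```python
# Minimal sketch of "duration is the minimum required and the maximum allowed",
# assuming a fixed per-user GPU quota (the quota value here is hypothetical).

def can_start(remaining_quota_s: float, duration_s: float) -> bool:
    # Minimum side: a user must have at least `duration_s` of quota left
    # before a generation is allowed to start.
    return remaining_quota_s >= duration_s

def cap_generation(requested_s: float, duration_s: float) -> float:
    # Maximum side: a single generation can never run past `duration_s`.
    return min(requested_s, duration_s)

# With the old setting (120 s) a user holding 100 s of quota was locked out;
# with the new setting (20 s) the same user can still generate.
print(can_start(100, 120))   # → False (old duration)
print(can_start(100, 20))    # → True  (new duration)
print(cap_generation(45, 20))  # → 20
```

So a duration of 20 does not mean users get only 20 seconds total; it means each individual generation needs (and is capped at) 20 seconds.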
Every generation hits this error.
I have saved your code and rolled the Space back to the old version until this is fixed.
If you don't have a copy, tell me; I saved it.
I duplicated it; I'm fixing the error and also making it faster with FlashAttention-2.
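Enabling FlashAttention-2 in Transformers is typically just a loading flag; a sketch, assuming a Phi-3 medium checkpoint (the model ID is an assumption, and the `flash-attn` package plus a supported GPU are required at runtime):

```python
# Sketch: enabling FlashAttention-2 when loading a model with Transformers.
# The kwargs below are the real `from_pretrained` API; the model ID is an
# assumption (a Phi-3 medium checkpoint), and flash-attn must be installed.
import importlib.util

MODEL_ID = "microsoft/Phi-3-medium-4k-instruct"  # assumed checkpoint
model_kwargs = {
    "attn_implementation": "flash_attention_2",  # instead of "sdpa"/"eager"
    "torch_dtype": "bfloat16",  # FA2 requires fp16/bf16, not fp32
    "device_map": "auto",
}

# Guarded so the sketch can run where transformers or a GPU is unavailable.
if (importlib.util.find_spec("transformers") is not None
        and importlib.util.find_spec("torch") is not None):
    import torch
    if torch.cuda.is_available():
        from transformers import AutoModelForCausalLM
        model = AutoModelForCausalLM.from_pretrained(MODEL_ID, **model_kwargs)
```

The speedup comes from the attention kernel alone; generation code and outputs are unchanged, so this can be swapped in without touching the rest of the Space.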
Alright, when it's ready you can open a new PR and I'll merge :)
Amazing.