Kuldeep Singh Sidhu

singhsidhukuldeep

AI & ML interests

😃 TOP 3 on HuggingFace for posts 🤗 Seeking contributors for a completely open-source 🚀 Data Science platform! singhsidhukuldeep.github.io

Posts 85

Post
Exciting breakthrough in LLM reasoning: Introducing "Thread of Thought" (ThoT) - a novel prompting strategy that revolutionizes how language models handle chaotic contexts!

Unlike traditional approaches that struggle with complex, interleaved information, ThoT enables LLMs to methodically segment and analyze extended contexts with remarkable precision. Here's how it works:

Technical Deep Dive:
- ThoT employs a two-step prompting mechanism (sketched in code after the implementation details below):
1. Initial Analysis: Uses a template combining chaotic context (X) and query (Q) with a trigger sentence that initiates systematic reasoning.
2. Conclusion Refinement: Leverages the organized thought sequence to extract definitive answers.

Implementation Details:
- Seamlessly integrates as a "plug-and-play" module with existing LLMs.
- Requires no model retraining or fine-tuning.
- Works with various prompting techniques and model architectures.
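
To make the two-step flow concrete, here is a minimal sketch of how such a plug-and-play wrapper could look in Python. `call_llm` is a placeholder for whatever chat-completion call you already use, and the trigger wording is a paraphrase rather than a verbatim quote from the paper.

```python
# Minimal sketch of the two-step ThoT prompting flow (illustrative only).
# `call_llm` stands in for any text-in/text-out LLM call.

THOT_TRIGGER = (
    "Walk me through this context in manageable parts step by step, "
    "summarizing and analyzing as we go."
)

def thread_of_thought(call_llm, chaotic_context: str, query: str) -> str:
    # Step 1: initial analysis over the chaotic context (X) and query (Q),
    # with the trigger sentence appended to start systematic reasoning.
    analysis_prompt = f"{chaotic_context}\n\nQ: {query}\n{THOT_TRIGGER}"
    reasoning = call_llm(analysis_prompt)

    # Step 2: conclusion refinement, reusing the organized thought sequence
    # to extract a definitive answer.
    refine_prompt = (
        f"{chaotic_context}\n\nQ: {query}\n{reasoning}\n"
        "Therefore, the answer is:"
    )
    return call_llm(refine_prompt)
```

Because both steps are plain prompts, this wraps around any existing model without retraining, which is what makes the approach plug-and-play.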

Performance Highlights:
- Outperformed traditional methods on PopQA and EntityQ datasets.
- Achieved 57.4% accuracy with GPT-3.5-turbo (vs. 48.2% for Chain-of-Thought).
- Demonstrated superior performance across model scales, from 7B to 70B parameters.

Key Applications:
- Retrieval-augmented generation.
- Multi-turn conversation responses.
- Complex reasoning tasks requiring information synthesis.

What makes it special: ThoT mirrors human cognitive processes by breaking down complex information into manageable segments while maintaining logical continuity, a game-changer for handling information-dense contexts.
Post
Good folks at @nvidia and @Tsinghua_Uni have released LLaMA-Mesh: A Revolutionary Approach to 3D Content Generation!

This innovative framework enables the direct generation of 3D meshes from natural language prompts while maintaining strong language capabilities.

Here is the Architecture & Implementation!

>> Core Components

Model Foundation
- If you haven't guessed it yet, it's built on the LLaMA-3.1-8B-Instruct base model
- Maintains original language capabilities while adding 3D generation
- Context length is set to 8,000 tokens

3D Representation Strategy
- Uses the OBJ file format for mesh representation
- Quantizes vertex coordinates into 64 discrete bins per axis
- Sorts vertices by z-y-x coordinates, from lowest to highest
- Sorts faces by the lowest vertex indices for consistency (sketched below)
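
As a rough illustration of that representation step (not the authors' code), the sketch below quantizes vertices into 64 bins per axis, orders them z-y-x, re-indexes faces by their lowest vertex index, and writes plain OBJ text.

```python
import numpy as np

# Illustrative sketch of the mesh serialization described above; the exact
# normalization and tokenization used by LLaMA-Mesh may differ.

def quantize_and_sort(vertices, faces, bins=64):
    """vertices: (N, 3) float array; faces: lists of 0-based vertex indices."""
    # Normalize each axis to [0, 1] and quantize into `bins` discrete levels.
    lo, hi = vertices.min(axis=0), vertices.max(axis=0)
    scale = np.maximum(hi - lo, 1e-9)
    quantized = np.clip(((vertices - lo) / scale * bins).astype(int), 0, bins - 1)

    # Sort vertices by z, then y, then x (lowest to highest).
    order = np.lexsort((quantized[:, 0], quantized[:, 1], quantized[:, 2]))
    remap = {old: new for new, old in enumerate(order)}

    # Re-index faces and order them by their lowest vertex index.
    faces = sorted(([remap[i] for i in f] for f in faces), key=min)
    return quantized[order], faces

def to_obj_text(vertices, faces):
    # Plain-text OBJ: "v x y z" lines followed by 1-based "f i j k" lines.
    lines = [f"v {x} {y} {z}" for x, y, z in vertices]
    lines += ["f " + " ".join(str(i + 1) for i in face) for face in faces]
    return "\n".join(lines)
```

Keeping the serialization deterministic (fixed bins, fixed sort order) is what lets the same mesh always map to the same token sequence.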

Data Processing Pipeline
- Filters meshes to a maximum of 500 faces for computational efficiency
- Applies random rotations (0°, 90°, 180°, 270°) for data augmentation (see the sketch after this list)
- Generates ~125k mesh variations from 31k base meshes
- Uses Cap3D-generated captions for text descriptions
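
A hedged sketch of that filtering and augmentation pass is below; the choice of rotation axis is my assumption, not something stated in the post.

```python
import numpy as np

# Illustrative filtering + rotation augmentation (assumptions noted above).
MAX_FACES = 500
ROTATIONS_DEG = (0, 90, 180, 270)

def rotation_z(deg):
    t = np.deg2rad(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def filter_and_augment(meshes):
    """meshes: iterable of (vertices, faces) pairs; yields roughly 4x as many."""
    for vertices, faces in meshes:
        if len(faces) > MAX_FACES:        # drop overly complex meshes
            continue
        for deg in ROTATIONS_DEG:         # four axis-aligned copies per mesh
            yield vertices @ rotation_z(deg).T, faces
```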

>> Training Framework

Dataset Composition
- 40% Mesh Generation tasks
- 20% Mesh Understanding tasks
- 40% General Conversation (UltraChat dataset)
- 8 training turns for generation tasks, 4 for understanding tasks (the 40/20/40 mixture is sketched below)
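
Expressed as sampling weights, that mixture might look like the following; the keys are illustrative, not the paper's actual config names.

```python
# Hypothetical sampling weights for the reported data mixture.
DATA_MIXTURE = {
    "mesh_generation": 0.40,       # text prompt -> OBJ mesh
    "mesh_understanding": 0.20,    # OBJ mesh -> text description
    "general_conversation": 0.40,  # UltraChat dialogues
}
assert abs(sum(DATA_MIXTURE.values()) - 1.0) < 1e-9
```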

Training Configuration
- Deployed on 32 A100 GPUs (for Nvidia, this is literally in-house)
- 21,000 training iterations
- Global batch size: 128
- AdamW optimizer with a 1e-5 learning rate
- 30-step warmup with cosine scheduling (see the sketch after this list)
- Total training time: approximately 3 days (based on the paper)
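
For reference, here is what those optimizer and schedule settings could look like in a generic PyTorch loop; the tiny linear model and random batches are stand-ins, not the authors' training code.

```python
import math
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

TOTAL_STEPS, WARMUP_STEPS, BASE_LR, GLOBAL_BATCH = 21_000, 30, 1e-5, 128

model = torch.nn.Linear(8, 8)                 # stand-in for LLaMA-3.1-8B-Instruct
optimizer = AdamW(model.parameters(), lr=BASE_LR)

def lr_lambda(step):
    if step < WARMUP_STEPS:                   # linear warmup over the first 30 steps
        return (step + 1) / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return 0.5 * (1.0 + math.cos(math.pi * progress))   # cosine decay to zero

scheduler = LambdaLR(optimizer, lr_lambda)

for step in range(TOTAL_STEPS):
    batch = torch.randn(GLOBAL_BATCH, 8)      # stand-in for one global batch of 128
    loss = model(batch).pow(2).mean()         # stand-in loss
    loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```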

This research opens exciting possibilities for intuitive 3D content creation through natural language interaction. The future of digital design is conversational!
