
Orkut Murat Yılmaz

orkut

AI & ML interests

Geo Sciences, Free Software

Organizations

Mathematical Intelligence, GeoPerformans Ar-Ge Bilişim Haritacılık Sanayi ve Ticaret Limited Şirketi, Karakulaklar

orkut's activity

reacted to merve's post with 🔥 6 months ago
We have recently merged Video-LLaVA into transformers! 🤗🎞️
What makes this model different?

Demo: llava-hf/video-llava
Model: LanguageBind/Video-LLaVA-7B-hf

Compared to other models that take image and video input and either project them separately or downsample the video and project selected frames, Video-LLaVA converts images and videos to a unified representation and projects them through a shared projection layer.

It uses Vicuna 1.5 as the language model, together with LanguageBind's own OpenCLIP-based encoders; these encoders project each modality to a unified representation before it is passed to the projection layer.


I feel like one of the coolest features of this model is its joint image-video understanding, which many recent models have also introduced.

It's a relatively older model, but it was ahead of its time and works very well! This means you can, e.g., pass the model an image of a cat and a video of a cat and ask whether the cat in the image appears in the video 🤩
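
Here's a minimal sketch of that kind of joint image-video prompt, using the transformers Video-LLaVA classes and the checkpoint above; the file names and the frame-sampling helper are placeholders of my own, not from the original post:

```python
# A rough sketch of joint image + video inference with Video-LLaVA in transformers.
# Assumes `pip install transformers av pillow`; "cat.jpg" / "cat.mp4" are placeholders.
import av
import numpy as np
from PIL import Image
from transformers import VideoLlavaProcessor, VideoLlavaForConditionalGeneration

def sample_frames(path, num_frames=8):
    # Decode the clip and keep `num_frames` evenly spaced RGB frames.
    container = av.open(path)
    frames = [f.to_ndarray(format="rgb24") for f in container.decode(video=0)]
    idx = np.linspace(0, len(frames) - 1, num_frames).astype(int)
    return np.stack([frames[i] for i in idx])

model_id = "LanguageBind/Video-LLaVA-7B-hf"
processor = VideoLlavaProcessor.from_pretrained(model_id)
model = VideoLlavaForConditionalGeneration.from_pretrained(model_id)

prompt = "USER: <image>\n<video>\nDoes the cat in the image appear in the video? ASSISTANT:"
inputs = processor(text=prompt, images=Image.open("cat.jpg"),
                   videos=sample_frames("cat.mp4"), return_tensors="pt")

out = model.generate(**inputs, max_new_tokens=60)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```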
reacted to harpreetsahota's post with 🔥 7 months ago
The Coachella of Computer Vision, CVPR, is right around the corner. In anticipation of the conference, I curated a dataset of the papers.

I'll have a technical blog post out tomorrow doing some analysis on the dataset, but I'm so hyped that I wanted to get it out to the community ASAP.

The dataset consists of the following fields:

- An image of the first page of the paper
- title: The title of the paper
- authors_list: The list of authors
- abstract: The abstract of the paper
- arxiv_link: Link to the paper on arXiv
- other_link: Link to the project page, if found
- category_name: The primary category of this paper, according to the [arXiv taxonomy](https://arxiv.org/category_taxonomy)
- all_categories: All categories this paper falls into, according to arXiv taxonomy
- keywords: Extracted using GPT-4o

Here's how I created the dataset 👇🏼

Generic code for building this dataset can be found [here](https://github.com/harpreetsahota204/CVPR-2024-Papers).

This dataset was built using the following steps (a rough code sketch follows the list):

- Scrape the CVPR 2024 website for accepted papers
- Use DuckDuckGo to search for a link to the paper's abstract on arXiv
- Use arXiv.py (python wrapper for the arXiv API) to extract the abstract and categories, and download the pdf for each paper
- Use pdf2image to save the image of paper's first page
- Use GPT-4o to extract keywords from the abstract
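
As a rough sketch of the arXiv and pdf2image steps only (the scraping, search, and GPT-4o steps are left out), assuming the paper titles have already been collected; the helper below is illustrative and not taken from the repo:

```python
# Sketch of steps 3-4: fetch metadata/PDF with arxiv.py, render page 1 with pdf2image.
import arxiv
from pdf2image import convert_from_path

client = arxiv.Client()

def fetch_paper(title: str):
    # Look the paper up on arXiv by title and keep the top hit.
    search = arxiv.Search(query=f'ti:"{title}"', max_results=1)
    result = next(client.results(search), None)
    if result is None:
        return None
    pdf_path = result.download_pdf(filename="paper.pdf")
    # Render only the first page as a PIL image.
    first_page = convert_from_path(pdf_path, first_page=1, last_page=1)[0]
    return {
        "title": result.title,
        "abstract": result.summary,
        "arxiv_link": result.entry_id,
        "category_name": result.primary_category,
        "all_categories": result.categories,
        "image": first_page,
    }
```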

Voxel51/CVPR_2024_Papers
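
To browse the result, FiftyOne's Hugging Face integration should be able to pull the repo down locally; a sketch, assuming a recent fiftyone release that ships fiftyone.utils.huggingface:

```python
# Sketch: load the Hub dataset into FiftyOne and open it in the app.
import fiftyone as fo
from fiftyone.utils.huggingface import load_from_hub

dataset = load_from_hub("Voxel51/CVPR_2024_Papers")
session = fo.launch_app(dataset)
```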
reacted to lucifertrj's post with 🔥 7 months ago
Evaluate RAG with open-source models from Hugging Face using BeyondLLM

# pip install beyondllm huggingface_hub llama-index-embeddings-fastembed

from beyondllm.source import fit
from beyondllm.embeddings import FastEmbedEmbeddings
from beyondllm.retrieve import auto_retriever
from beyondllm.llms import HuggingFaceHubModel
from beyondllm.generator import Generate

import os
from getpass import getpass

os.environ['HUGGINGFACE_ACCESS_TOKEN'] = getpass("Enter your HF API token:")

# Load and chunk the source document
data = fit("RedHenLab_GSoC_Tarun.pdf", dtype="pdf")

# Embed the chunks and build a top-k retriever over them
embed_model = FastEmbedEmbeddings()
retriever = auto_retriever(data=data, embed_model=embed_model, type="normal", top_k=3)

# Answer the question with an open LLM from the Hugging Face Hub
llm = HuggingFaceHubModel(model="mistralai/Mistral-7B-Instruct-v0.2")
pipeline = Generate(question="what models has Tarun fine-tuned?", llm=llm, retriever=retriever)

print(pipeline.call())  # the AI response
print(pipeline.get_rag_triad_evals())  # context relevancy, answer relevancy, groundedness


GitHub: https://github.com/aiplanethub/beyondllm

Don't forget to ⭐️ the repo
reacted to singhsidhukuldeep's post with 👍 8 months ago
Are you tired of writing scripts to scrape data from the web? 😓

ScrapeGraphAI is here for you! 🎉

ScrapeGraphAI is an OPEN-SOURCE web scraping Python library that uses LLMs and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, etc.). 🌐📊

Just say which information you want to extract (in natural language) and the library will do it for you! 🗣️🚀

It supports GPT, Gemini, and open-source models like Mistral. 🔍
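
For a taste, here is a minimal sketch along the lines of the project's README; the config keys and model name vary across versions, and the API key and URL below are placeholders:

```python
# Sketch: describe the extraction in natural language; the graph does the scraping.
from scrapegraphai.graphs import SmartScraperGraph

graph_config = {
    "llm": {
        "api_key": "YOUR_OPENAI_API_KEY",  # placeholder
        "model": "gpt-3.5-turbo",
    },
}

scraper = SmartScraperGraph(
    prompt="List all the article titles on this page",
    source="https://example.com/blog",  # placeholder URL
    config=graph_config,
)

print(scraper.run())
```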

A few things that I could not find in the docs but would be amazing to see 🤞:
- Captcha handling 🔐
- Persistent data output formatting 📝
- Streaming output 📑
- An explanation 😂 of the tagline "ScrapeGraphAI: You Only Scrape Once". What does that even mean? 🤣 Is this YOLO? 🤔

Link: https://github.com/VinciGit00/Scrapegraph-ai
Demo code: https://github.com/amrrs/scrapegraph-code/blob/main/sourcegraph.ipynb