import time

import streamlit as st
from fns import *  # assumed to provide EmbeddingRetrievalSystem and load_dataset

st.set_page_config(
    page_title="Synthesist",
    page_icon="👋",
)

# st.write("# Welcome to Pathfinder! 👋")
st.image('local_files/synth_logo.png')

st.sidebar.success("Select a function above.")
st.sidebar.markdown("Current functions include visualizing papers in the arXiv embedding, searching for papers similar to an input paper or prompt phrase, and answering quick questions.")


st.markdown("")
st.markdown(
    """
    **Synthesist** (from Peter Watts's [Blindsight](https://scalar.usc.edu/works/network-ecologies/on-peter-watts-blindsight)) is a framework for searching and visualizing papers on the [arXiv](https://arxiv.org/) using the context
    sensitivity of modern large language models (LLMs) to better parse patterns in paper contexts.
    
    This tool was built during the [JSALT workshop](https://www.clsp.jhu.edu/2024-jelinek-summer-workshop-on-speech-and-language-technology/) to do awesome things.

    **👈 Select a tool from the sidebar** to see some examples
    of what this framework can do!

    ### Tool summary:
    - Please wait while the initial data loads and compiles; this takes about a minute.
    - `Paper search` looks for relevant papers given an arXiv ID or a question.

    This is not meant to be a replacement for existing tools like the
    [ADS](https://ui.adsabs.harvard.edu/),
    [arxivsorter](https://www.arxivsorter.org/), semantic search, or Google Scholar, but rather a supplement for finding papers
    that might otherwise be missed during a literature survey.
    It is trained on astro-ph.GA (astrophysics of galaxies) papers up to roughly last year, mined from the arXiv and supplemented with ADS metadata.
    If you are interested in extending it, please reach out!
    
    
    Planned additions: more pages, generation, separate toggles for retrieval and generation, a feedback form, social links, literature references, contact information, copyright, and collaboration details.

    The image below shows a representation of all the astro-ph.GA papers, which can be explored in more detail
    using the `Arxiv embedding` page. The papers tend to cluster by similarity, producing an
    atlas that shows well-studied areas (forests) and currently uncharted areas (water).
    """
)


# Load the retrieval system and corpus, and report how long loading takes.
start = time.time()
st.markdown('Loading data for retrieval system, please wait before jumping to one of the pages...')
st.session_state.retrieval_system = EmbeddingRetrievalSystem()
st.session_state.dataset = load_dataset('arxiv_corpus/', split="train")
st.markdown(f'Loaded retrieval system, time taken: {time.time() - start:.1f} sec')