SentenceTransformer based on nomic-ai/nomic-embed-text-v1

This is a sentence-transformers model finetuned from nomic-ai/nomic-embed-text-v1. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: nomic-ai/nomic-embed-text-v1
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: NomicBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
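Modules (1) and (2) turn the per-token hidden states into one unit-length 768-dimensional vector per input. Module (1) takes the mean of the token vectors over non-padding positions, and module (2) applies L2 normalization. Below is a NumPy sketch of those two steps, for illustration only. Random arrays stand in for the transformer output, and mean_pool and l2_normalize are hypothetical helpers, not the Sentence Transformers classes:

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors of each sequence, skipping padding positions."""
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                     # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                     # (batch, 1), avoids division by zero
    return summed / counts

def l2_normalize(embeddings):
    """Scale each row to unit length, like the Normalize() module."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / np.clip(norms, 1e-12, None)

# Stand-in for NomicBertModel output: 2 sequences, 4 positions, 768-dim hidden states
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(2, 4, 768))
attention_mask = np.array([[1, 1, 1, 1],
                           [1, 1, 0, 0]])  # the second sequence ends in two padding tokens

sentence_embeddings = l2_normalize(mean_pool(token_embeddings, attention_mask))
print(sentence_embeddings.shape)  # (2, 768)
```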

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library. Also install einops, which the NomicBertModel remote code imports:

pip install -U sentence-transformers einops

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
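# NomicBertModel is custom code hosted on the Hub, so trust_remote_code=True is required
model = SentenceTransformer("m7n/nomic-embed-philosophy-triplets_v1", trust_remote_code=True)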
# Run inference
sentences = [
    'of aU governments, including democracy, as follows: If moraUty presupposes individual autonomy, and every government denies that auto nomy, then no government can be moraUy justified, (p. 1) An autonomous person, Cohen points out, foUowing Kant, is one who is self-legislating, "Acting out of respect for a rule which he imposes upon Studies in Soviet Thought 25 (1983) 219. 220 REVIEWS himself, not out of fear or habit" (pp. 1?2). How, then, can an individual be self-legislating on the one hand, but obligated to obey the commands of the state on the other? Cohen finds the answer to this apparent d?emma in the notion of partici pation : In a democracy but only in a democracy each citizen has a right to a voice in the lawmaking process. Enjoyment of this right commits the citizen to respect the laws \'resulting from that process. The agreement to participate is not contingent upon getting one\'s own way. Each citizen knows, even before learning what issues w?l arise, that no one w?l always get his way. But be?eving the legislative process fair, each person is com mitted in advance to observe the rules that are its outcome. To that system, full consent is given. Governments derive their just powers from the consent of the governed, (pp. 3-4) Cohen explains that the obligation to obey the law, wh?e a "prima facie and very powerful" one, is not absolute, (p. 5) A citizen is not required to do the bidding of the state as though he were its slave. The \'promise\' which a citizen makes to obey the law, as',
    'any other promise, may be broken moraUy in "truly exceptional circumstances" where a duty stronger than the duty to obey the law arises.3 Having argued that democracy is the only moraUy viable poUtical system, Professor Cohen moves next to explore whether democracy thrives better in a sociaUst or capitaUst economic setting. Cohen asserts that sociaUst democracy is bu?t on two main principles: (1) pubUc ownership of the means of production and distribution; and (2) planning production and distribution for the common good. (p. 58) SociaUsm is a logical extension of democracy, he notes, bringing the popular w?l to matters of production and wealth. SociaUsm involves "the democratic control of aU resources in the community by society as a whole", (p. 41) Socialism meshes weU with human nature, Cohen arugues, for human beings have "a deep and natural inclination to help (one another) ..." (p. 64) He Usts a number of practical advantages of socialism over capitaUsm. SociaUsm, he claims, is less subject to, if not immune from, inherent flaws in capitaUsm, including extremes in wealth, cycles of boom and bust, unemploy ment, wastes of competition associated with costly advertisement and packag ing, and the subordination of workers. Further, socialism has an impressive track record: "Many socialist countries, mamtaining five and seven year economic plans under continual adjustment, have met with phenomenal success", (p. 59) Even capitaUst countries impUcitly affirm the viab?ity of REVIEWS 221 sociaUsm',
    "efficiency, the design of the market can reflect other social values, as we shall see. A version of the market which is likely to appeal to socialists is the following: the means of production are owned by the state but leased to groups of workers in such a way that each worker gets productive resources of roughly equal value. Each cooperative decides on the nature and volume of its production, and sells its goods on the market. The profits are distributed among the members of each cooperative according to mutually agreed rules, though we may suppose that profits above a certain point are heavily taxed by the state, partly to accumulate resources for future generations, partly to finance an extensive welfare state which provides for essential needs without charge. For cooperatives which are unable to make a profit, there is a social security system which supplements their members' incomes until they find a more profitable line of production or move elsewhere. The private hiring of labour is, however, made illegal in the same way as slavery is today. This brief sketch of a socialist market system needs to be filled out in various ways. There are difficulties in setting down the terms on [476] POLITICAL THEORY / NOVEMBER 1977 which capital would be leased by the state to the workers' cooperatives; for instance, in establishing how far cooperatives should be allowed to accumulate capital for their own expansion. The rules of association for each cooperative need to be specified, particularly those governing the entry and exit",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
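model.similarity applies the model's similarity function, which is cosine similarity here. The NumPy sketch below shows what that computes and how to use it for semantic search. Random unit vectors stand in for the output of model.encode, and cosine_similarity_matrix and top_k are hypothetical helpers, not Sentence Transformers functions. Because the pipeline ends with Normalize(), the cosine matrix equals a plain dot-product matrix of the embeddings.

```python
import numpy as np

def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T

def top_k(query_embedding, corpus_embeddings, k=2):
    """Indices of the k corpus rows most similar to the query, best first."""
    scores = cosine_similarity_matrix(query_embedding[None, :], corpus_embeddings)[0]
    return np.argsort(-scores)[:k]

# Random unit vectors stand in for model.encode(sentences)
rng = np.random.default_rng(42)
embeddings = rng.normal(size=(3, 768))
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

similarities = cosine_similarity_matrix(embeddings, embeddings)
print(similarities.shape)                                    # (3, 3)
print(np.allclose(similarities, embeddings @ embeddings.T))  # True for unit-length rows
print(top_k(embeddings[0], embeddings, k=2))                 # first index is 0, the query itself
```

With real data, the query would typically be encoded separately from the corpus, and the query row would not appear in the corpus.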

Evaluation

Metrics

Triplet (nomic evaluator)

Metric Value
cosine_accuracy 0.958
dot_accuracy 0.042
manhattan_accuracy 0.956
euclidean_accuracy 0.958
max_accuracy 0.958

Triplet (all-nli-test evaluator)

Metric Value
cosine_accuracy 0.975
dot_accuracy 0.025
manhattan_accuracy 0.9725
euclidean_accuracy 0.975
max_accuracy 0.975
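The triplet metrics count how often the anchor scores higher with its positive than with its negative. This model's pipeline ends with Normalize(), so dot product and cosine similarity are the same number, and a correctly oriented dot_accuracy should match cosine_accuracy. The reported 0.042 and 0.025 are exactly 1 - 0.958 and 1 - 0.975. That is what you would see if the evaluator compared dot products as though lower were better. This reading is an inference; the card itself does not state it.

The NumPy sketch below illustrates both points on synthetic unit vectors rather than real model outputs. triplet_accuracy and its higher_is_better flag are hypothetical, not the TripletEvaluator API.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit length, as the Normalize() module does."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def triplet_accuracy(anchors, positives, negatives, score="cosine", higher_is_better=True):
    """Fraction of triplets in which the positive outscores the negative for the same anchor."""
    if score == "cosine":
        pos = np.sum(l2_normalize(anchors) * l2_normalize(positives), axis=1)
        neg = np.sum(l2_normalize(anchors) * l2_normalize(negatives), axis=1)
    elif score == "dot":
        pos = np.sum(anchors * positives, axis=1)
        neg = np.sum(anchors * negatives, axis=1)
    else:
        raise ValueError(f"unsupported score: {score}")
    # A similarity should be higher for the positive; treating it like a
    # distance (higher_is_better=False) flips every comparison.
    correct = pos > neg if higher_is_better else pos < neg
    return float(np.mean(correct))

# Synthetic, already-normalized embeddings (not real model outputs):
# positives share a small component with their anchor, negatives are independent.
rng = np.random.default_rng(0)
n, dim = 500, 768
anchors = l2_normalize(rng.normal(size=(n, dim)))
positives = l2_normalize(0.05 * anchors + rng.normal(size=(n, dim)) / np.sqrt(dim))
negatives = l2_normalize(rng.normal(size=(n, dim)))

cosine_acc = triplet_accuracy(anchors, positives, negatives, "cosine")
dot_acc = triplet_accuracy(anchors, positives, negatives, "dot")
inverted_dot_acc = triplet_accuracy(anchors, positives, negatives, "dot", higher_is_better=False)
print(cosine_acc, dot_acc, inverted_dot_acc)
```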

Training Details

Training Dataset

Unnamed Dataset

  • Size: 10,000 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
              anchor                positive              negative
    type      string                string                string
    details   min: 15 tokens        min: 14 tokens        min: 13 tokens
              mean: 253.76 tokens   mean: 252.8 tokens    mean: 273.24 tokens
              max: 573 tokens       max: 680 tokens       max: 574 tokens
  • Samples:
    anchor positive negative
    JOURNAL OF PHILOSOPHY have. Success in learning "how to know economically, liberally, effectively," is the measure of success in civilization; and the clarifying of this success which is the task of philosophy makes philosophy an effective instrument of advance. But "a problem of knowledge in general [of knowledge of 'the ontological'] is, to speak brutally, nonsense. " 5 Since the plain conjunctive experience which is the very definition of what is "radical" in James's empiricism has its being, according to his most explicit statement, in passing thought, there is most certainly another exceedingly vigorous motif in James's thought repeatedly declared by him to be his chief philosophic hope, which he had wished ardently to bring to adequate expression before he died. If a biological or situational behaviorism really runs counter to this motif, yet moves more vitally with his spirit, James himself in his dearest philosophic expectation was following a will-of-the-wisp. Now it may be the case that a majority of those who knew and loved James believe today that this is the case -both those with him in the empiricist camp and those against him in the rationalistic. No doubt most empiricists will welcome the name "radical" while repudiating James's often repeated definition so that no meaning for the term remains to them except either an implied boast or merely the profession of an ideal. The writer has long held the view that the conjunctive experience in James 's writings and the biological behaviorism (which Dewey shows as present if not worked out in James's mind) seemingly so much at cross purposes rightly belong together and mutually support and fulfill each other. The purpose of this paper is to show how this is so. Dr. Lowe's study goes, if quietly, yet unhesitatingly, to the support of James's radical empiricism. 
He sums his argument up in recommending "a decision about his [James's] doctrine [of the conjuLnctive experience] as all but necessary preliminaries to the evaluation of Whitehead." 6 It is pointed out that the fulcrum of Whitehead's philosophy is his doctrine of the transmission of feelings. Sympathetic study of his philosophy depends upon initial conviction upon that point, precisely the doctrine that William James propounded with great vigor for twenty-five years. For later philosophers it is primarily directed upon the immediate temporal relation of "felt transition" displayed in "the plain conjunctive experience." The role of Whitehead's theory of prehensions is to develop this doctrine along general lines. 5 Experience and Nature, 1925, p. 21. 6 Ibid., p. 125. 7 Ibid., pp. 174 if. COMMENTS AND CRITICISM 99 In this doctrine the present moment is presented as an atom or "drop" of experience which has taken up the immediately past moment and holds it immanent in itself by a felt transition of next-to-next, which is itself a component contributing to the present drop of experience. The atomic structure of experienee, basis of pluralism, together with the felt transition from drop to drop is the central point. A "drop" or atom of not only the specifically discriminated happenings "out there" but also the whole undiscriminated remainder.15 Our moments of experience and their associated durations succeed each other, forming a series of stratifications of nature. The successive 8 Alfred North Whitehead, An Enquiry Concerning the Principlea of Natural Knowledge, Cambridge: University Press, 1919, hereafter referred to as PNK; and The Principle of Relativity with applications to Physical Science, Cambridge: University Press, 1922, hereafter referred to as PRel. 9PNK, art. 16.1, 3.6. 10 PNK, art. 16.3-16.4; CN, pp. 106-107. 11 PNK, art. 20.2; CN, pp. 107, 187-188. 12 PNK, art. 16.5. 13 PNK, art. 14, 16.4-16.5, 18.1-18.2; CN, p. 52. 14 PNK, art. 14.1; CN, p. 51. 
15 PNK, art. 19.4; CN, pp. 49-53, 186-187; PRel, pp. 25-26. APPEARANCE AND CAUSALITY IN WHITEHEAD'S EARLY WRITINGS 45 moments of experience bind the percipient events together into the locus of a directly experienced unity of awareness, with its memory of the past and anticipation of the future. The succession of associated durations or cross-sections of external nature exhibits a persisting, uniform structure which (through the operation of 'extensive abstraction') yields the uniform space-time continuum of geometry. Within this all-encompassing structure, particular happenings take place.16 The task of common sense and of physical science is to discover the particular factors that govern the directly perceived particular happenings. Two additional features of immediate experience are, as it were, given with the concrete data of sense-awareness as primordial attributions or assumptions.17 First, in sense-awareness, no clear demarcations between happenings can
    wrongdoers receive the punishment they deserve. A deserved punishment is one that is proportionate to the offender's culpability. Culpability has two components: (1) the severity of the wrong, and (2) the offender's blameworthiness. The broader aim of this article is to outline an alternative retributivist model that directly involves the victim in the determination of the appropriate and just punishment. The narrower aim is to show that the methodology employed by Michael Moore (1997) in support of the standard retributive model in fact better supports this alternative model. Moore himself explicitly rejects the idea that victims can play a role in determining just punishments, because this some role in producing it. According to retributive theory, punishment is justified as a way of restoring the just status quo ante that was disrupted by the offender.6 How punishment performs this task remains a matter of some controversy among retributive theorists. Though the view cannot be defended here, a plausible interpretation of how the state performs this task says that legal punishment involves state efforts to restore the equality of condition that, at least in those respects designated by basic moral rights, all citizens are entitled to. All citizens are entitled to have their lives, bodies, psychological integrity, and justly held property respected and defended by the law. In these respects, at least, the state should act to ensure their equality. Whether it should act in other ways to ensure equality among citizens is, of course, a matter of considerable controversy, though this is not a controversy the resolution of which may have significant implications for the core areas of the criminal law. 32 PUBLIC AFFAIRS QUARTERLY Criminal offenders act in (legally prohibited) ways that deprive victims of some or all of the equality of condition victims are entitled to. 
There are, it must be admitted, various ways in which the state might attempt to restore the requisite equality of condition. But with most serious crimes it can arguably be shown that the imposition of penal losses is the only appropriate equalizing response by the state. In particular, where victims cannot be made whole again by offender compensation or restitution, the state
    modal logic (ML), suggesting that we are dealing with deeply divergent accounts of our modal talk. However, CT captures but one version of the relevant semantic intuition, and does so on the basis of metaphysical assumptions (all worlds are equally real, individuals are world-bound) that are ostensibly discretionary. Just as ML can be translated into a language that quantifies explicitly over worlds, CT may be formulated as a semantic theory in which world quantification is purely metalinguistic. And just as Kripke-style semantics is formally compatible with the doctrine of world-boundedness, a counterpart-based semantics may in principle allow for cases of trans-world identity. In fact, one may welcome a framework that is general enough to include both Lewis's counterpart-based account and Kripke's identity-based account as distinguished special cases. There are several ways of doing so. The purpose of this paper is regular and normal modal logics K, D(a Aß)z>'3ß and universal instantiation (UI) which a formal semantics should validate if it is to be a contender for a semantics of our natural intensional languages. I show that counterpart theory does not validate these principles. Counterpart Theory The basis of the logical system for counterpart theory involves the introduction of primitive predicates and postulates to the lower predicate calculus. Lewis (1968; 113) uses the following primitive predicates,1 (1) Wx * is a possible world (2) xly X is in possible world y (3) Ax jc is actual (4) xCy y is a counterpart of x 1 My notation varies slightly from that of Lewis (1968). In particular, note that for the counterpart predicate Lewis understands Cxy to mean that x is a counterpart of y. COUNTERPART THEORY AS A SEMANTICS FOR MODAL LOGIC 257 Lewis's postulates encapsulate the principles of the semantics of counterpart theory. Most especially we note that nothing is in two worlds. We also note that anything in a world is a counterpart of itself. 
Thus the counterpart relation is reflexive. The following discussion meets the requirement that nothing is a counterpart of anything else in its own world although it need not presuppose this postulate. Lewis (1986; 214) remarks that while the postulate that nothing is a counterpart of anything else in its own world is a feature of some counterpart relations, such a restriction on the counterpart relation constitutes giving up some of the built-in flexibility of counterpart theory. Counterpart theory also involves the extension of
  • Loss: TripletLoss with these parameters:
    {
        "distance_metric": "TripletDistanceMetric.COSINE",
        "triplet_margin": 0.05
    }
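With distance_metric TripletDistanceMetric.COSINE, the distance is 1 - cosine similarity. The triplet loss is max(d(a, p) - d(a, n) + margin, 0), averaged over the batch. With triplet_margin 0.05, a triplet therefore stops contributing once its positive is at least 0.05 closer to the anchor, in cosine distance, than its negative.

The NumPy sketch below mirrors that formula for illustration. It is not the library's PyTorch implementation, and the toy 2-D vectors are made up.

```python
import numpy as np

def cosine_distance(x, y):
    """Row-wise 1 - cosine similarity (the COSINE triplet distance)."""
    cos = np.sum(x * y, axis=1) / (np.linalg.norm(x, axis=1) * np.linalg.norm(y, axis=1))
    return 1.0 - cos

def triplet_loss(anchors, positives, negatives, margin=0.05):
    """Batch mean of max(d(a, p) - d(a, n) + margin, 0)."""
    per_triplet = cosine_distance(anchors, positives) - cosine_distance(anchors, negatives) + margin
    return float(np.maximum(per_triplet, 0.0).mean())

anchor = np.array([[1.0, 0.0]])
positive = np.array([[1.0, 0.0]])       # same direction: d(a, p) = 0
easy_negative = np.array([[0.0, 1.0]])  # orthogonal: d(a, n) = 1
hard_negative = np.array([[1.0, 0.1]])  # nearly parallel: d(a, n) is about 0.005

print(triplet_loss(anchor, positive, easy_negative))  # 0.0, margin already satisfied
print(triplet_loss(anchor, positive, hard_negative))  # greater than 0, inside the 0.05 margin
```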
    

Evaluation Dataset

Unnamed Dataset

  • Size: 500 evaluation samples
  • Columns: anchor, positive, and negative
  • Statistics based on all 500 samples:
              anchor                positive              negative
    type      string                string                string
    details   min: 10 tokens        min: 12 tokens        min: 23 tokens
              mean: 237.7 tokens    mean: 238.5 tokens    mean: 260.95 tokens
              max: 497 tokens       max: 485 tokens       max: 499 tokens
  • Samples:
    anchor positive negative
    quantified modal logic, including the famous Barcan Formula. The paper appeared in the Journal of Symbolic Logic , followed shortly afterwards by two more papers published under the name Ruth C. Barcan. Alonzo Church, editor of the Journal, eventually insisted that she publish using her official name, and so a 1950 paper in the same Journal appeared already as authored by Ruth Barcan Marcus. Had Prof. Church's naming criteria been followed throughout, we would now be discussing the famous Marcus Formula. After she received her PhD, Ruth Barcan Marcus and her husband moved to Illinois, where he had accepted a position at Northwestern University. She spent an academic year as a post-doctoral fellow at the University of Chicago where Rudolf Carnap, whose paper 'Modalities and Quantification' had appeared also in 1946, and whose seminal Meaning and Necessity was published in 1947 [Carnap (1946), (1947)], was also working on quantified modal logic.2 After that year she held a series of post-doctoral, temporary or visiting positions and taught at Roosevelt University from 1959 to 1963. In 1964 she became head of the philosophy department of the University of Illinois, Chicago Circle and taught also at Northwestern. In 1973 she moved to Yale. She retired in 1992 but continued to be actively involved in philosophy dividing the time between her position as a senior research scholar at Yale and a distinguished visiting professorship at the University of California, Irvine. II. Modality and Modal Logic Ruth Barcan Marcus' 1946 paper presents the first system of modal logic that combines modal operators and quantifiers. A question that had arisen regarding any such system is whether theorems of the non-modal predicate calculus such as (1) Vx (Px -> Qx) -> (Vx Px -> Vx Qx) On Modality and Reference. Ruth Barcan Marcus (1921-2012) 205 would also be theorems, were the conditional uniformly interpreted as C.I. 
Lewis' strict conditional.3 As it turns out, the strict conditional version of (1): (2) Vx (Px => Qx) => (Vx Px => Vx Qx) is not derivable in the system that results from adding quantifiers to Lewis' S2. However, (2) is derivable if we can count on this formula: (3) O 3x Px => 3x OPx. (3) is precisely the Barcan Formula. It was introduced in the system of the 1946 paper as Axiom 11, which allows the derivation of (2) theorem 19, p. 5. Nowadays the Barcan Formula is stated as a material conditional and introduced, in some systems, as an axiom4 (BF) O 3x Px -> 3x OPx. BF says that if it is possible that something be P, then there is something that is possibly P. An equivalent version of the Barcan Formula states that if everything is necessarily P, then it is necessary that everything be P: (BF') Vx DPx ^ □ Vx Px The converse of the Barcan Formula: (CBF) 3x OPx -> O 3x Px, equivalent to (CBF') □ Vx Px ^ Vx DPx is already derivable in the system without the addition of any of the Royal Flemish Academy of Belgium for Science and the Arts in the latter part of 2010. 3. Bayart 195811 The soundness of first and second-order S5 modal logic I. Semantic definitions 0. To formulate a semantic theory of modal logic it is not sufficient to define for example, the necessary as that which is true in every model and 10 A generalisation of Bayart's completeness proof to the system T appeared in Cresswell (1967) and later in Hughes and Cresswell (1968). A more recent proof method for systems with the Barcan Formula is found in Thomason (1970). 11 Translation by M.J. Cresswell of 'La correction de la logique modale du premier et second ordre S5', Logique et Analyse 1, 1958, pp. 28-44. In this version I have corrected obvious typos. Some of these are indicated in the website version in square brackets [..]. I have changed Bayart's notation in the present version as explained in the introduction or commentary or in footnotes. (All footnotes are my comments on the translation.) 
ARNOULD BAYART'S MODAL COMPLETENESS THEOREMS 95 the possible as that which is true in some model. These definitions would do no more than introduce the notions of 'necessary' and 'possible' in the metalanguage. A semantics of modal logic demands that we assume an object language containing modal symbols and that we define under what conditions to attribute the values 'true' or 'false' to the formulae of this object language. One can
    rebut four objections to the claim that attributions of intentional attitudes are normative judgments, all stemming, directly or indirectly, from the widespread assumption the article by sketching the picture of normative thought that results. Though I defend a particular theory of normative speech elsewhere, the core insights of this article can be used by other theorists as well. The arguments offered
    picture are co-dependent, and their concerted action makes an intuitive pre -89 Beata Stawarska sentation of something absent in the (present) picture possible. Per ceptual apprehension gives picture consciousness its intuitive charac ter, while non-perceptual apprehension "fantasizes" the absent entity into the physical thing and turns it into a picture. Picture conscious ness involves therefore three interrelated elements: the picture-thing (Bildding), i.e. the physical thing (a piece of canvas, of paper, of stone) which serves as the material of the picture; the picture-object (.Bildobjekt), i.e. the picture apprehended not simply as a perceptual object but as a representation of a referent—the so-called picture subject (Bildsujet), i.e. an absent thing or person.6 This triple thing/object/subject structure is clearly at work in the apprehension of photographs or paintings. Can it also be discerned in fantasy, which no longer supports itself on physical or "external" things? Images, unlike physical pictures, are not independent from the consciousness that apprehends them. Instead, they are contents of consciousness, forming an integral and internal part of an imaginary experience. Unlike the physical picture which persists as a piece of canvas, paper, or stone, even after it ceases to function as a pictorial representation of an absent being, the internal picture does not "sur vive" the end of the fantasy episode—there is nothing left of it once the subject ceases to fantasize. The question arises: can such an immaterial "picture" serve the function of representing an absent pic ture-subject? Can an evanescent no-thing stand as a symbol of another thing?7 Material content seems indispensable if the picture is to fulfil its representational function: only as a perceptual thing can a picture yield an intuitive apprehension of an absent referent. In order to function as a representation (Bildobjet), the picture must be a thing (Bildding). 
In other words, there must be a physical support if the picture is to symbolize the absent picture-subject (Bildsujet). Yet such physical support is wanting in the case of "internal" pic tures. In fantasy, it is impossible to distinguish a picture-thing from the picture-subject it represents, and so it is difficult to see how the internal picture can serve the symbolic function at all. As a result, one can hardly sustain the interpretation of fantasy as the conscious ness of non-physical pictures and preserve a uniform theory of imag ination as picture consciousness.8 The aforementioned difficulties led Husserl to reformulate the theory of imagination, no longer taking the apprehension of a pic ture but the internal structure of consciousness in memory as a clue. The way in which memory presents something non-given or the way in which the absent past manifests itself in the present is of direct rel evance to Husserl's later conception of imaginary activity. In recol -90 Sartre on Imagination lection, an object appears in the present as belonging to the past, it is apprehended in the now and yet remains separated by temporal dis tance. Should it be concluded that one apprehends an image (or an "internal picture") of an He found, for instance, that as we can "see" a mountain which is not present by interpreting the paint marks on a canvas which are present, so in external perception we are perceiving a tree which is not immanent in consciousness by interpreting the sensations (Empfindungen) which are im manent. It would seem, therefore, that Husserl assumed at that time that any intuitive grasp of a real particular was either an inner perception of what is immanently present, or else a case where something immanently pres ent serves as a basis for an interpretation. 
To his astonishment, Husserl discovered after 1905, in his analyses of inner time-consciousness, that re membering could not be understood as an act of interpretation; that, for instance, remembering a past sound-sensation could not be described as an interpretation of a presently immanent sound-sensation, but had to be de scribed as a "direct" intuitive intending of a real particular which no longer existed.7 This exploded the myth of the unproblematic nature of inner per ception, because any perception whatsover necessarily involves some reten tion of the immediate past. In other words, a restriction of the domain of descriptive psychology or phenomenology to actually immanent real par ticulars proved to be absolutely impossible, since such a restriction would 66 GUIDO K?NG veto not only the use of external perception but that of inner perception as well! The only way out of all these difficulties was to officially admit all intentional objects, i.e., the intentional correlates of all mental acts, into the domain of
  • Loss: TripletLoss with these parameters:
    {
        "distance_metric": "TripletDistanceMetric.COSINE",
        "triplet_margin": 0.05
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 4
  • learning_rate: 1e-05
  • num_train_epochs: 5
  • warmup_ratio: 0.1
  • batch_sampler: no_duplicates
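The non-default values fix the length of the learning-rate schedule. This calculation rests on three assumptions: a single device, no dropped batches from the no_duplicates sampler, and the transformers convention of ceil(warmup_ratio × total steps) warmup steps. Under those assumptions, 10,000 triplets at batch size 4 give 2,500 steps per epoch, which is consistent with epoch 1.0 at step 2500 in the training logs. Five epochs then make 12,500 steps, with 1,250 warmup steps.

The sketch below reproduces that arithmetic and the linear warmup-then-decay shape of lr_scheduler_type linear. linear_warmup_lr is a hypothetical helper, not a library function.

```python
import math

def linear_warmup_lr(step, total_steps, warmup_steps, peak_lr):
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

train_samples = 10_000
batch_size = 4       # per_device_train_batch_size, assuming one device
epochs = 5           # num_train_epochs
warmup_ratio = 0.1
peak_lr = 1e-5       # learning_rate

steps_per_epoch = math.ceil(train_samples / batch_size)  # 2500
total_steps = steps_per_epoch * epochs                   # 12500
warmup_steps = math.ceil(total_steps * warmup_ratio)     # 1250

for step in (0, 625, 1250, 5600, 12500):
    print(step, linear_warmup_lr(step, total_steps, warmup_steps, peak_lr))
```

Under these assumptions, the last logged step (5600, epoch 2.24) falls in the decay phase, at roughly 61% of the peak learning rate.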

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 4
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss all-nli-test_max_accuracy nomic_max_accuracy
0 0 - - - 0.94
0.04 100 0.0106 0.0093 - 0.944
0.08 200 0.009 0.0083 - 0.944
0.12 300 0.0068 0.0073 - 0.952
0.16 400 0.0066 0.0067 - 0.96
0.2 500 0.0069 0.0061 - 0.956
0.24 600 0.0056 0.0053 - 0.966
0.28 700 0.0042 0.0050 - 0.962
0.32 800 0.0059 0.0046 - 0.962
0.36 900 0.0051 0.0048 - 0.964
0.4 1000 0.0034 0.0046 - 0.964
0.44 1100 0.0054 0.0051 - 0.962
0.48 1200 0.0034 0.0047 - 0.964
0.52 1300 0.0042 0.0049 - 0.966
0.56 1400 0.0035 0.0041 - 0.968
0.6 1500 0.0043 0.0041 - 0.972
0.64 1600 0.0029 0.0045 - 0.964
0.68 1700 0.005 0.0044 - 0.97
0.72 1800 0.0036 0.0041 - 0.968
0.76 1900 0.0031 0.0040 - 0.976
0.8 2000 0.0037 0.0041 - 0.966
0.84 2100 0.0041 0.0037 - 0.97
0.88 2200 0.0044 0.0040 - 0.966
0.92 2300 0.0038 0.0046 - 0.966
0.96 2400 0.0043 0.0050 - 0.954
1.0 2500 0.0031 0.0049 - 0.96
1.04 2600 0.0046 0.0048 - 0.964
1.08 2700 0.0017 0.0045 - 0.96
1.12 2800 0.0015 0.0047 - 0.958
1.16 2900 0.0015 0.0046 - 0.966
1.2 3000 0.0011 0.0042 - 0.966
1.24 3100 0.0009 0.0041 - 0.962
1.28 3200 0.0006 0.0040 - 0.972
1.32 3300 0.0006 0.0041 - 0.966
1.36 3400 0.0005 0.0046 - 0.958
1.4 3500 0.0007 0.0048 - 0.964
1.44 3600 0.0004 0.0046 - 0.966
1.48 3700 0.0008 0.0048 - 0.96
1.52 3800 0.0006 0.0047 - 0.966
1.56 3900 0.0002 0.0048 - 0.958
1.6 4000 0.0004 0.0047 - 0.964
1.64 4100 0.0004 0.0047 - 0.966
1.68 4200 0.0003 0.0048 - 0.96
1.72 4300 0.0001 0.0049 - 0.96
1.76 4400 0.0004 0.0050 - 0.956
1.8 4500 0.0007 0.0048 - 0.96
1.84 4600 0.0006 0.0044 - 0.96
1.88 4700 0.0001 0.0044 - 0.962
1.92 4800 0.0005 0.0043 - 0.964
1.96 4900 0.0004 0.0043 - 0.966
2.0 5000 0.0004 0.0044 - 0.958
2.04 5100 0.0002 0.0045 - 0.956
2.08 5200 0.0002 0.0044 - 0.958
2.12 5300 0.0001 0.0043 - 0.96
2.16 5400 0.0005 0.0048 - 0.96
2.2 5500 0.0003 0.0049 - 0.958
2.24 5600 0.0004 - 0.975 -
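The training log mixes four series: training loss, validation loss (fourth column), and two evaluators that show "-" on steps where they did not run. A small parser makes the extremes easy to find. Here it runs on rows copied verbatim from the table above.

Over the full table, the lowest validation loss (0.0037) is at step 2100, and the highest nomic_max_accuracy (0.976) is at step 1900. Both fall within the first epoch, while training loss keeps falling afterwards. parse_log and best_step are hypothetical helpers written for this illustration.

```python
LOG_ROWS = """\
0.72 1800 0.0036 0.0041 - 0.968
0.76 1900 0.0031 0.0040 - 0.976
0.8 2000 0.0037 0.0041 - 0.966
0.84 2100 0.0041 0.0037 - 0.97
0.88 2200 0.0044 0.0040 - 0.966
2.2 5500 0.0003 0.0049 - 0.958
2.24 5600 0.0004 - 0.975 -
"""

def parse_value(token):
    """'-' marks a metric that was not computed at that step."""
    return None if token == "-" else float(token)

def parse_log(text):
    """Turn whitespace-separated log rows into dicts keyed by column."""
    rows = []
    for line in text.strip().splitlines():
        epoch, step, train_loss, val_loss, nli_acc, nomic_acc = line.split()
        rows.append({
            "epoch": float(epoch),
            "step": int(step),
            "train_loss": parse_value(train_loss),
            "val_loss": parse_value(val_loss),
            "all_nli_test_acc": parse_value(nli_acc),
            "nomic_acc": parse_value(nomic_acc),
        })
    return rows

def best_step(rows, key, maximize):
    """Step with the best value of `key`, ignoring rows where it is missing."""
    candidates = [row for row in rows if row[key] is not None]
    pick = max if maximize else min
    return pick(candidates, key=lambda row: row[key])["step"]

rows = parse_log(LOG_ROWS)
print(best_step(rows, "val_loss", maximize=False))  # 2100
print(best_step(rows, "nomic_acc", maximize=True))  # 1900
```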

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.42.4
  • PyTorch: 2.3.1+cu121
  • Accelerate: 0.32.1
  • Datasets: 2.21.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification}, 
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
Model size: 137M params (Safetensors, tensor type F32)