metadata

language: en
license: apache-2.0
library_name: span-marker
tags:
  - span-marker
  - token-classification
  - ner
  - named-entity-recognition
  - generated_from_span_marker_trainer
datasets:
  - midas/inspec
metrics:
  - precision
  - recall
  - f1
widget:
  - text: >-
      Genetic algorithm guided selection : variable selection and subset
      selection A novel genetic algorithm guided selection method , GAS , has
      been described . The method utilizes a simple encoding scheme which can
      represent both compounds and variables used to construct a QSAR/QSPR model
      . A genetic algorithm is then utilized to simultaneously optimize the
      encoded variables that include both descriptors and compound subsets . The
      GAS method generates multiple models each applying to a subset of the
      compounds . Typically the subsets represent clusters with different
      chemotypes . Also a procedure based on molecular similarity is presented
      to determine which model should be applied to a given test set compound .
      The variable selection method implemented in GAS has been tested and
      compared using the Selwood data set -LRB- n = 31 compounds ; nu = 53
      descriptors -RRB- . The results showed that the method is comparable to
      other published methods . The subset selection method implemented in GAS
      has been first tested using an artificial data set -LRB- n = 100 points ;
      nu = 1 descriptor -RRB- to examine its ability to subset data points and
      second applied to analyze the XLOGP data set -LRB- n = 1831 compounds ; nu
      = 126 descriptors -RRB- . The method is able to correctly identify
      artificial data points belonging to various subsets . The analysis of the
      XLOGP data set shows that the subset selection method can be useful in
      improving a QSAR/QSPR model when the variable selection method fails
  - text: >-
      Presentation media , information complexity , and learning outcomes
      Multimedia computing provides a variety of information presentation
      modality combinations . Educators have observed that visuals enhance
      learning which suggests that multimedia presentations should be superior
      to text-only and text with static pictures in facilitating optimal human
      information processing and , therefore , comprehension . The article
      reports the findings from a 3 -LRB- text-only , overhead slides , and
      multimedia presentation -RRB- * 2 -LRB- high and low information
      complexity -RRB- factorial experiment . Subjects read a text script ,
      viewed an acetate overhead slide presentation , or viewed a multimedia
      presentation depicting the greenhouse effect -LRB- low complexity -RRB- or
      photocopier operation -LRB- high complexity -RRB- . Multimedia was
      superior to text-only and overhead slides for comprehension . Information
      complexity diminished comprehension and perceived presentation quality .
      Multimedia was able to reduce the negative impact of information
      complexity on comprehension and increase the extent of sustained attention
      to the presentation . These findings suggest that multimedia presentations
      invoke the use of both the verbal and visual working memory channels
      resulting in a reduction of the cognitive load imposed by increased
      information complexity . Moreover , multimedia superiority in facilitating
      comprehension goes beyond its ability to increase sustained attention ;
      the quality and effectiveness of information processing attained -LRB-
      i.e. , use of verbal and visual working memory -RRB- is also significant
  - text: >-
      Adaptive filtering for noise reduction in hue saturation intensity color
      space Even though the hue saturation intensity -LRB- HSI -RRB- color model
      has been widely used in color image processing and analysis , the
      conversion formulas from the RGB color model to HSI are nonlinear and
      complicated in comparison with the conversion formulas of other color
      models . When an RGB image is degraded by random Gaussian noise , this
      nonlinearity leads to a nonuniform noise distribution in HSI , making
      accurate image analysis more difficult . We have analyzed the noise
      characteristics of the HSI color model and developed an adaptive spatial
      filtering method to reduce the magnitude of noise and the nonuniformity of
      noise variance in the HSI color space . With this adaptive filtering
      method , the filter kernel for each pixel is dynamically adjusted ,
      depending on the values of intensity and saturation . In our experiments
      we have filtered the saturation and hue components and generated edge maps
      from color gradients . We have found that by using the adaptive filtering
      method , the minimum error rate in edge detection improves by
      approximately 15 %
  - text: >-
      Restoration of broadband imagery steered with a liquid-crystal optical
      phased array In many imaging applications , it is highly desirable to
      replace mechanical beam-steering components -LRB- i.e. , mirrors and
      gimbals -RRB- with a nonmechanical device . One such device is a nematic
      liquid crystal optical phased array -LRB- LCOPA -RRB- . An LCOPA can
      implement a blazed phase grating to steer the incident light . However ,
      when a phase grating is used in a broadband imaging system , two adverse
      effects can occur . First , dispersion will cause different incident
      wavelengths arriving at the same angle to be steered to different output
      angles , causing chromatic aberrations in the image plane . Second , the
      device will steer energy not only to the first diffraction order , but to
      others as well . This multiple-order effect results in multiple copies of
      the scene appearing in the image plane . We describe a digital image
      restoration technique designed to overcome these degradations . The
      proposed postprocessing technique is based on a Wiener deconvolution
      filter . The technique , however , is applicable only to scenes containing
      objects with approximately constant reflectivities over the spectral
      region of interest . Experimental results are presented to demonstrate the
      effectiveness of this technique
  - text: >-
      A comparison of computational color constancy Algorithms . II .
      Experiments with image data For pt.I see ibid. , vol . 11 , no. 9 ,
      p.972-84 -LRB- 2002 -RRB- . We test a number of the leading computational
      color constancy algorithms using a comprehensive set of images . These
      were of 33 different scenes under 11 different sources representative of
      common illumination conditions . The algorithms studied include two gray
      world methods , a version of the Retinex method , several variants of
      Forsyth 's -LRB- 1990 -RRB- gamut-mapping method , Cardei et al. 's -LRB-
      2000 -RRB- neural net method , and Finlayson et al. 's color by
      correlation method -LRB- Finlayson et al. 1997 , 2001 ; Hubel and
      Finlayson 2000 -RRB- . We discuss a number of issues in applying color
      constancy ideas to image data , and study in depth the effect of different
      preprocessing strategies . We compare the performance of the algorithms on
      image data with their performance on synthesized data . All data used for
      this study are available online at http://www.cs.sfu.ca/~color/data , and
      implementations for most of the algorithms are also available -LRB-
      http://www.cs.sfu.ca/~color/code -RRB- . Experiments with synthesized data
      -LRB- part one of this paper -RRB- suggested that the methods which
      emphasize the use of the input data statistics , specifically color by
      correlation and the neural net algorithm , are potentially the most
      effective at estimating the chromaticity of the scene illuminant .
      Unfortunately , we were unable to realize comparable performance on real
      images . Here exploiting pixel intensity proved to be more beneficial than
      exploiting the details of image chromaticity statistics , and the
      three-dimensional -LRB- 3-D -RRB- gamut-mapping algorithms gave the best
      performance
pipeline_tag: token-classification
co2_eq_emissions:
  emissions: 20.795
  source: codecarbon
  training_type: fine-tuning
  on_cloud: false
  gpu_model: 1 x NVIDIA GeForce RTX 3090
  cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
  ram_total_size: 31.777088165283203
  hours_used: 0.137
model-index:
  - name: SpanMarker with bert-base-uncased on Inspec
    results:
      - task:
          type: token-classification
          name: Named Entity Recognition
        dataset:
          name: Inspec
          type: midas/inspec
          split: test
        metrics:
          - type: f1
            value: 0.5934525191548642
            name: F1
          - type: precision
            value: 0.5666149412547107
            name: Precision
          - type: recall
            value: 0.6229588106263709
            name: Recall

SpanMarker with bert-base-uncased on Inspec

This is a SpanMarker model trained on the Inspec dataset that can be used for Named Entity Recognition, here the extraction of keyphrases (label KEY). It uses bert-base-uncased as the underlying encoder.

Model Details

Model Description

Model Type: SpanMarker
Encoder: bert-base-uncased
Maximum Sequence Length: 256 tokens
Maximum Entity Length: 8 words
Training Dataset: Inspec
Language: en
License: apache-2.0
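
The maximum entity length bounds which spans can be proposed as keyphrases at all. As a rough illustration (this is not SpanMarker's internal code, and `candidate_spans` is a hypothetical helper), the sketch below enumerates every candidate span of at most 8 words in a whitespace-tokenized sentence.

```python
from typing import List, Tuple


def candidate_spans(words: List[str], max_entity_length: int = 8) -> List[Tuple[int, int]]:
    """Return (start, end) word indices, end exclusive, for every span
    containing at most `max_entity_length` words."""
    spans = []
    for start in range(len(words)):
        # The span may not run past the sentence or exceed the length limit.
        last_end = min(start + max_entity_length, len(words))
        for end in range(start + 1, last_end + 1):
            spans.append((start, end))
    return spans


# A 10-word fragment taken from one of the widget examples (lowercased).
words = "adaptive filtering for noise reduction in hue saturation intensity color".split()
spans = candidate_spans(words)
print(len(spans), "candidate spans for", len(words), "words")
```

Any gold keyphrase longer than 8 words falls outside this candidate set, so under this limit it cannot be predicted in full.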

Model Sources

Repository: SpanMarker on GitHub
Thesis: SpanMarker For Named Entity Recognition

Model Labels

Label	Examples
KEY	"Content Atomism", "philosophy of mind", "IBS"

Evaluation

Metrics

Label	Precision	Recall	F1
all	0.5666	0.6230	0.5935
KEY	0.5666	0.6230	0.5935
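
The F1 column is the harmonic mean of precision and recall. As a quick consistency check, the sketch below recomputes it from the unrounded precision and recall values listed in this card's metadata; `f1_score` is a helper defined here for illustration, not part of SpanMarker.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Test-set precision and recall as reported in the card metadata.
precision = 0.5666149412547107
recall = 0.6229588106263709

f1 = f1_score(precision, recall)
print(f"F1 = {f1:.4f}")
```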

Uses

Direct Use

from span_marker import SpanMarkerModel

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("tomaarsen/span-marker_bert-base-uncased-keyphrase-inspec")
# Run inference
entities = model.predict("Adaptive filtering for noise reduction in hue saturation intensity color space Even though the hue saturation intensity -LRB- HSI -RRB- color model has been widely used in color image processing and analysis , the conversion formulas from the RGB color model to HSI are nonlinear and complicated in comparison with the conversion formulas of other color models . When an RGB image is degraded by random Gaussian noise , this nonlinearity leads to a nonuniform noise distribution in HSI , making accurate image analysis more difficult . We have analyzed the noise characteristics of the HSI color model and developed an adaptive spatial filtering method to reduce the magnitude of noise and the nonuniformity of noise variance in the HSI color space . With this adaptive filtering method , the filter kernel for each pixel is dynamically adjusted , depending on the values of intensity and saturation . In our experiments we have filtered the saturation and hue components and generated edge maps from color gradients . We have found that by using the adaptive filtering method , the minimum error rate in edge detection improves by approximately 15 %")
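
`model.predict` returns a list of dictionaries, one per predicted entity. The sketch below assumes each dictionary carries `span`, `label` and `score` keys; `top_keyphrases` is a hypothetical helper, and the sample predictions are made up to show the assumed format rather than real model output.

```python
from typing import Dict, List


def top_keyphrases(entities: List[Dict], min_score: float = 0.5) -> List[str]:
    """Return unique KEY spans (compared case-insensitively) with a score of
    at least `min_score`, ordered from highest to lowest score."""
    best: Dict[str, Dict] = {}
    for entity in entities:
        if entity["label"] != "KEY" or entity["score"] < min_score:
            continue
        key = entity["span"].lower()
        # Keep only the highest-scoring occurrence of each keyphrase.
        if key not in best or entity["score"] > best[key]["score"]:
            best[key] = entity
    ranked = sorted(best.values(), key=lambda e: e["score"], reverse=True)
    return [e["span"] for e in ranked]


# Made-up predictions in the assumed output format (not real model output).
sample = [
    {"span": "adaptive filtering", "label": "KEY", "score": 0.91},
    {"span": "Adaptive filtering", "label": "KEY", "score": 0.80},
    {"span": "noise reduction", "label": "KEY", "score": 0.95},
    {"span": "edge maps", "label": "KEY", "score": 0.30},
]
keyphrases = top_keyphrases(sample)
print(keyphrases)
```

With real predictions, `top_keyphrases(entities)` could be called directly on the list returned above; the 0.5 threshold is an arbitrary choice and worth tuning against the precision and recall you need.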

Downstream Use

You can finetune this model on your own dataset.


from datasets import load_dataset
from span_marker import SpanMarkerModel, Trainer

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("tomaarsen/span-marker_bert-base-uncased-keyphrase-inspec")

# Specify a Dataset with "tokens" and "ner_tags" columns
dataset = load_dataset("conll2003") # For example CoNLL2003

# Initialize a Trainer using the pretrained model & dataset
trainer = Trainer(
    model=model,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()
trainer.save_model("tomaarsen/span-marker_bert-base-uncased-keyphrase-inspec-finetuned")

Training Details

Training Set Metrics

Training set	Min	Median	Max
Sentence length	15	138.5327	557
Entities per sentence	0	8.2507	41

Training Hyperparameters

learning_rate: 5e-05
train_batch_size: 32
eval_batch_size: 32
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 3
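
With `lr_scheduler_type: linear` and `lr_scheduler_warmup_ratio: 0.1`, the learning rate ramps up linearly over the first 10% of optimizer steps and then decays linearly to zero. The following is a minimal sketch of that shape, not the trainer's own code. The total number of steps is not stated in this card, so `total_steps = 1000` is a placeholder assumption, and the warmup step count is rounded up here, which may differ by a step from the trainer's rounding.

```python
import math


def linear_schedule_lr(
    step: int,
    total_steps: int,
    base_lr: float = 5e-5,
    warmup_ratio: float = 0.1,
) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: grow linearly from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: shrink linearly from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


total_steps = 1000  # placeholder assumption; the real value is not given in this card
lrs = [linear_schedule_lr(s, total_steps) for s in (0, 50, 100, 550, 1000)]
for step, lr in zip((0, 50, 100, 550, 1000), lrs):
    print(f"step {step:4d}: lr = {lr:.2e}")
```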

Environmental Impact

Carbon emissions were measured using CodeCarbon.

Carbon Emitted: 0.021 kg of CO2
Hours Used: 0.137 hours

Training Hardware

On Cloud: No
GPU Model: 1 x NVIDIA GeForce RTX 3090
CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
RAM Size: 31.78 GB

Framework Versions

Python: 3.9.16
SpanMarker: 1.3.1.dev
Transformers: 4.29.2
PyTorch: 2.0.1+cu118
Datasets: 2.14.3
Tokenizers: 0.13.2