diff --git "a/data/paperswithcode_tasks.csv" "b/data/paperswithcode_tasks.csv" new file mode 100644--- /dev/null +++ "b/data/paperswithcode_tasks.csv" @@ -0,0 +1,4856 @@ +area,task,task_description +adversarial,real world adversarial attack,Adversarial attacks that are presented in the real world +adversarial,adversarial robustness,Adversarial Robustness evaluates the vulnerabilities of machine learning models under various types of adversarial attacks. +adversarial,exposure fairness, +adversarial,image to image translation,"Image-to-image translation is the task of taking images from one domain and transforming them so they have the style (or characteristics) of images from another domain. + +( Image credit: [Unpaired Image-to-Image Translation +using Cycle-Consistent Adversarial Networks](https://arxiv.org/pdf/1703.10593v6.pdf) )" +adversarial,data poisoning,"**Data Poisoning** is an adversarial attack that tries to manipulate the training dataset in order to control the prediction behavior of a trained model such that the model will label malicious examples into a desired classes (e.g., labeling spam e-mails as safe). + + +Source: [Explaining Vulnerabilities to Adversarial Machine Learning through Visual Analytics ](https://arxiv.org/abs/1907.07296)" +adversarial,website fingerprinting defense, +adversarial,dnn testing,Testing the reliability of DNNs. +adversarial,backdoor attack,"Backdoor attacks inject maliciously constructed data into a training set so that, at test time, the trained model misclassifies inputs patched with a backdoor trigger as an adversarially-desired target class." 
+adversarial,adversarial defense,"Competitions with currently unpublished results: + +- [TrojAI](https://pages.nist.gov/trojai/)" +adversarial,phishing website detection, +adversarial,question answering,"Question Answering is the task of answering questions (typically reading comprehension questions), but abstaining when presented with a question that cannot be answered based on the provided context. + +Question answering can be segmented into domain-specific tasks like community question answering and knowledge-base question answering. Popular benchmark datasets for evaluation question answering systems include [SQuAD](/dataset/squad), [HotPotQA](/dataset/hotpotqa), [bAbI](/dataset/babi-1), [TriviaQA](/dataset/triviaqa), [WikiQA](/dataset/wikiqa), and many others. Models for question answering are typically evaluated on metrics like EM and F1. Some recent top performing models are T5 and XLNet. + +( Image credit: [SQuAD](https://rajpurkar.github.io/mlx/qa-and-squad/) )" +adversarial,website fingerprinting attacks, +adversarial,design synthesis, +adversarial,model posioning, +adversarial,inference attack, +adversarial,adversarial text, +adversarial,optimize the trajectory of uav which plays a, +adversarial,adversarial attack,"An **Adversarial Attack** is a technique to find a perturbation that changes the prediction of a machine learning model. The perturbation can be very small and imperceptible to human eyes. + + +Source: [Recurrent Attention Model with Log-Polar Mapping is Robust against Adversarial Attacks ](https://arxiv.org/abs/2002.05388)" +adversarial,provable adversarial defense, +audio,environmental sound classification,Classification of Environmental Sounds. Most often sounds found in Urban environments. Task related to noise monitoring. +audio,directional hearing,Extremely low-latency audio source separation from a known direction of arrival. 
+audio,chord recognition, +audio,voice anti spoofing,Discriminate genuine speech and spoofing attacks +audio,timbre interpolation, +audio,audio fingerprint, +audio,acoustic scene classification,"The goal of acoustic scene classification is to classify a test recording into one of the provided predefined classes that characterizes the environment in which it was recorded. + +Source: [DCASE 2019](http://dcase.community/challenge2019/task-acoustic-scene-classification) +Source: [DCASE 2018](https://dcase.community/challenge2018/task-acoustic-scene-classification)" +audio,audio visual synchronization, +audio,audio super resolution,AUDIO SUPER-RESOLUTION or speech bandwidth extension (Upsampling Ratio = 2) +audio,audio inpainting,Filling in holes in audio data +audio,active speaker localization,"Active Speaker Localization (ASL) is the process of spatially localizing an active speaker (talker) in an environment using either audio, vision or both." +audio,language identification,Language identification is the task of determining the language of a text. +audio,zero shot multi speaker tts, +audio,music generation,Music Generation is a task of automatically generating music. +audio,audio captioning, +audio,vowel classification, +audio,streaming target sound extraction,"This task is a variant of the [Target Sound Extraction](https://paperswithcode.com/task/target-sound-extraction) task, with the constraint of causal streaming inference. Aiming for an algorithmic latency of less than 20 ms, at each time step, streaming audio models operate on an input audio chunk of length less than 20 ms. The causal constraint means that the model only has the knowledge of past chunks and no future chunks." +audio,audio tagging,"Audio tagging is a task to predict the tags of audio clips. Audio tagging tasks include music tagging, acoustic scene classification, audio event classification, etc." 
+audio,music compression, +audio,voice conversion,"**Voice Conversion** is a technology that modifies the speech of a source speaker and makes their speech sound like that of another target speaker without changing the linguistic information. + + +Source: [Joint training framework for text-to-speech and voice conversion using multi-source Tacotron and WaveNet ](https://arxiv.org/abs/1903.12389)" +audio,few shot audio classification,Few-shot classification for audio signals. Presents a unique challenge compared to other few-shot domains as we deal with temporal dependencies as well +audio,self supervised sound classification, +audio,sound event localization and detection,"Given multichannel audio input, a sound event detection and localization (SELD) system outputs a temporal activation track for each of the target sound classes, along with one or more corresponding spatial trajectories when the track indicates activity. This results in a spatio-temporal characterization of the acoustic scene that can be used in a wide range of machine cognition tasks, such as inference on the type of environment, self-localization, navigation without visual input or with occluded targets, tracking of specific types of sound sources, smart-home applications, scene visualization systems, and audio surveillance, among others." +audio,bird audio detection, +audio,audio denoising, +audio,target sound extraction,"Target Sound Extraction is the task of extracting a sound corresponding to a given class from an audio mixture. The audio mixture may contain background noise with a relatively low amplitude compared to the foreground mixture components. The choice of the sound class is provided as input to the model in form of a string, integer, or a one-hot encoding of the sound class." 
+audio,real time directional hearing,Directional hearing models that also support real-time on-device inference +audio,audio multiple target classification, +audio,pitch control, +audio,audio generation,"Audio generation (synthesis) is the task of generating raw audio such as speech. + +( Image credit: [MelNet](https://arxiv.org/pdf/1906.01083v1.pdf) )" +audio,audio declipping,Audio declipping is the task of estimating the original audio signal given its clipped measurements. +audio,direction of arrival estimation,Estimating the direction-of-arrival (DOA) of a sound source from multi-channel recordings. +audio,speech editing,"I first learned this story from my third choice, ie, my teacher who I used to call master. That was supposed to be a life- changing tale for me because I was very stubborn and unreceptive back then. But, my master taught me to be more open with new perspectives and continue to seek inspirations from other people who I can call masters, too, and to absorb and just filter later. As Bruce Lee said. ""Absorb what is useful"" Hopefully, after have taken everything in, I will have evolved into a better educator Just like my master and ultimately, a better creative person want to reach that ""zen point where everything is intuitive and instinctive, where teaching and are one (like the samural and the sword are one), where I can see beyond what my eyes tell me as what swordsman Miyamoto Musashi said. + +Yes. I am aware of the dangers of having too many masters. But mixed martial arts taught us that we can learn different fighting styles from different masters, and eventually, evolve into a well-rounded warrior. I guess the secret lies in keeping an open mind. I learned that from my master. So, just make sure that when meet other people and listen to their stories, go with an empty cup. + +Nevertheless, she left me. Again, it broke my heart. 
+ +Right after signed on my journal entry, Theard euphonous voices of these three personalities fused into one calling my name. It was my mom She came in to my room with two pieces of cake each shaped with letters P and Jenough to be carried by her hands. The letters are initials of name- Philippe John. Planted on the edge my first of each cake were five tiny well-lit candles. I stood from my post, grabbed the pieces from my mom's shaky hands, and put them on my desk. Then, I hugged her it was one of the tightest hugs had given her. And, she told me ""You're now a decade young teacher. Way to go, my love, and promise I will not leave you anymore. Never"" + +I couldn't thank her more. May 15 of this year, woke up with a happy heart. And. again. thought to myself, ""when reach 50 years old, 60 or beyond, I will look back to this day again and again and again." +audio,bird classification, +audio,single label target sound extraction,"Single-Label Target Sound Extraction is the task of extracting a given class of sounds from an audio mixture. The audio mixture may contain background noise with a relatively low amplitude compared to the foreground mixture components. The choice of the sound class is provided as input to the model in form of a string, integer, or a one-hot encoding of the sound class." +audio,audio signal processing,"This is a general task that covers transforming audio inputs into audio outputs, not limited to existing PaperWithCode categories of Source Separation, Denoising, Classification, Recognition, etc." +audio,sound event detection,"**Sound Event Detection** (SED) is the task of recognizing the sound events and their respective temporal start and end time in a recording. Sound events in real life do not always occur in isolation, but tend to considerably overlap with each other. Recognizing such overlapping sound events is referred as polyphonic SED. 
+ + +Source: [A report on sound event detection with different binaural features ](https://arxiv.org/abs/1710.02997)" +audio,audio effects modeling,"Modeling of audio effects such as reverberation, compression, distortion, etc." +audio,bandwidth extension,Bandwidth extension is the task of expanding the bandwidth of a signal in a way that approximates the original or desired higher bandwidth signal. +audio,target speaker extraction,Extract the dialogue content of the specified target in a multi-person dialogue. +audio,speaker orientation,Direction of Voice or speaker orientation of the person with respect to the target device. +audio,audio signal recognition, +audio,acoustic novelty detection,"Detect novel events given acoustic signals, either in domestic or outdoor environments." +audio,shooter localization,Shooter localization based on videos. +audio,bird species classification with audio visual, +audio,synthetic speech detection,Detect fake synthetic speech generated using machine learning +audio,gunshot detection, +audio,fake voice detection, +audio,audio classification,Audio classification or audio tagging are tasks to predict the tags of audio clips. +audio,inference optimization, +audio,audio source separation,"**Audio Source Separation** is the process of separating a mixture (e.g. a pop band recording) into isolated sounds from individual sources (e.g. just the lead vocals). + + +Source: [Model selection for deep audio source separation via clustering analysis ](https://arxiv.org/abs/1910.12626)" +audio,audio dequantization,Audio Dequantization is a process of estimating the original signal from its quantized counterpart. 
+computer-code,value prediction, +computer-code,program induction,Generating program code for domain-specific tasks +computer-code,neural network simulation,Simulation of abstract or biophysical neural networks in silico +computer-code,learning to execute, +computer-code,chart question answering,Question Answering task on charts images +computer-code,paraphrase generation,"Paraphrase Generation involves transforming a natural language sentence to a new sentence, that has the same semantic meaning but a different syntactic or lexical surface form." +computer-code,write computer programs from specifications, +computer-code,programming error detection, +computer-code,nature inspired optimization algorithm, +computer-code,wrong binary operator, +computer-code,enumerative search, +computer-code,fault localization, +computer-code,sql chatbots, +computer-code,recommendation systems,"The recommendation systems task is to produce a list of recommendations for a user. The most common methods used in recommender systems are factor models (Koren et al., 2009; Weimer et al., 2007; Hidasi & Tikk, 2012) and neighborhood methods (Sarwar et al., 2001; Koren, 2008). +Factor models work by decomposing the sparse user-item interactions matrix to a set of d dimensional vectors one for each item and user in the dataset. Factor models are hard to apply in session-based recommendations due to the absence of a user profile. On the other hand, neighborhood methods, which rely on computing similarities between items (or users) are based on co-occurrences of items in sessions (or user profiles). Neighborhood methods have been used extensively in session-based recommendations. 
+ +( Image credit: [CuMF_SGD](https://arxiv.org/pdf/1610.05838v3.pdf) )" +computer-code,text to sql,"( Image credit: [SyntaxSQLNet](https://arxiv.org/pdf/1810.05237v2.pdf) )" +computer-code,spectral efficiency analysis of uplink,Code for Spectral Efficiency Analysis of Uplink-Downlink Decoupled Access in C-V2X Networks +computer-code,code classification, +computer-code,edit script generation,"Generating edit scripts by comparing 2 different files or strings to convert one to another. The script contains instructions such as insert, delete, and substitute." +computer-code,code generation,"**Code Generation** is an important field to predict explicit code or program structure from multimodal data sources such as incomplete code, programs in another programming language, natural language descriptions or execution examples. Code Generation tools can assist the development of automatic programming tools to improve programming productivity. + + +Source: [Deep Learning for Source Code Modeling and Generation ](https://arxiv.org/abs/2002.05442) + +Image source: [Measuring Coding Challenge Competence With APPS](https://paperswithcode.com/paper/measuring-coding-challenge-competence-with)" +computer-code,sentinel 1 sar processing, +computer-code,file difference,"Generate an edit script by comparing 2 strings or files; the script contains insert, delete, and substitute instructions that convert the first string to the second." +computer-code,sql synthesis, +computer-code,motion style transfer, +computer-code,text to code generation, +computer-code,codesearchnet java, +computer-code,code summarization,"**Code Summarization** is a task that tries to comprehend code and automatically generate descriptions directly from the source code. 
+ + +Source: [Improving Automatic Source Code Summarization via Deep Reinforcement Learning ](https://arxiv.org/abs/1811.07234)" +computer-code,exception type, +computer-code,swapped operands, +computer-code,api sequence recommendation, +computer-code,webcam rgb image classification, +computer-code,video defect classification,"Quick-View (QV) Inspection is one commonly-used technology. However, it is quite labor-intensive to find defects from a huge number of QV videos. To tackle this problem, we propose a video defect classification task, which is to predict the categories of pipe +defects in a short QV video." +computer-code,annotated code search,Annotated code search is the retrieval of code snippets paired with brief descriptions of their intent using natural language queries. +computer-code,contextual embedding for source code, +computer-code,program repair,Task of teaching ML models to modify an existing program to fix a bug in a given code. +computer-code,sparse subspace based clustering, +computer-code,infinite image generation, +computer-code,sql to text,"( Image credit: [SQL-to-Text Generation with Graph-to-Sequence Model](https://arxiv.org/pdf/1809.05255v2.pdf) )" +computer-code,log parsing, +computer-code,function docstring mismatch, +computer-code,low rank compression, +computer-code,formalize foundations of universal algebra in, +computer-code,code comment generation, +computer-code,variable misuse, +computer-code,single image portrait relighting, +computer-code,tiling deployment,Data tiling over 3 memory hierarchy levels and deployment on microcontroller. +computer-code,code search,"The goal of **Code Search** is to retrieve code fragments from a large code corpus that most closely match a developer’s intent, which is expressed in natural language. 
+ + +Source: [When Deep Learning Met Code Search ](https://arxiv.org/abs/1905.03813)" +computer-code,editcompletion,"Given a code snippet that is partially edited, the goal is to predict a completion of the edit for the rest of the snippet." +computer-code,program synthesis, +computer-code,git commit message generation, +computer-code,type prediction, +computer-vision,retinal oct disease classification,Classifying different Retinal degeneration from Optical Coherence Tomography Images (OCT). +computer-vision,crowds, +computer-vision,multi object tracking and segmentation,"Multiple object tracking and segmentation requires detecting, tracking, and segmenting objects belonging to a set of given classes. + +(Image and definition credit: [Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation](https://github.com/SysCV/pcan), NeurIPS 2021, Spotlight )" +computer-vision,text guided image editing,Editing images using text prompts. +computer-vision,rotated mnist, +computer-vision,lip sync 1,"This task deals with lip-syncing a video (or) an image to the desired target speech. Approaches in this task work only for a specific (limited set) of identities, languages, speech/voice. See also: Unconstrained lip-synchronization - https://paperswithcode.com/task/lip-sync" +computer-vision,raw reconstruction,Reconstruct RAW camera sensor readings from the corresponding sRGB images +computer-vision,part based representation learning, +computer-vision,trajectory forecasting,"Trajectory forecasting is a sequential prediction task, where a forecasting model predicts future trajectories of all moving agents (humans, vehicles, etc.) in a scene, based on their past trajectories and/or the scene context. 
+ +(Illustrative figure from [Social NCE: Contrastive Learning of Socially-aware Motion Representations](https://github.com/vita-epfl/social-nce))" +computer-vision,camera localization, +computer-vision,cross domain activity recognition, +computer-vision,highlight removal, +computer-vision,egocentric pose estimation, +computer-vision,fingertip detection, +computer-vision,amodal layout estimation,"Amodal scene layout estimation involves estimating the static and dynamic portion of an urban driving scene in bird's-eye view, given a single image. The concept of ""amodal"" estimation refers to the fact that we also estimate layout of parts of the scene that are not observable in the image." +computer-vision,indoor scene reconstruction, +computer-vision,motion detection in non stationary scenes, +computer-vision,object detection, +computer-vision,point cloud segmentation,"3D point cloud segmentation is the process of +classifying point clouds into multiple homogeneous regions, the +points in the same region will have the same properties. The +segmentation is challenging because of high redundancy, uneven +sampling density, and lack explicit structure of point cloud +data. This problem has many applications in robotics such as +intelligent vehicles, autonomous mapping and navigation. + +Source: [3D point cloud segmentation: A survey](https://doi.org/10.1109/RAM.2013.6758588)" +computer-vision,multi label classification,"**Multi-Label Classification** is the supervised learning problem where an instance may be associated with multiple labels. This is an extension of single-label classification (i.e., multi-class, or binary) where each instance is only associated with a single class label. + + +Source: [Deep Learning for Multi-label Classification ](https://arxiv.org/abs/1502.05988)" +computer-vision,style generalization, +computer-vision,point cloud linear classification,Training a linear classifier(e.g. SVM) on the embeddings/representations of 3D point clouds. 
The embeddings/representations are usually trained in an unsupervised manner. +computer-vision,semantic segmentation, +computer-vision,gesture to gesture translation, +computer-vision,transform a video into a comics, +computer-vision,stereo depth estimation, +computer-vision,short term object interaction anticipation, +computer-vision,multi person mesh recovery, +computer-vision,drone view target localization,"(Drone -> Satellite) Given one drone-view image or video, the task aims to find the most similar satellite-view image to localize the target building in the satellite view." +computer-vision,unbiased scene graph generation,"Unbiased Scene Graph Generation (Unbiased SGG) aims to predict more informative scene graphs composed of more ""tail predicates"" *(in contrast to ""head predicates"" in terms of class frequencies) by dealing with the skewed, long-tailed predicate class distribution. (Definition from Chiou et al. ""Recovering the Unbiased Scene Graphs from the Biased Ones"")" +computer-vision,object discovery in videos, +computer-vision,pedestrian density estimation,Pedestrian density estimation is the task of estimating the density of pedestrians from cameras. +computer-vision,multi modal subspace clustering, +computer-vision,hyperspectral unmixing,"**Hyperspectral Unmixing** is a procedure that decomposes the measured pixel spectrum of hyperspectral data into a collection of constituent spectral signatures (or endmembers) and a set of corresponding fractional abundances. Hyperspectral Unmixing techniques have been widely used for a variety of applications, such as mineral mapping and land-cover change detection. 
+ + +Source: [An Augmented Linear Mixing Model to Address Spectral Variability for Hyperspectral Unmixing ](https://arxiv.org/abs/1810.12000)" +computer-vision,prompt driven zero shot domain adaptation,Domain adaptation using only a single source domain and a description of the target domain in natural language (No images from target domain are available) +computer-vision,autonomous navigation,"Autonomous navigation is the task of autonomously navigating a vehicle or robot to or around a location without human guidance. + +( Image credit: [Approximate LSTMs for Time-Constrained Inference: +Enabling Fast Reaction in Self-Driving Cars](https://arxiv.org/pdf/1905.00689v2.pdf) )" +computer-vision,finger vein recognition, +computer-vision,local color enhancement,"Enhancement techniques for improving the contrast between lesion and background skin on dermatological macro-images are limited in the literature. To fill this gap, a modified sigmoid transform is applied in the HSV color space. The crossover point in the modified sigmoid transform that divides the macro-image into lesion and background is predicted using a modified EfficientNet regressor to exclude manual intervention and subjectivity." +computer-vision,human interaction recognition, +computer-vision,pose contrastive learning, +computer-vision,facial beauty prediction,"Facial beauty prediction is the task of predicting the attractiveness of a face. + +( Image credit: [SCUT-FBP5500: A Diverse Benchmark Dataset for Multi-Paradigm Facial Beauty Prediction](https://github.com/HCIILAB/SCUT-FBP5500-Database-Release) )" +computer-vision,action unit detection,"Action unit detection is the task of detecting action units from a video - for example, types of facial action units (lip tightening, cheek raising) from a video of a face. 
+ +( Image credit: [AU R-CNN](https://arxiv.org/pdf/1812.05788v2.pdf) )" +computer-vision,offline handwritten chinese character,Handwritten Chinese characters recognition is the task of detecting and interpreting the components of Chinese characters (i.e. radicals and two-dimensional structures). +computer-vision,face age editing, +computer-vision,image registration,"Image registration is the process of transforming different sets of data into one coordinate system. Data may be multiple photographs, data from different sensors, times, depths, or viewpoints. It is used in computer vision, medical imaging, and compiling and analyzing images and data from satellites. Registration is necessary in order to be able to compare or integrate the data obtained from these different measurements. + +Source: [Image registration | Wikipedia](https://en.wikipedia.org/wiki/Image_registration) + +( Image credit: [Kornia](https://github.com/kornia/kornia) )" +computer-vision,skills evaluation, +computer-vision,explainable models, +computer-vision,video based person re identification,Video-based person re-identification (reID) aims to retrieve person videos with the same identity as a query person across multiple cameras +computer-vision,video individual counting, +computer-vision,face image retrieval,"Face image retrieval is the task of retrieving faces similar to a query, according to the given +criteria (e.g. identity) and rank them using their distances to the query. + +( Image credit: [CP-mtML](http://openaccess.thecvf.com/content_cvpr_2016/papers/Bhattarai_CP-mtML_Coupled_Projection_CVPR_2016_paper.pdf) )" +computer-vision,open set video captioning, +computer-vision,multi object tracking,Multiple Object Tracking is the problem of automatically identifying multiple objects in a video and representing them as a set of trajectories with high accuracy. +computer-vision,human pose estimation,3D Human Pose Estimation is a task of estimating the 3D pose of a human from a 2D image. 
+computer-vision,object skeleton detection,"Object skeleton detection is the task of detecting the skeleton of an object in an image. + +( Image credit: [DeepFlux for Skeletons in the Wild](https://arxiv.org/pdf/1811.12608v1.pdf) )" +computer-vision,ad hoc video search,"The Ad-hoc search task ended a 3 year cycle from 2016-2018 with a goal to model the end user search use-case, who is searching (using textual sentence queries) for segments of video containing persons, objects, activities, locations, etc. and combinations of the former. While the Internet Archive (IACC.3) dataset was adopted from 2016 to 2018, starting in 2019 a new data collection based on Vimeo Creative Commons (V3C) will be adopted to support the task for at least 3 more years. + +Given the test collection (V3C1 or IACC.3), master shot boundary reference, and set of Ad-hoc queries (approx. 30 queries) released by NIST, return for each query a list of at most 1000 shot IDs from the test collection ranked according to their likelihood of containing the target query." +computer-vision,instance segmentation 1,Image: [OccuSeg](https://arxiv.org/pdf/2003.06537v3.pdf) +computer-vision,kiss detection, +computer-vision,visual crowd analysis, +computer-vision,layout to image generation,"Layout-to-image generation is the task of generating a scene based on the given layout. The layout describes the location of the objects to be included in the output image. +In this section, you can find state-of-the-art leaderboards for Layout-to-image generation." +computer-vision,semi supervised anomaly detection, +computer-vision,drone navigation,"(Satellite -> Drone) Given one satellite-view image, the drone intends to find the most relevant place (drone-view images) that it has passed by. According to its flight history, the drone could be navigated back to the target place." 
+computer-vision,direct transfer person re identification, +computer-vision,finger dorsal image spoof detection, +computer-vision,materials imaging, +computer-vision,amodal tracking, +computer-vision,learning with noisy labels,"Learning with noisy labels is the setting in which an adversary has intentionally corrupted the labels, which would otherwise have come from a ""clean"" distribution. This setting can also be used to cast learning from only positive and unlabeled data." +computer-vision,semi supervised fashion compatibility, +computer-vision,video correspondence flow, +computer-vision,t1w mri classification, +computer-vision,robust face recognition,"Robust face recognition is the task of performing recognition in an unconstrained environment, where there is variation of view-point, scale, pose, illumination and expression of the face images. + +( Image credit: [MeGlass dataset](https://github.com/cleardusk/MeGlass) )" +computer-vision,referring video object segmentation,"Referring video object segmentation aims at segmenting an object in video with language expressions. Unlike the previous video object segmentation, the task exploits a different type of supervision, language expressions, to identify and segment an object referred to by the given language expressions in a video." +computer-vision,detecting image manipulation, +computer-vision,class agnostic object detection,Class-agnostic object detection aims to localize objects in images without specifying their categories. +computer-vision,curved text detection, +computer-vision,virtual try on,Virtual try-on of clothing or other items such as glasses and makeup. Most recent techniques use Generative Adversarial Networks. 
+computer-vision,robust object detection,"A Benchmark for the: +Robustness of Object Detection Models to Image Corruptions and Distortions + +To allow fair comparison of robustness enhancing methods all models have to use a standard ResNet50 backbone because performance strongly scales with backbone capacity. If requested an unrestricted category can be added later. + +Benchmark Homepage: https://github.com/bethgelab/robust-detection-benchmark + + +Metrics: + +mPC [AP]: Mean Performance under Corruption [measured in AP] + +rPC [%]: Relative Performance under Corruption [measured in %] + +Test sets: +Coco: val 2017; Pascal VOC: test 2007; Cityscapes: val; + +( Image credit: [Benchmarking Robustness in Object Detection](https://arxiv.org/pdf/1907.07484v1.pdf) )" +computer-vision,fine grained image recognition, +computer-vision,multimodal activity recognition, +computer-vision,body mass index bmi prediction, +computer-vision,thermal image segmentation, +computer-vision,real time instance segmentation,"Similar to its parent task, instance segmentation, but with the goal of achieving real-time capabilities under a defined setting. + +Image Credit: [SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation](https://arxiv.org/pdf/2007.14772v1.pdf)" +computer-vision,lipreading,"Lipreading is a process of extracting speech by watching lip movements of a speaker in the absence of sound. Humans lipread all the time without even noticing. It is a big part in communication albeit not as dominant as audio. It is a very helpful skill to learn especially for those who are hard of hearing. + +Deep Lipreading is the process of extracting speech from a video of a silent talking face using deep neural networks. It is also known by few other names: Visual Speech Recognition (VSR), Machine Lipreading, Automatic Lipreading etc. 
+ +The primary methodology involves two stages: i) Extracting visual and temporal features from a sequence of image frames from a silent talking video ii) Processing the sequence of features into units of speech e.g. characters, words, phrases etc. We can find several implementations of this methodology either done in two separate stages or trained end-to-end in one go." +computer-vision,viewpoint estimation, +computer-vision,supervised video summarization,"**Supervised video summarization** rely on datasets with human-labeled ground-truth annotations (either in the form of video summaries, as in the case of the [SumMe](https://paperswithcode.com/dataset/summe) dataset, or in the form of frame-level importance scores, as in the case of the [TVSum](https://paperswithcode.com/dataset/tvsum-1) dataset), based on which they try to discover the underlying criterion for video frame/fragment selection and video summarization. + +Source: [Video Summarization Using Deep Neural Networks: A Survey](https://arxiv.org/abs/2101.06072)" +computer-vision,spatial token mixer,Spatial Token Mixer (STM) is a module for vision transformers that aims to improve the efficiency of token mixing. STM is a type of depthwise convolution that operates on the spatial dimension of the tokens. STM is a drop-in replacement for the token mixing layers in vision transformers. +computer-vision,video generation from a single image,"( Image credit: [Logacheva et al.](https://paperswithcode.com/paper/deeplandscape-adversarial-modeling-of-1) )" +computer-vision,pulmorary vessel segmentation, +computer-vision,partial point cloud matching, +computer-vision,steering control, +computer-vision,multimodal emotion recognition,"This is a leaderboard for multimodal emotion recognition on the IEMOCAP dataset. The modality abbreviations are +A: Acoustic +T: Text +V: Visual + +Please include the modality in the bracket after the model name. 
+ +All models must use standard five emotion categories and are evaluated in standard leave-one-session-out (LOSO). See the papers for references." +computer-vision,no reference image quality assessment,An Image Quality Assessment approach where no reference image information is available to the model. +computer-vision,object detection in indoor scenes,"Object detection in indoor scenes is the task of performing object detection within an indoor environment. + + +( Image credit: [Faster Bounding Box Annotation for Object Detection in Indoor Scenes](https://arxiv.org/pdf/1807.03142v1.pdf) )" +computer-vision,self supervised image classification,"This is the task of image classification using representations learnt with self-supervised learning. Self-supervised methods generally involve a pretext task that is solved to learn a good representation and a loss function to learn with. One example of a loss function is an autoencoder based loss where the goal is reconstruction of an image pixel-by-pixel. A more popular recent example is a contrastive loss, which measure the similarity of sample pairs in a representation space, and where there can be a varying target instead of a fixed target to reconstruct (as in the case of autoencoders). + +A common evaluation protocol is to train a linear classifier on top of (frozen) representations learnt by self-supervised methods. The leaderboards for the linear evaluation protocol can be found below. In practice, it is more common to fine-tune features on a downstream task. An alternative evaluation protocol therefore uses semi-supervised learning and finetunes on a % of the labels. The leaderboards for the finetuning protocol can be accessed [here](https://paperswithcode.com/task/semi-supervised-image-classification). 
+ +You may want to read some blog posts before reading the papers and checking the leaderboards: + +- [Contrastive Self-Supervised Learning](https://ankeshanand.com/blog/2020/01/26/contrative-self-supervised-learning.html) - Ankesh Anand +- [The Illustrated Self-Supervised Learning](https://amitness.com/2020/02/illustrated-self-supervised-learning/) - Amit Chaudhary +- [Self-supervised learning and computer vision](https://www.fast.ai/2020/01/13/self_supervised/) - Jeremy Howard +- [Self-Supervised Representation Learning](https://lilianweng.github.io/lil-log/2019/11/10/self-supervised-learning.html) - Lilian Weng + +There is also Yann LeCun's talk at AAAI-20 which you can watch [here](https://vimeo.com/390347111) (35:00+). + +( Image credit: [A Simple Framework for Contrastive Learning of Visual Representations](https://arxiv.org/pdf/2002.05709v1.pdf) )" +computer-vision,point cloud classification,Image: [Qi et al](https://arxiv.org/pdf/1612.00593v2.pdf) +computer-vision,markerless motion capture, +computer-vision,hand segmentation, +computer-vision,fine grained visual categorization, +computer-vision,phrase extraction and grounding peg,PEG requires a model to extract phrases from text and locate objects in images simultaneously. +computer-vision,video alignment, +computer-vision,audio visual synchronization, +computer-vision,vgsi,"Given a textual goal and multiple images representing candidate events, a model must choose one image which constitutes a reasonable step towards the given goal. +A model should correctly recognize not only the specific action illustrated in an image (e.g., “turning on the oven”), but also the intent of the action (“baking fish”)." +computer-vision,hindi image captioning,The main goal of this task is to generate a caption in Hindi for an input image.
+computer-vision,video retrieval,"The objective of video retrieval is as follows: given a text query and a pool of candidate videos, select the video which corresponds to the text query. Typically, the videos are returned as a ranked list of candidates and scored via document retrieval metrics." +computer-vision,face alignment,"Face alignment is the task of identifying the geometric structure of faces in digital images, and attempting to obtain a canonical alignment of the face based on translation, scale, and rotation. + +( Image credit: [3DDFA_V2](https://github.com/cleardusk/3DDFA_V2) )" +computer-vision,food recognition, +computer-vision,prostate segmentation, +computer-vision,one shot visual object segmentation, +computer-vision,face clustering,Face clustering in videos +computer-vision,video emotion recognition, +computer-vision,vnla,Find objects in photorealistic environments by requesting and executing language subgoals. +computer-vision,missing markers reconstruction,Reconstructing missing markers in motion capture 3D poses +computer-vision,burst image super resolution,"Reconstruct a high-resolution image from a set of low-quality images, much like the multi-frame super-resolution task." +computer-vision,neural rendering,"Given a representation of a 3D scene of some kind (point cloud, mesh, voxels, etc.), the task is to create an algorithm that can produce photorealistic renderings of this scene from an arbitrary viewpoint. Sometimes, the task is accompanied by image/scene appearance manipulation." +computer-vision,fundus to angiography generation,Generating Retinal Fluorescein Angiography from Retinal Fundus Image using Generative Adversarial Networks.
+computer-vision,image variation,"Given an image, generate variations of the image" +computer-vision,semi supervised change detection, +computer-vision,aerial video semantic segmentation, +computer-vision,human object interaction detection,"Human-Object Interaction (HOI) detection is a task of identifying ""a set of interactions"" in an image, which involves the i) localization of the subject (i.e., humans) and target (i.e., objects) of interaction, and ii) the classification of the interaction labels." +computer-vision,morphology classification, +computer-vision,monocular human pose estimation,This task targets at 3D human pose estimation with a single RGB camera. +computer-vision,physical video anomaly detection,Detecting if an entire short clip of a physical or mechanical process features an anomalous motion +computer-vision,probabilistic deep learning, +computer-vision,hand object pose,6D pose estimation of hand and object +computer-vision,temporal action proposal generation, +computer-vision,intrinsic image decomposition,"**Intrinsic Image Decomposition** is the process of separating an image into its formation components such as reflectance (albedo) and shading (illumination). Reflectance is the color of the object, invariant to camera viewpoint and illumination conditions, whereas shading, dependent on camera viewpoint and object geometry, consists of different illumination effects, such as shadows, shading and inter-reflections. Using intrinsic images, instead of the original images, can be beneficial for many computer vision algorithms. For instance, for shape-from-shading algorithms, the shading images contain important visual cues to recover geometry, while for segmentation and detection algorithms, reflectance images can be beneficial as they are independent of confounding illumination effects. 
Furthermore, intrinsic images are used in a wide range of computational photography applications, such as material recoloring, relighting, retexturing and stylization. + + +Source: [CNN based Learning using Reflection and Retinex Models for Intrinsic Image Decomposition ](https://arxiv.org/abs/1712.01056)" +computer-vision,reflection removal, +computer-vision,video classification,"**Video Classification** is the task of producing a label that is relevant to the video given its frames. A good video level classifier is one that not only provides accurate frame labels, but also best describes the entire video given the features and the annotations of the various frames in the video. For example, a video might contain a tree in some frame, but the label that is central to the video might be something else (e.g., “hiking”). The granularity of the labels that are needed to describe the frames and the video depends on the task. Typical tasks include assigning one or more global labels to the video, and assigning one or more labels for each frame inside the video. + + +Source: [Efficient Large Scale Video Classification ](https://arxiv.org/abs/1505.06250)" +computer-vision,visual text correction, +computer-vision,medical object detection,"Medical object detection is the task of identifying medical-based objects within an image. + +( Image credit: [Liver Lesion Detection from Weakly-labeled Multi-phase CT Volumes with a Grouped Single Shot MultiBox Detector](https://github.com/L0SG/grouped-ssd-pytorch) )" +computer-vision,self supervised anomaly detection,Self-Supervision towards anomaly detection +computer-vision,point cloud reconstruction,Encoding and reconstruction of 3D point clouds. +computer-vision,image to image translation,"Image-to-image translation is the task of taking images from one domain and transforming them so they have the style (or characteristics) of images from another domain. 
+ +( Image credit: [Unpaired Image-to-Image Translation +using Cycle-Consistent Adversarial Networks](https://arxiv.org/pdf/1703.10593v6.pdf) )" +computer-vision,object detection,"Object detection is the task of detecting instances of objects of a certain class within an image. The state-of-the-art methods can be categorized into two main types: one-stage methods and two stage-methods. One-stage methods prioritize inference speed, and example models include YOLO, SSD and RetinaNet. Two-stage methods prioritize detection accuracy, and example models include Faster R-CNN, Mask R-CNN and Cascade R-CNN. + +The most popular benchmark is the MSCOCO dataset. Models are typically evaluated according to a Mean Average Precision metric. + +( Image credit: [Detectron](https://github.com/facebookresearch/detectron) )" +computer-vision,colorization,"**Colorization** is the process of adding plausible color information to monochrome photographs or videos. Colorization is a highly undetermined problem, requiring mapping a real-valued luminance image to a three-dimensional color-valued one, that has not a unique solution. + + +Source: [ChromaGAN: An Adversarial Approach for Picture Colorization ](https://arxiv.org/abs/1907.09837)" +computer-vision,visual relationship detection,"Visual relationship detection (VRD) is one newly developed computer vision task aiming to recognize relations or interactions between objects in an image. It is a further learning task after object recognition and is essential for fully understanding images, even the visual world." 
+computer-vision,predict future video frames, +computer-vision,thermal image denoising, +computer-vision,earthquake prediction, +computer-vision,object reconstruction from a single image,Image: [Fan et al](https://arxiv.org/pdf/1612.00603v2.pdf) +computer-vision,activity recognition in videos, +computer-vision,spectral super resolution, +computer-vision,semantic scene completion from a single,This task relies on a single RGB image to infer the dense 3D voxelized semantic scene. +computer-vision,point cloud matching,Image: [Gojic et al](https://openaccess.thecvf.com/content_CVPR_2019/papers/Gojcic_The_Perfect_Match_3D_Point_Cloud_Matching_With_Smoothed_Densities_CVPR_2019_paper.pdf) +computer-vision,visibility estimation from point cloud,"Estimate the point-wise visibility of each point from a given point of view (a point, or a view frustum)." +computer-vision,sparse representation based classification,Sparse Representation-based Classification is the task based on the description of the data as a linear combination of few building blocks - atoms - taken from a pre-defined dictionary of such fundamental elements. +computer-vision,data free quantization,"**Data Free Quantization** is a technique to achieve a highly accurate quantized model without accessing any training data. + +Source: [Qimera: Data-free Quantization with Synthetic Boundary Supporting Samples](https://arxiv.org/abs/2111.02625)" +computer-vision,multi view shape retrieval, +computer-vision,video saliency detection, +computer-vision,explainable artificial intelligence,"XAI refers to methods and techniques in the application of artificial intelligence (AI) such that the results of the solution can be understood by humans. It contrasts with the concept of the ""black box"" in machine learning where even its designers cannot explain why an AI arrived at a specific decision. XAI may be an implementation of the social right to explanation. 
XAI is relevant even if there is no legal right or regulatory requirement—for example, XAI can improve the user experience of a product or service by helping end users trust that the AI is making good decisions. This way the aim of XAI is to explain what has been done, what is done right now, what will be done next and unveil the information the actions are based on. These characteristics make it possible (i) to confirm existing knowledge (ii) to challenge existing knowledge and (iii) to generate new assumptions." +computer-vision,video restoration, +computer-vision,video grounding, +computer-vision,junction detection, +computer-vision,shape retrieval,Image: [Sun et al](https://arxiv.org/pdf/1804.04610v1.pdf) +computer-vision,video, +computer-vision,fish detection, +computer-vision,object detection from monocular images,"This is the task of detecting 3D objects from monocular images (as opposed to LiDAR based counterparts). It is usually associated with autonomous driving based tasks. + +( Image credit: [Orthographic Feature Transform for Monocular 3D Object Detection](https://arxiv.org/pdf/1811.08188v1.pdf) )" +computer-vision,holography,"The images that are presented here are multiplanar images that were reconstructed using a holographic display. For more details, please see: https://complightlab.com/publications/realistic_defocus_cgh/" +computer-vision,pornography detection, +computer-vision,object retrieval,Source: [He et al](https://arxiv.org/pdf/1803.06189v1.pdf) +computer-vision,vqa, +computer-vision,fine grained image classification,"The Fine-Grained Image Classification task focuses on differentiating between hard-to-distinguish object classes, such as species of birds, flowers, or animals; and identifying the makes or models of vehicles. 
+ +( Image credit: [Looking for the Devil in the Details](https://arxiv.org/pdf/1903.06150v2.pdf) )" +computer-vision,prostate zones segmentation, +computer-vision,facial expression recognition,"Facial expression recognition is the task of classifying the expressions on face images into various categories such as anger, fear, surprise, sadness, happiness and so on. + +( Image credit: [DeXpression](https://arxiv.org/pdf/1509.05371v2.pdf) )" +computer-vision,source free domain adaptation, +computer-vision,temporal metadata manipulation detection,Detecting when the timestamp of an outdoor photograph has been manipulated +computer-vision,serial style transfer, +computer-vision,motion detection,"**Motion Detection** is a process to detect the presence of any moving entity in an area of interest. Motion Detection is of great importance due to its application in various areas such as surveillance and security, smart homes, and health monitoring. + + +Source: [Different Approaches for Human Activity Recognition– A Survey ](https://arxiv.org/abs/1906.05074)" +computer-vision,open set action recognition, +computer-vision,image augmentation,"**Image Augmentation** is a data augmentation method that generates more training data from the existing training samples. Image Augmentation is especially useful in domains where training data is limited or expensive to obtain like in biomedical applications. + +Source: [Improved Image Augmentation for Convolutional Neural Networks by Copyout and CopyPairing ](https://arxiv.org/abs/1909.00390) + +( Image credit: [Kornia](https://github.com/kornia/kornia) )" +computer-vision,hand gesture recognition 1, +computer-vision,semantic slam,SLAM with semantic level scene understanding +computer-vision,action recognition in videos,"Human action recognition has become an active research area in recent years, as it plays a significant role in video +understanding. 
In general, human action can be recognized from multiple modalities, such as appearance, depth, optical flows, and body skeletons. + +In the video domain, it is an open question whether training an action classification network on a sufficiently large dataset, will give a similar boost in performance when applied to a different temporal task or dataset. The challenges of building video datasets has meant that most popular benchmarks for action recognition are small, having on the order of 10k videos. + +Please note some benchmarks may be located in the [Action Classification](https://paperswithcode.com/task/action-classification) or [Video Classification](https://paperswithcode.com/task/video-classification) tasks, e.g. Kinetics-400." +computer-vision,weakly supervised action localization,"In this task, the training data consists of videos with a list of activities in them without any temporal boundary annotations. However, while testing, given a video, the algorithm should recognize the activities in the video and also provide the start and end time." +computer-vision,point set upsampling, +computer-vision,pso convnets dynamics 2,Incorporating distilled Cucker-Smale elements into PSO algorithm using KNN and intertwine training with SGD (Pull back method) +computer-vision,negative flip rate, +computer-vision,image stylization, +computer-vision,continual semantic segmentation,Continual learning in semantic segmentation. +computer-vision,semi supervised person instance segmentation, +computer-vision,image retrieval,"Image retrieval systems aim to find similar images to a query image among an image dataset. + +( Image credit: [DELF](https://github.com/tensorflow/models/tree/master/research/delf) )" +computer-vision,image to point cloud registration,"Given a query image and a scene of point cloud, get the camera pose according to them." 
+computer-vision,texture classification,"**Texture Classification** is a fundamental issue in computer vision and image processing, playing a significant role in many applications such as medical image analysis, remote sensing, object recognition, document analysis, environment modeling, content-based image retrieval and many more. + + +Source: [Improving Texture Categorization with Biologically Inspired Filtering ](https://arxiv.org/abs/1312.0072)" +computer-vision,video background subtraction, +computer-vision,composite action recognition, +computer-vision,point cloud classification,Point Cloud Classification is a task involving the classification of unordered 3D point sets (point clouds). +computer-vision,detecting shadows, +computer-vision,solar cell segmentation, +computer-vision,spoof detection, +computer-vision,action detection, +computer-vision,weakly supervised semantic segmentation,"The semantic segmentation task is to assign a label from a label set to each pixel in an image. In the case of fully supervised setting, the dataset consists of images and their corresponding +pixel-level class-specific annotations (expensive pixel-level annotations). However, in the +weakly-supervised setting, the dataset consists of images and corresponding annotations that +are relatively easy to obtain, such as tags/labels of objects present in the image. + +( Image credit: [Weakly-Supervised Semantic Segmentation Network with Deep Seeded Region Growing](http://openaccess.thecvf.com/content_cvpr_2018/papers/Huang_Weakly-Supervised_Semantic_Segmentation_CVPR_2018_paper.pdf) )" +computer-vision,multiple affordance detection,"Affordance detection is the task of detecting objects that are usable (or graspable) by a human. + +( Image credit: [What can I do here? 
Leveraging Deep 3D saliency and geometry for fast and scalable multiple affordance detection](https://github.com/eduard626/deep-interaction-tensor) )" +computer-vision,mri segmentation, +computer-vision,semantic segmentation, +computer-vision,open world semi supervised learning, +computer-vision,reference based super resolution,Reference-based Super-Resolution aims to recover high-resolution images by utilizing external reference images containing similar content to generate rich textures. +computer-vision,text spotting,The ability to read text in natural scenes +computer-vision,multiple object track and segmentation,"Multiple object tracking and segmentation requires detecting, tracking, and segmenting objects belonging to a set of given classes. + +(Image and definition credit: [Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation](https://github.com/SysCV/pcan), NeurIPS 2021, Spotlight )" +computer-vision,video prediction,"**Video Prediction** is the task of predicting future frames given past video frames. + + +Source: [Photo-Realistic Video Prediction on Natural Videos of Largely Changing Frames ](https://arxiv.org/abs/2003.08635)" +computer-vision,content based image retrieval,"**Content-Based Image Retrieval** is a well studied problem in computer vision, with retrieval problems generally divided into two groups: category-level retrieval and instance-level retrieval. Given a query image of the Sydney Harbour bridge, for instance, category-level retrieval aims to find any bridge in a given dataset of images, whilst instance-level retrieval must find the Sydney Harbour bridge to be considered a match. + + +Source: [Camera Obscurer: Generative Art for Design Inspiration ](https://arxiv.org/abs/1903.02165)" +computer-vision,multiple object tracking with transformer, +computer-vision,imputation,Substituting missing data with values according to some criteria. 
+computer-vision,symmetry detection, +computer-vision,video visual relation detection,"**Video Visual Relation Detection (VidVRD)** aims to detect instances of visual relations of interest in a video, where a visual relation instance is represented by a relation triplet with the trajectories of the subject and object. As compared to still images, videos provide a more natural set of features for detecting visual relations, such as the dynamic relations like “A-follow-B” and “A-towards-B”, and temporally changing relations like “A-chase-B” followed by “A-hold-B”. Yet, VidVRD is technically more challenging than ImgVRD due to the difficulties in accurate object tracking and diverse relation appearances in the video domain. + +Source: [ImageNet-VidVRD Video Visual Relation Dataset](https://xdshang.github.io/docs/imagenet-vidvrd.html)" +computer-vision,unsupervised point cloud linear evaluation,Training a linear classifier(e.g. SVM) on the representations learned in an unsupervised manner on the pretrained(e.g. ShapeNet) dataset. +computer-vision,video deinterlacing, +computer-vision,single image blind deblurring, +computer-vision,scene reconstruction,Creating 3D scene either using conventional SFM pipelines or latest deep learning approaches. +computer-vision,point cloud registration,"**Point Cloud Registration** is a fundamental problem in 3D computer vision and photogrammetry. Given several sets of points in different coordinate systems, the aim of registration is to find the transformation that best aligns all of them into a common coordinate system. Point Cloud Registration plays a significant role in many vision applications such as 3D model reconstruction, cultural heritage management, landslide monitoring and solar energy analysis. 
+ + +Source: [Iterative Global Similarity Points : A robust coarse-to-fine integration solution for pairwise 3D point cloud registration ](https://arxiv.org/abs/1808.03899)" +computer-vision,video reconstruction,"Source: [Deep-SloMo](https://github.com/avinashpaliwal/Deep-SloMo)" +computer-vision,blind face restoration,"Blind face restoration aims at recovering high-quality faces from the low-quality counterparts suffering from unknown degradation, such as low-resolution, noise, blur, compression artifacts, etc. When applied to real-world scenarios, it becomes more challenging, due to more complicated degradation, diverse poses and expressions. + + +Description source: [Towards Real-World Blind Face Restoration with Generative Facial Prior](https://paperswithcode.com/paper/towards-real-world-blind-face-restoration) + +Image source: [Towards Real-World Blind Face Restoration with Generative Facial Prior](https://paperswithcode.com/paper/towards-real-world-blind-face-restoration)" +computer-vision,ifc entity classification, +computer-vision,multimodal patch matching,"Multimodal patch matching focuses on matching patches originating from different sources, such as visible RGB and near-infrared." +computer-vision,monocular cross view road scene parsing, +computer-vision,object localization,"**Object Localization** is the task of locating an instance of a particular object category in an image, typically by specifying a tightly cropped bounding box centered on the instance. An object proposal specifies a candidate bounding box, and an object proposal is said to be a correct localization if it sufficiently overlaps a human-labeled “ground-truth” bounding box for the given object. In the literature, the “Object Localization” task is to locate one instance of an object category, whereas “object detection” focuses on locating all instances of a category in a given image. 
+ + +Source: [Fast On-Line Kernel Density Estimation for Active Object Localization ](https://arxiv.org/abs/1611.05369)" +computer-vision,single image dehazing, +computer-vision,point cloud classification dataset, +computer-vision,jpeg compression artifact reduction, +computer-vision,depth estimation,Image: [monodepth2](https://github.com/nianticlabs/monodepth2) +computer-vision,historical color image dating, +computer-vision,scene flow estimation,"**Scene Flow Estimation** is the task of obtaining 3D structure and 3D motion of dynamic scenes, which is crucial to environment perception, e.g., in the context of autonomous navigation. + + +Source: [Self-Supervised Monocular Scene Flow Estimation ](https://arxiv.org/abs/2004.04143)" +computer-vision,multi oriented scene text detection, +computer-vision,referring expression generation,Generate referring expressions +computer-vision,medical image retrieval, +computer-vision,unity, +computer-vision,online multi object tracking,"The goal of **Online Multi-Object Tracking** is to estimate the spatio-temporal trajectories of multiple objects in an online video stream (i.e., the video is provided frame-by-frame), which is a fundamental problem for numerous real-time applications, such as video surveillance, autonomous driving, and robot navigation. + + +Source: [A Hybrid Data Association Framework for Robust Online Multi-Object Tracking ](https://arxiv.org/abs/1703.10764)" +computer-vision,contour detection,"Object **Contour Detection** extracts information about the object shape in images. + + +Source: [Object Contour and Edge Detection with RefineContourNet ](https://arxiv.org/abs/1904.13353)" +computer-vision,physiological computing, +computer-vision,object detection from stereo images,"Estimating oriented 3D bounding boxes from Stereo Cameras only. 
+ +Image: [You et al](https://openreview.net/pdf?id=BJedHRVtPB)" +computer-vision,feature matching,Image: [Choy et al](https://paperswithcode.com/paper/fully-convolutional-geometric-features) +computer-vision,facial emotion recognition,Emotion Recognition from facial images +computer-vision,micro expression spotting,"Facial Micro-Expression Spotting is the challenging task of identifying the onset, apex and/or offset over a short or long micro-expression sequence." +computer-vision,drawing pictures, +computer-vision,scene graph generation,"A scene graph is a structured representation of an image, where nodes in a scene graph correspond to object bounding boxes with their object categories, and edges correspond to pairwise relationships between objects. The task of **Scene Graph Generation** is to generate a visually-grounded scene graph that most accurately correlates with an image. + + +Source: [Scene Graph Generation by Iterative Message Passing ](https://arxiv.org/abs/1701.02426)" +computer-vision,video boundary captioning,"Provided with the timestamp of a boundary inside a video, the machine is required to generate sentences describing the status change at the boundary." +computer-vision,fine grained image inpainting, +computer-vision,hyperspectral image classification,"Hyperspectral image classification is the task of assigning a class label to every pixel in an image that was captured using (hyper)spectral sensors. + +( Image credit: [Shorten Spatial-spectral RNN with Parallel-GRU for Hyperspectral Image Classification](https://arxiv.org/pdf/1810.12563v1.pdf) )" +computer-vision,aerial video saliency prediction, +computer-vision,im2spec,Predicting spectra from images (and vice versa) +computer-vision,handwritten document recognition, +computer-vision,real time multi object tracking,Online and real-time multi-object tracking aims to run at over 30 frames per second with an online approach.
+computer-vision,reconstruction,Image: [Gwak et al](https://arxiv.org/pdf/1705.10904v2.pdf) +computer-vision,lip sync,"Given a video of an arbitrary person, and an arbitrary driving speech, the task is to generate a lip-synced video that matches the given speech. + +This task requires the approach to not be constrained by identity, voice, or language." +computer-vision,user constrained thumbnail generation,"Thumbnail generation is the task of generating image thumbnails from an input image. + +( Image credit: [User Constrained Thumbnail Generation using Adaptive Convolutions](https://arxiv.org/pdf/1810.13054v3.pdf) )" +computer-vision,weakly supervised panoptic segmentation, +computer-vision,irregular text recognition,"To read a text from an image might be difficult due to the improper angle of the text inside the image or due to surprising font. Hence, to recognize the text data from the image, Irregular Text Recognition is used." +computer-vision,unsupervised video object segmentation,The unsupervised scenario assumes that the user does not interact with the algorithm to obtain the segmentation masks. Methods should provide a set of object candidates with no overlapping pixels that span through the whole video sequence. This set of objects should contain at least the objects that capture human attention when watching the whole video sequence i.e objects that are more likely to be followed by human gaze. +computer-vision,story continuation,"The task involves providing an initial scene that can be obtained in real world use cases. By including this scene, a model can then copy and adapt elements from it as it generates subsequent images. + +Source: [StoryDALL-E: Adapting Pretrained Text-to-Image Transformers for Story Continuation](https://paperswithcode.com/paper/storydall-e-adapting-pretrained-text-to-image)" +computer-vision,coos 7 accuracy,"COOS-7 contains 132,209 single-cell images of mouse cells, where the task is to predict protein subcellular localization. 
Images are spread over 1 training set and 4 testing sets, where each single-cell image contains a protein and nucleus fluorescent channels. COOS-7 provides a classification setting where four test datasets have increasing degrees of covariate shift: some images are random subsets of the training data, while others are from experiments reproduced months later and imaged by different instruments. While most classifiers perform well on test datasets similar to the training dataset, all classifiers failed to generalize their performance to datasets with greater covariate shifts. Read more at https://www.alexluresearch.com/publication/coos/." +computer-vision,motion prediction, +computer-vision,animation, +computer-vision,action triplet recognition,"Recognising action as a triplet of subject verb and object. Example HOI = Human Object Interaction, Surgical IVT = Instrument Verb Target, etc." +computer-vision,object recognition,"3D object recognition is the task of recognising objects from 3D data. + +Note that there are related tasks you can look at, such as [3D Object Detection](https://paperswithcode.com/task/3d-object-detection) which have more leaderboards. + +(Image credit: [Look Further to Recognize Better](https://arxiv.org/pdf/1907.12924v1.pdf))" +computer-vision,stereo lidar fusion,Depth estimation using stereo cameras and a LiDAR sensor. +computer-vision,one shot object detection,"( Image credit: [Siamese Mask R-CNN +](https://github.com/bethgelab/siamese-mask-rcnn) )" +computer-vision,furniture segmentation, +computer-vision,wildly unsupervised domain adaptation,Transferring knowledge from a noisy source domain to unlabeled target domain. +computer-vision,open vocabulary object detection,"Open-vocabulary detection (OVD) aims to generalize beyond the limited number of base classes labeled during the training phase. The goal is to detect novel classes defined by an unbounded +(open) vocabulary at inference." 
+computer-vision,change detection,"Image credit: [""A TRANSFORMER-BASED SIAMESE NETWORK FOR CHANGE DETECTION""](https://arxiv.org/pdf/2201.01293v1.pdf)" +computer-vision,scene labeling, +computer-vision,referring image matting keyword based,"Keyword-based referring image matting, taking an image and a keyword word as the input." +computer-vision,few shot action recognition,"Few-shot (FS) action recognition is a challenging com- +puter vision problem, where the task is to classify an unlabelled query video into one of the action categories in the support set having limited samples per action class." +computer-vision,semi supervised instance segmentation, +computer-vision,jpeg artifact removal, +computer-vision,temporal localization, +computer-vision,handwritten chinese text recognition,"Handwritten Chinese text recognition is the task of interpreting handwritten Chinese input, e.g., from images of documents or scans." +computer-vision,multi person pose estimation,"Multi-person pose estimation is the task of estimating the pose of multiple people in one frame. + +( Image credit: [Human Pose Estimation with TensorFlow +](https://github.com/eldar/pose-tensorflow) )" +computer-vision,partial video copy detection,The PVCD goal is identifying and locating if one or more segments of a long testing video have been copied (transformed) from the reference videos dataset. +computer-vision,image matching,"Image Matching or wide multiple baseline stereo (WxBS) is a process of establishing a sufficient number of pixel or region correspondences from two or more images depicting the same scene to estimate the geometric relationship between cameras, which produced these images. 
+ +Source: [The Role of Wide Baseline Stereo in the Deep Learning World](https://ducha-aiki.github.io/wide-baseline-stereo-blog/2020/03/27/intro.html) + +( Image credit: [Kornia](https://github.com/kornia/kornia) )" +computer-vision,video understanding,"A crucial task of **Video Understanding** is to recognise and localise (in space and time) different actions or events appearing in the video. + + +Source: [Action Detection from a Robot-Car Perspective ](https://arxiv.org/abs/1807.11332)" +computer-vision,hand pose estimation,Image: [Zimmerman et al](https://arxiv.org/pdf/1705.01389v3.pdf) +computer-vision,aware image synthesis, +computer-vision,age estimation,"Age Estimation is the task of estimating the age of a person from an image or some other kind of data. + +( Image credit: [BridgeNet](https://arxiv.org/pdf/1904.03358v1.pdf) )" +computer-vision,skeleton based action recognition,"( Image credit: [View Adaptive Neural Networks for High +Performance Skeleton-based Human Action +Recognition](https://arxiv.org/pdf/1804.07453v3.pdf) )" +computer-vision,small object detection,"Small object detection is the task of detecting small objects. + +( Image credit: [Feature-Fused SSD](https://arxiv.org/pdf/1709.05054v3.pdf) )" +computer-vision,thoracic disease classification, +computer-vision,hand, +computer-vision,action quality assessment,Assessing/analyzing/quantifying how well an action was performed. +computer-vision,sports analytics, +computer-vision,pose estimation,"Image credit: [GSNet: Joint Vehicle Pose and Shape Reconstruction with Geometrical and Scene-aware Supervision +, ECCV'20](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123600511.pdf)" +computer-vision,cross domain few shot, +computer-vision,face reconstruction,"Face reconstruction is the task of recovering the facial geometry of a face from an image.
+ +( Image credit: Microsoft [Deep3DFaceReconstruction](https://github.com/Microsoft/Deep3DFaceReconstruction) )" +computer-vision,vision language navigation,"Vision-language navigation (VLN) is the task of navigating an embodied agent to carry out natural language instructions inside real 3D environments. + +( Image credit: [Learning to Navigate Unseen Environments: +Back Translation with Environmental Dropout](https://arxiv.org/pdf/1904.04195v1.pdf) )" +computer-vision,active object detection,Active Learning for Object Detection +computer-vision,supervised dimensionality reduction, +computer-vision,sketch recognition, +computer-vision,facial landmark localization,Image: [Zhang et al](https://arxiv.org/pdf/1801.09242v1.pdf) +computer-vision,deblurring,"( Image credit: [Deblurring Face Images using Uncertainty Guided Multi-Stream Semantic Networks](https://arxiv.org/pdf/1907.13106v1.pdf) )" +computer-vision,dense video captioning,"Most natural videos contain numerous events. For example, in a video of a “man playing a piano”, the video might also contain “another man dancing” or “a crowd clapping”. The task of dense video captioning involves both detecting and describing events in a video." +computer-vision,image restoration,"**Image Restoration** is a family of inverse problems for obtaining a high quality image from a corrupted input image. Corruption may occur due to the image-capture process (e.g., noise, lens blur), post-processing (e.g., JPEG compression), or photography in non-ideal conditions (e.g., haze, motion blur). + + +Source: [Blind Image Restoration without Prior Knowledge ](https://arxiv.org/abs/2003.01764)" +computer-vision,multi frame super resolution,"When multiple images of the same view are taken from slightly different positions, perhaps also at different times, then they collectively contain more information than any single image on its own. 
Multi-Frame Super-Resolution fuses these low-res inputs into a composite high-res image that can reveal some of the original detail that cannot be recovered from any low-res image alone. + +( Credit: [HighRes-net](https://github.com/ElementAI/HighRes-net) )" +computer-vision,image quality assessment,"paper:Blind image quality assessment by visual neuron matrix +code:https://github.com/Xiaodong-Bi/VNM" +computer-vision,deception detection in videos, +computer-vision,anomaly detection in surveillance videos, +computer-vision,unet segmentation,"U-Net is an architecture for semantic segmentation. It consists of a contracting path (Up to down) and an expanding path (Down to up). During the contraction, the spatial information is reduced while feature information is increased. +The contracting path follows the typical architecture of a convolutional network. It consists of the repeated application of two 3x3 convolutions (unpadded convolutions), each followed by a rectified linear unit (ReLU) and a 2x2 max pooling operation with stride 2 for downsampling. At each downsampling step, we double the number of feature channels. Every step in the expansive path consists of an upsampling of the feature map followed by a 2x2 convolution (“up-convolution”) that halves the number of feature channels, a concatenation with the correspondingly cropped feature map from the contracting path, and two 3x3 convolutions, each followed by a ReLU. The cropping is necessary due to the loss of border pixels in every convolution. At the final layer, a 1x1 convolution is used to map each 64-component feature vector to the desired number of classes. In total the network has 23 convolutional layers." +computer-vision,image similarity detection,"A fundamental computer vision task to determine whether a part of an image has been copied from another image. 
+ +Description from: [The 2021 Image Similarity Dataset and Challenge](https://paperswithcode.com/paper/the-2021-image-similarity-dataset-and) + +Image credit: [The 2021 Image Similarity Dataset and Challenge](https://paperswithcode.com/paper/the-2021-image-similarity-dataset-and)" +computer-vision,video instance segmentation,"The goal of video instance segmentation is simultaneous detection, segmentation and tracking of instances in videos. In words, it is the first time that the image instance segmentation problem is extended to the video domain. + +To facilitate research on this new task, a large-scale benchmark called YouTube-VIS, which consists of 2,883 high-resolution YouTube videos, a 40-category label set and 131k high-quality instance masks is built." +computer-vision,human activity recognition,Classify various human activities +computer-vision,novel view synthesis,"Synthesize a target image with an arbitrary target camera pose from given source images and their camera poses. + +( Image credit: [Multi-view to Novel view: Synthesizing novel views with Self-Learned Confidence](https://github.com/shaohua0116/Multiview2Novelview) )" +computer-vision,shape reconstruction,"Image credit: [GSNet: Joint Vehicle Pose and Shape Reconstruction with Geometrical and Scene-aware Supervision +, ECCV'20](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123600511.pdf)" +computer-vision,cube engraving classification, +computer-vision,deepfake detection,"DeepFakes involves videos, often obscene, in which a face can be swapped with someone else’s using neural networks. DeepFakes are a general public concern, thus it's important to develop methods to detect them. + +Description source: [DeepFakes: a New Threat to Face Recognition? Assessment and Detection](https://arxiv.org/pdf/1812.08685.pdf) + +Image source: [DeepFakes: a New Threat to Face Recognition? 
Assessment and Detection](https://paperswithcode.com/paper/deepfakes-a-new-threat-to-face-recognition)" +computer-vision,situation recognition,"Situation Recognition aims to produce the structured image summary which describes the primary activity (verb), and its relevant entities (nouns)." +computer-vision,shape reconstruction from a single,Image: [Liao et al](https://arxiv.org/pdf/1811.12016v1.pdf) +computer-vision,cloud removal, +computer-vision,camera shot segmentation,"Camera shot temporal segmentation consists in classifying each video frame according to the type of camera used to record said frame. This task is introduced with the SoccerNet-v2 dataset, where 13 camera classes are considered (main camera, behind the goal, corner camera, etc.)." +computer-vision,weakly supervised human pose estimation,This task targets at 3D Human Pose Estimation with fewer 3D annotation. +computer-vision,video object tracking,Video Object Detection aims to detect targets in videos using both spatial and temporal information. It's usually deeply integrated with tasks such as Object Detection and Object Tracking. +computer-vision,image to video person re identification, +computer-vision,unseen object instance segmentation,"Instance segmentation is the task of detecting and delineating each distinct object of interest appearing in an image. + +Image Credit: [Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers](https://arxiv.org/abs/2103.12340)" +computer-vision,large scale person re identification, +computer-vision,surface normals estimation,Surface normal estimation deals with the task of predicting the surface orientation of the objects present inside a scene. Refer to [Designing Deep Networks for Surface Normal Estimation (Wang et al.)](https://www.cs.cmu.edu/~xiaolonw/papers/deep3d.pdf) to get a good overview of several design choices that led to the development of a CNN-based surface normal estimator. 
+computer-vision,color constancy,"**Color Constancy** is the ability of the human vision system to perceive the colors of the objects in the scene largely invariant to the color of the light source. The task of computational Color Constancy is to estimate the scene illumination and then perform the chromatic adaptation in order to remove the influence of the illumination color on the colors of the objects in the scene. + + +Source: [CroP: Color Constancy Benchmark Dataset Generator ](https://arxiv.org/abs/1903.12581)" +computer-vision,partially relevant video retrieval,"In the Partially Relevant Video Retrieval (PRVR) task, an untrimmed video is considered to be partially relevant w.r.t. a given textual query if it contains a moment relevant to the query. PRVR aims to retrieve such partially relevant videos from a large collection of untrimmed videos." +computer-vision,age invariant face recognition,"Age-invariant face recognition is the task of performing face recognition that is invariant to differences in age. + +( Image credit: [Look Across Elapse](https://arxiv.org/pdf/1809.00338v2.pdf) )" +computer-vision,material classification, +computer-vision,multi person pose estimation absolute,"This task aims to solve absolute 3D multi-person pose Estimation (camera-centric coordinates). No ground truth human bounding box and human root joint coordinates are used during testing stage. + +( Image credit: [RootNet](https://github.com/mks0601/3DMPPE_ROOTNET_RELEASE) )" +computer-vision,transparent objects, +computer-vision,amodal instance segmentation,"Different from traditional segmentation which only focuses on visible regions, amodal instance segmentation also predicts the occluded parts of object instances. 
+ +Description Credit: [Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers, CVPR'21](https://openaccess.thecvf.com/content/CVPR2021/papers/Ke_Deep_Occlusion-Aware_Instance_Segmentation_With_Overlapping_BiLayers_CVPR_2021_paper.pdf)" +computer-vision,multi modal image segmentation, +computer-vision,reverse style transfer, +computer-vision,image deblocking, +computer-vision,semantic image matting, +computer-vision,video interlacing, +computer-vision,face verification,"Face verification is the task of comparing a candidate face to another, and verifying whether it is a match. It is a one-to-one mapping: you have to check if this person is the correct one. + +( Image credit: [Pose-Robust Face Recognition via Deep Residual Equivariant Mapping](https://arxiv.org/pdf/1803.00839v1.pdf) )" +computer-vision,image classification shift consistency,"How often two shifts of the same image are classified the same + +( Image credit: [Antialiased CNNs](https://github.com/adobe/antialiased-cnns) )" +computer-vision,weakly supervised action recognition,Action recognition with single-point annotations in time (there are no action start/stop time annotations) +computer-vision,rgb d salient object detection,"RGB-D Salient object detection (SOD) aims at distinguishing the most visually distinctive objects or regions in a scene from the given RGB and Depth data. It has a wide range of applications, including video/image segmentation, object recognition, visual tracking, foreground maps evaluation, image retrieval, content-aware image editing, information discovery, photosynthesis, and weakly +supervised semantic segmentation. Here, depth information plays an important complementary role in finding salient objects. Online benchmark: http://dpfan.net/d3netbenchmark. 
+ + +( Image credit: [Rethinking RGB-D Salient Object Detection: Models, Data Sets, and Large-Scale Benchmarks, TNNLS20](https://ieeexplore.ieee.org/abstract/document/9107477) )" +computer-vision,spectral reconstruction, +computer-vision,text line extraction, +computer-vision,weakly supervised segmentation, +computer-vision,object categorization,"Object categorization identifies which label, from a +given set, best corresponds to an image region defined by +an input image and bounding box." +computer-vision,handwriting generation,The inverse of handwriting recognition: from text generate an image of handwriting (offline) or a trajectory of handwriting (online). +computer-vision,medical image detection, +computer-vision,pso convnets dynamics 1,Incorporating distilled Cucker-Smale elements into PSO algorithm using KNN and intertwine training with SGD +computer-vision,wireframe parsing,Detect Line Segments and their connecting Junctions in a single perspective image. +computer-vision,embodied question answering, +computer-vision,wavelet structure similarity loss, +computer-vision,action assessment, +computer-vision,video description,"The goal of automatic **Video Description** is to tell a story about events happening in a video. While early Video Description methods produced captions for short clips that were manually segmented to contain a single event of interest, more recently dense video captioning has been proposed to both segment distinct events in time and describe them in a series of coherent sentences. This problem is a generalization of dense image region captioning and has many practical applications, such as generating textual summaries for the visually impaired, or detecting and describing important events in surveillance footage.
+ + +Source: [Joint Event Detection and Description in Continuous Video Streams ](https://arxiv.org/abs/1802.10250)" +computer-vision,point cloud super resolution,"Point cloud super-resolution is a fundamental problem +for 3D reconstruction and 3D data understanding. It takes +a low-resolution (LR) point cloud as input and generates +a high-resolution (HR) point cloud with rich details" +computer-vision,dial meter reading, +computer-vision,co saliency detection,"**Co-Salient Object Detection** is a computational problem that aims at highlighting the common and salient foreground regions (or objects) in an image group. Please also refer to the online benchmark: http://dpfan.net/cosod3k/ + + + + +( Image credit: [Taking a Deeper Look at Co-Salient Object Detection, CVPR2020](https://openaccess.thecvf.com/content_CVPR_2020/papers/Fan_Taking_a_Deeper_Look_at_Co-Salient_Object_Detection_CVPR_2020_paper.pdf) )" +computer-vision,sequential image classification,"Sequential image classification is the task of classifying a sequence of images. + +( Image credit: [TensorFlow-101](https://github.com/sjchoi86/tensorflow-101/blob/master/notebooks/rnn_mnist_simple.ipynb) )" +computer-vision,dense captioning,"Dense captioning in 3D point clouds is an emerging vision-and-language task involving object-level 3D scene understanding. Apart from coarse semantic class prediction and bounding box regression as in traditional 3D object detection, 3D dense captioning aims at producing a further and finer instance-level label of natural language description on visual appearance and spatial relations for each scene object of interest." +computer-vision,multi label zero shot learning, +computer-vision,infrared and visible image fusion,Image fusion with paired infrared and visible images +computer-vision,facial attribute classification,"Facial attribute classification is the task of classifying various attributes of a facial image - e.g. 
whether someone has a beard, is wearing a hat, and so on. + +( Image credit: [Multi-task Learning of Cascaded CNN for Facial Attribute Classification +](https://arxiv.org/pdf/1805.01290v1.pdf) )" +computer-vision,boundary detection,"**Boundary Detection** is a vital part of extracting information encoded in images, allowing for the computation of quantities of interest including density, velocity, pressure, etc. + + +Source: [A Locally Adapting Technique for Boundary Detection using Image Segmentation ](https://arxiv.org/abs/1707.09030)" +computer-vision,superpixel image classification,**Superpixel Image Classification** classifies images via superpixels: groups of pixels that share common characteristics (such as pixel intensity) and are merged into a single segment. +computer-vision,trademark retrieval, +computer-vision,ensemble learning, +computer-vision,facial recognition and modelling,Facial tasks in machine learning operate based on images or video frames (or other datasets) focussed on human faces. +computer-vision,brdf estimation, +computer-vision,multi object colocalization, +computer-vision,real to cartoon translation,Cartoonifying images +computer-vision,unsupervised person re identification, +computer-vision,video semantic segmentation, +computer-vision,face identification,Face identification is the task of matching a given face image to one in an existing database of faces. It is the second part of face recognition (the first part being detection). It is a one-to-many mapping: you have to find an unknown person in a database to find who that person is. +computer-vision,light source estimation, +computer-vision,action segmentation,"**Action Segmentation** is a challenging problem in high-level video understanding. In its simplest form, Action Segmentation aims to segment a temporally untrimmed video by time and label each segmented part with one of pre-defined action labels.
The results of Action Segmentation can be further used as input to various applications, such as video-to-text and action localization. + + +Source: [TricorNet: A Hybrid Temporal Convolutional and Recurrent Network for Video Action Segmentation ](https://arxiv.org/abs/1705.07818)" +computer-vision,multi human parsing,"Multi-human parsing is the task of parsing multiple humans in crowded scenes. + +( Image credit: [Multi-Human Parsing](https://github.com/ZhaoJ9014/Multi-Human-Parsing) )" +computer-vision,action recognition,Temporal Action Localization aims to detect activities in the video stream and output beginning and end timestamps. It is closely related to Temporal Action Proposal Generation. +computer-vision,group activity recognition,"**Group Activity Recognition** is a subset of human activity recognition problem which focuses on the collective behavior of a group of people, resulted from the individual actions of the persons and their interactions. Collective activity recognition is a basic task for automatic human behavior analysis in many areas like surveillance or sports videos. + + +Source: [A Multi-Stream Convolutional Neural Network Framework for Group Activity Recognition ](https://arxiv.org/abs/1812.10328)" +computer-vision,speaker specific lip to speech synthesis,"How accurately can we infer an individual’s speech style and content from his/her lip movements? [1] + +In this task, the model is trained on a specific speaker, or a very limited set of speakers. + +[1] Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis, CVPR 2020." +computer-vision,multimodal unsupervised image to image,"Multimodal unsupervised image-to-image translation is the task of producing multiple translations to one domain from a single image in another domain. 
+ +( Image credit: [MUNIT: Multimodal UNsupervised Image-to-image Translation](https://github.com/NVlabs/MUNIT) )" +computer-vision,hurricane forecasting,"Tropical Cyclone Forecasting using Computer Vision, Deep Learning, and Time-Series methods" +computer-vision,zero shot segmentation, +computer-vision,webpage object detection,Detect Web Element for various classes from candidate web elements obtained from DOM tree (No need for Bounding Box Regression) +computer-vision,physical attribute prediction, +computer-vision,rf based gesture recognition,"RF-based gesture sensing and recognition has increasingly attracted intense academic and industrial interest due to its various device-free applications in daily life, such as elder monitoring, mobile games. State-of-the-art approaches achieved accurate gesture sensing by using fine-grained RF signatures (such as CSI, Doppler effect) while could not achieve the same accuracy with coarse-grained RF signatures such as received signal strength (RSS). + +See e.g. + +Project Soli in depth: How radar-detected gestures could set the Pixel 4 apart +An experimental Google project may finally be ready to make its way into the real world — and the implications could be enormous. https://www.computerworld.com/article/3402019/google-project-soli-pixel-4.html + +( Image credit: [Accurate Human Gesture Sensing With +Coarse-Grained RF Signatures](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8737967) )" +computer-vision,image categorization, +computer-vision,pedestrian detection,"Pedestrian detection is the task of detecting pedestrians from a camera. + +Further state-of-the-art results (e.g. on the KITTI dataset) can be found at [3D Object Detection](https://paperswithcode.com/task/object-detection). 
+ +( Image credit: [High-level Semantic Feature Detection: A New Perspective for Pedestrian Detection](https://github.com/liuwei16/CSP) )" +computer-vision,part level panoptic segmentation,Panoptic segmentation with part-aware predictions. +computer-vision,weakly supervised action segmentation action,Learning an action segmentation model while the only available supervision is action set -- the set of actions happened in the video without information about their temporal locations. +computer-vision,micro expression recognition,"Facial Micro-Expression Recognition is a challenging task in identifying suppressed emotion in a high-stake environment, often comes in very brief duration and subtle changes." +computer-vision,transparent object detection,Detecting transparent objects in 2D or 3D +computer-vision,room layouts from a single rgb panorama,Image: [Zou et al](https://arxiv.org/pdf/1803.08999v1.pdf) +computer-vision,intensity image denoising, +computer-vision,cross domain few shot learning,Its essence is transfer learning. The model needs to be trained in the source domain and then migrated to the target domain. Compliant with (1) the category in the target domain has never appeared in the source domain (2) the data distribution of the target domain is inconsistent with the source domain (3) each class in the target domain has very few labels +computer-vision,car pose estimation, +computer-vision,conformal prediction, +computer-vision,rotation estimation, +computer-vision,object tracking,"**Object tracking** is the task of taking an initial set of object detections, creating a unique ID for each of the initial detections, and then tracking each of the objects as they move around frames in a video, maintaining the ID assignment. State-of-the-art methods involve fusing data from RGB and event-based cameras to produce more reliable object tracking. CNN-based models using only RGB images as input are also effective. The most popular benchmark is OTB. 
There are several evaluation metrics specific to object tracking, including HOTA, MOTA, IDF1, and Track-mAP. + +( Image credit: [Towards-Realtime-MOT +](https://github.com/Zhongdao/Towards-Realtime-MOT) )" +computer-vision,scene text detection,"**Scene Text Detection** is a task to detect text regions in the complex background and label them with bounding boxes. + + +Source: [ContourNet: Taking a Further Step toward Accurate Arbitrary-shaped Scene Text Detection ](https://arxiv.org/abs/2004.04940)" +computer-vision,semi supervised human pose estimation,Semi-supervised human pose estimation aims to leverage the unlabelled data along with labeled data to improve the model performance. +computer-vision,pedestrian attribute recognition,"Pedestrian attribution recognition is the task of recognising pedestrian features - such as whether they are talking on a phone, whether they have a backpack, and so on. + +( Image credit: [HydraPlus-Net: Attentive Deep Features for Pedestrian Analysis](https://arxiv.org/pdf/1709.09930v1.pdf) )" +computer-vision,sample probing, +computer-vision,future hand prediction, +computer-vision,audio visual video captioning, +computer-vision,on the fly sketch based image retrieval,Start retrieving as the user starts drawing. +computer-vision,dehazing, +computer-vision,scanpath prediction,Learning to Predict Sequences of Human Fixations. +computer-vision,single shot hdr reconstruction,"SVE-based HDR imaging, also known as single-shot HDR imaging, algorithms capture a scene with pixel-wise varying exposures in a single image and then computationally synthesize an HDR image, which benefits from the multiple exposures of the single image." +computer-vision,image denoising,"Image Denoising is the task of removing noise from an image, e.g. the application of Gaussian noise to an image. 
+ +( Image credit: [Wide Inference Network for Image Denoising via +Learning Pixel-distribution Prior](https://arxiv.org/pdf/1707.05414v5.pdf) )" +computer-vision,self supervised person re identification,"Currently, self-supervised representation learning is mainly tested on image classification tasks, which is insufficient to verify its effectiveness. It should also be tested on visual matching tasks, and person re-identification is just such an appropriate task." +computer-vision,canonicalization,3D Canonicalization is the process of estimating a transformation-invariant feature for classification and part segmentation tasks. +computer-vision,mixed reality, +computer-vision,image clustering,"Models that partition the dataset into semantically meaningful clusters without having access to the ground truth labels. + + Image credit: ImageNet clustering results of [SCAN: Learning to Classify Images without Labels (ECCV 2020)](https://arxiv.org/abs/2005.12320) " +computer-vision,scene text editing, +computer-vision,mobile periocular recognition,"Periocular recognition is the task of recognising a person based on their eyes (periocular). + +( Image credit: [Heterogeneity Aware Deep Embedding for Mobile Periocular Recognition](https://arxiv.org/pdf/1811.00846v1.pdf) )" +computer-vision,emotion recognition,"**Emotion Recognition** is an important area of research to enable effective human-computer interaction. Human emotions can be detected using speech signal, facial expressions, body language, and electroencephalography (EEG). Source: [Using Deep Autoencoders for Facial Expression Recognition ](https://arxiv.org/abs/1801.08329)" +computer-vision,roi based image generation, +computer-vision,event data classification, +computer-vision,plan2scene,Converting floorplans + RGB photos to textured 3D mesh models of houses.
+computer-vision,weakly supervised point cloud segmentation, +computer-vision,image classification,"**Image Classification** is a fundamental task that attempts to comprehend an entire image as a whole. The goal is to classify the image by assigning it to a specific label. Typically, Image Classification refers to images in which only one object appears and is analyzed. In contrast, object detection involves both classification and localization tasks, and is used to analyze more realistic cases in which multiple objects may exist in an image. + + +Source: [Metamorphic Testing for Object Detection Systems ](https://arxiv.org/abs/1912.12162)" +computer-vision,removing text from natural images, +computer-vision,generalized zero shot learning unseen,"The average of the normalized top-1 prediction scores of unseen classes in the generalized zero-shot learning setting, where the label of a test sample is predicted among all (seen + unseen) classes." +computer-vision,multi person pose estimation,"This task aims to solve root-relative 3D multi-person pose estimation. No human bounding box and root joint coordinate groundtruth are used in testing time. + +( Image credit: [RootNet](https://github.com/mks0601/3DMPPE_ROOTNET_RELEASE) )" +computer-vision,laminar turbulent flow localisation,It is a segmentation task on thermographic measurement images in order to separate laminar and turbulent flow regions on flight body parts. +computer-vision,color image denoising, +computer-vision,detect forged images and videos, +computer-vision,pose estimation,"6D pose estimation is the task of detecting the 6D pose of an object, which include its location and orientation. This is an important task in robotics, where a robotic arm needs to know the location and orientation to detect and move objects in its vicinity successfully. This allows the robot to operate safely and effectively alongside humans. 
The awareness of the position and orientation of objects in a scene is sometimes referred to as 6D, where the D stands for degrees of freedom pose. + +( Image credit: [Segmentation-driven 6D Object Pose Estimation](https://github.com/cvlab-epfl/segmentation-driven-pose) )" +computer-vision,video inpainting,"The goal of **Video Inpainting** is to fill in missing regions of a given video sequence with contents that are both spatially and temporally coherent. Video Inpainting, also known as video completion, has many real-world applications such as undesired object removal and video restoration. + + +Source: [Deep Flow-Guided Video Inpainting ](https://arxiv.org/abs/1905.02884)" +computer-vision,knowledge distillation,"Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. While large models (such as very deep neural networks or ensembles of many models) have higher knowledge capacity than small models, this capacity might not be fully utilized." +computer-vision,low light image enhancement, +computer-vision,activeness detection,Determining activeness via images +computer-vision,satellite image super resolution, +computer-vision,weakly supervised instance segmentation, +computer-vision,multiple object tracking,"**Multiple Object Tracking** is the problem of automatically identifying multiple objects in a video and representing them as a set of trajectories with high accuracy. + + +Source: [SOT for MOT ](https://arxiv.org/abs/1712.01059)" +computer-vision,fashion compatibility learning, +computer-vision,segmentation based workflow recognition, +computer-vision,gpr,Gaussian Process Regression +computer-vision,few shot image classification,"Few-shot image classification is the task of doing image classification with only a few examples for each category (typically < 6 examples). 
+ +( Image credit: [Learning Embedding Adaptation for Few-Shot Learning](https://github.com/Sha-Lab/FEAT) )" +computer-vision,uncropping, +computer-vision,thermal infrared object tracking, +computer-vision,computer vision transduction,Transductive learning in computer vision tasks +computer-vision,image based automatic meter reading, +computer-vision,unsupervised face recognition, +computer-vision,space time video super resolution, +computer-vision,animated gif generation, +computer-vision,traffic sign detection, +computer-vision,referring image matting refmatte rw100,"Expression-based referring image matting on natural images and manually labelled annotations, i.e., RefMatte-RW100, taking the image and a flowery expression as the input." +computer-vision,ms ssim,An MS-SSIM score helps to analyze how much a de-warping module has been able to de-warp a document image from its initial distorted view. +computer-vision,texture image retrieval,It includes two tasks: (1) Image as Query and Text as Targets; (2) Text as Query and Image as Targets. +computer-vision,semantic scene completion,"This task was introduced in ""Semantic Scene Completion from a Single Depth Image"" (https://arxiv.org/abs/1611.08974) at CVPR 2017. The target is to infer the dense 3D voxelized semantic scene from an incomplete 3D input (e.g. point cloud, depth map) and an optional RGB image. A recent summary can be found in the paper ""3D Semantic Scene Completion: a Survey"" (https://arxiv.org/abs/2103.07466), published at IJCV 2021." +computer-vision,aesthetics quality assessment,Automatic assessment of aesthetic-related subjective ratings. +computer-vision,de aliasing,De-aliasing is the problem of recovering the original high-frequency information that has been aliased during the acquisition of an image. +computer-vision,scene parsing,"Scene parsing is to segment and parse an image into different image regions associated with semantic categories, such as sky, road, person, and bed. 
[MIT Description](http://sceneparsing.csail.mit.edu/#:~:text=Scene%20parsing%20is%20to%20segment,the%20algorithms%20of%20scene%20parsing.)" +computer-vision,online action detection,Online action detection is the task of predicting the action as soon as it happens in a streaming video without access to video frames in the future. +computer-vision,pose guided image generation,"Pose-guided image generation is the task of generating a new image of a person with guidance from pose information that the new image should synthesise around. + +( Image credit: [Coordinate-based Texture Inpainting for Pose-Guided Human Image Generation](https://arxiv.org/pdf/1811.11459v2.pdf) )" +computer-vision,depth map super resolution,"Depth map super-resolution is the task of upsampling depth images. + +( Image credit: [A Joint Intensity and Depth Co-Sparse Analysis Model +for Depth Map Super-Resolution](https://arxiv.org/pdf/1304.5319v1.pdf) )" +computer-vision,weakly supervised temporal action,Temporal Action Localization with weak supervision where only video-level labels are given for training +computer-vision,self supervised action recognition, +computer-vision,monocular object detection,Monocular 3D Object Detection is the task of drawing 3D bounding boxes around objects in a single 2D RGB image. It is a localization task that uses no extra information such as depth or other sensors or multiple images. +computer-vision,homography estimation, +computer-vision,human fmri response prediction,"The task is: Given a) the set of videos of everyday events and b) the corresponding brain responses recorded while human participants viewed those videos, use computational models to predict brain responses for videos."
+computer-vision,spectral estimation from a single rgb image, +computer-vision,birds eye view object detection,KITTI birds eye view detection task +computer-vision,visual place recognition,"**Visual Place Recognition** is the task of matching a view of a place with a different view of the same place taken at a different time. + +Source: [Visual place recognition using landmark distribution descriptors ](https://arxiv.org/abs/1608.04274) + +Image credit: [Visual place recognition using landmark distribution descriptors](https://arxiv.org/pdf/1608.04274.pdf)" +computer-vision,object detection in aerial images,"Object Detection in Aerial Images is the task of detecting objects from aerial images. + +( Image credit: [DOTA: A Large-Scale Dataset for Object Detection in Aerial Images](http://openaccess.thecvf.com/content_cvpr_2018/papers/Xia_DOTA_A_Large-Scale_CVPR_2018_paper.pdf) )" +computer-vision,hyperspectral, +computer-vision,unsupervised landmark detection,"The discovery of object landmarks on a set of images depicting objects of the same category, directly from raw images without using any manual annotations." +computer-vision,forgery, +computer-vision,hd semantic map learning,"The goal of task is to generate map elements in a vectorized form using data from onboard sensors, e.g., RGB cameras and/or LiDARs. These map elements include but are not limited to : Road boundaries, boundaries of roads that split roads and sidewalks." +computer-vision,dynamic region segmentation, +computer-vision,video frame interpolation,"The goal of **Video Frame Interpolation** is to synthesize several frames in the middle of two adjacent frames of the original video. Video Frame Interpolation can be applied to generate slow motion video, increase video frame rate, and frame recovery in video streaming. 
+ + +Source: [Reducing the X-ray radiation exposure frequency in cardio-angiography via deep-learning based video interpolation ](https://arxiv.org/abs/2006.00781)" +computer-vision,weakly supervised object detection,"Weakly Supervised Object Detection (WSOD) is the task of training object detectors with only image tag supervisions. + +( Image credit: [Soft Proposal Networks for Weakly Supervised Object Localization](https://arxiv.org/pdf/1709.01829v1.pdf) )" +computer-vision,referring image matting expression based,"Expression-based referring image matting, taking an image and a flowery expression as the input." +computer-vision,yield mapping in apple orchards, +computer-vision,pose estimation,"Pose Estimation is a general problem in Computer Vision where the goal is to detect the position and orientation of a person or an object. Usually, this is done by predicting the location of specific keypoints like hands, head, elbows, etc. in case of Human Pose Estimation. + +A common benchmark for this task is [MPII Human Pose](https://paperswithcode.com/sota/pose-estimation-on-mpii-human-pose) + +( Image credit: [Real-time 2D Multi-Person Pose Estimation on CPU: Lightweight OpenPose](https://github.com/Daniil-Osokin/lightweight-human-pose-estimation.pytorch) )" +computer-vision,visual social relationship recognition, +computer-vision,single image super resolution, +computer-vision,fairness, +computer-vision,sign language recognition,"Given a signed video input the task is to predict the (sequence of) sign(s) that are performed. + +( Image credit: [Word-level Deep Sign Language Recognition from Video: +A New Large-scale Dataset and Methods Comparison](https://arxiv.org/pdf/1910.11006v1.pdf) )" +computer-vision,out of distribution detection,Detect out-of-distribution or anomalous examples. 
+computer-vision,frame duplication detection, +computer-vision,parking space occupancy,Image credit: [https://github.com/martin-marek/parking-space-occupancy](https://github.com/martin-marek/parking-space-occupancy) +computer-vision,image generation,"Image generation (synthesis) is the task of generating new images from an existing dataset. + +- **Unconditional generation** refers to generating samples unconditionally from the dataset, i.e. $p(y)$ +- **[Conditional image generation](/task/conditional-image-generation)** (subtask) refers to generating samples conditionally from the dataset, based on a label, i.e. $p(y|x)$. + +In this section, you can find state-of-the-art leaderboards for **unconditional generation**. For conditional generation, and other types of image generations, refer to the subtasks. + +( Image credit: [StyleGAN](https://github.com/NVlabs/stylegan) )" +computer-vision,video harmonization,Video harmonization aims to adjust the foreground of a composite video to make it compatible with the background. +computer-vision,depth estimation,"**Depth Estimation** is the task of measuring the distance of each pixel relative to the camera. Depth is extracted from either monocular (single) or stereo (multiple views of a scene) images. Traditional methods use multi-view geometry to find the relationship between the images. Newer methods can directly estimate depth by minimizing the regression loss, or by learning to generate a novel view from a sequence. The most popular benchmarks are KITTI and NYUv2. Models are typically evaluated according to a RMS metric. + +Source: [DIODE: A Dense Indoor and Outdoor DEpth Dataset ](https://arxiv.org/abs/1908.00463)" +computer-vision,referring expression,"Referring expressions places a bounding box around +the instance corresponding to the provided description and +image." 
+computer-vision,steganalysis,Detect the usage of Steganography +computer-vision,cloud detection, +computer-vision,classify point clouds, +computer-vision,semantic segmentation,"Semantic segmentation, or image segmentation, is the task of clustering parts of an image together which belong to the same object class. It is a form of pixel-level prediction because each pixel in an image is classified according to a category. Some example benchmarks for this task are Cityscapes, PASCAL VOC and ADE20K. Models are usually evaluated with the Mean Intersection-Over-Union (Mean IoU) and Pixel Accuracy metrics. + +( Image credit: [CSAILVision](https://github.com/CSAILVision/semantic-segmentation-pytorch) )" +computer-vision,real time visual tracking, +computer-vision,handwritten digit recognition, +computer-vision,facial inpainting,"Facial inpainting (or face completion) is the task of generating plausible facial structures for missing pixels in a face image. + +( Image credit: [SymmFCNet](https://github.com/csxmli2016/SymmFCNet) )" +computer-vision,face model, +computer-vision,text to shape generation, +computer-vision,automatic post editing,Automatic post-editing (APE) is used to correct errors in the translation made by the machine translation systems. +computer-vision,offline surgical phase recognition,"Offline surgical phase recognition: the first 40 videos to train, the last 40 videos to test." +computer-vision,referring expression segmentation,"The task aims at labelling the pixels of an image or video that represent an object instance referred to by a linguistic expression. In particular, the referring expression (RE) must allow the identification of an individual object in a discourse or scene (the referent). REs unambiguously identify the target instance."
+computer-vision,video recognition,I will be back on a summer rainy night +computer-vision,panoptic segmentation,"Panoptic segmentation unifies the typically distinct tasks of semantic segmentation (assign a class label to each pixel) and instance segmentation (detect and segment each object instance). + +( Image credit: [Detectron2](https://github.com/facebookresearch/detectron2) )" +computer-vision,head detection, +computer-vision,one shot segmentation,"( Image credit: [One-Shot Learning for Semantic +Segmentation](https://arxiv.org/pdf/1709.03410v1.pdf) )" +computer-vision,dense object detection, +computer-vision,atomic action recognition, +computer-vision,road segementation,Road Segmentation is a pixel-wise binary classification task that extracts the underlying road network. Various heuristic and data-driven models have been proposed. Continuity and robustness remain major challenges in the area. +computer-vision,action recognition in still images, +computer-vision,multiview learning, +computer-vision,part segmentation,"Segmenting 3D object parts + +( Image credit: [MeshCNN: A Network with an Edge](https://arxiv.org/pdf/1809.05910v2.pdf) )" +computer-vision,edge detection,"**Edge Detection** is a fundamental image processing technique which involves computing an image gradient to quantify the magnitude and direction of edges in an image. Image gradients are used in various downstream tasks in computer vision such as line detection, feature detection, and image classification. + + +Source: [Artistic Enhancement and Style Transfer of Image Edges using Directional Pseudo-coloring ](https://arxiv.org/abs/1906.07981) + +( Image credit: [Kornia](https://github.com/kornia/kornia) )" +computer-vision,geometric matching, +computer-vision,autonomous driving,"Autonomous driving is the task of driving a vehicle without human control. 
+ +Many of the state-of-the-art results can be found at more general task pages such as [3D Object Detection](https://paperswithcode.com/task/3d-object-detection) and [Semantic Segmentation](https://paperswithcode.com/task/semantic-segmentation). + +(Image credit: [Exploring the Limitations of Behavior Cloning for Autonomous Driving](https://arxiv.org/pdf/1904.08980v1.pdf))" +computer-vision,image forensics, +computer-vision,one shot action recognition, +computer-vision,semi supervised and landmark labeling, +computer-vision,severity prediction, +computer-vision,typeface completion, +computer-vision,eye tracking, +computer-vision,compressive sensing,"**Compressive Sensing** is a signal processing framework for efficiently acquiring and reconstructing signals that have a sparse representation in a fixed linear basis. + + +Source: [Sparse Estimation with Generalized Beta Mixture and the Horseshoe Prior ](https://arxiv.org/abs/1411.2405)" +computer-vision,multi view learning,"**Multi-View Learning** is a machine learning framework where data are represented by multiple distinct feature groups, and each feature group is referred to as a particular view. + + +Source: [Dissimilarity-based representation for radiomics applications ](https://arxiv.org/abs/1803.04460)" +computer-vision,road scene understanding, +computer-vision,depth and camera motion, +computer-vision,scale generalisation,Scale generalisation implies that learning is performed at some scale(s) and testing at other scales. +computer-vision,indoor localization,Indoor localization is a fundamental problem in indoor location-based applications. +computer-vision,action classification,Image source: [The Kinetics Human Action Video Dataset](https://arxiv.org/pdf/1705.06950.pdf) +computer-vision,human pose forecasting,"Human pose forecasting is the task of detecting and predicting future human poses. 
+ +( Image credit: [EgoPose](https://github.com/Khrylx/EgoPose) )" +computer-vision,jpeg forgery localization, +computer-vision,human instance segmentation,"Instance segmentation is the task of detecting and delineating each distinct object of interest appearing in an image. + +Image Credit: [Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers](https://arxiv.org/abs/2103.12340)" +computer-vision,video kinematic segmentation base workflow, +computer-vision,visual sentiment prediction, +computer-vision,underwater scene reconstruction, +computer-vision,multi person pose estimation root relative,"This task aims to solve root-relative 3D multi-person pose estimation (person-centric coordinate system). No ground truth human bounding box and human root joint coordinates are used during testing stage. + +( Image credit: [RootNet](https://github.com/mks0601/3DMPPE_ROOTNET_RELEASE) )" +computer-vision,online surgical phase recognition,"Online surgical phase recognition: the first 40 videos to train, the last 40 videos to test." +computer-vision,surgical phase recognition,"The first 40 videos are used for training, the last 40 videos are used for testing." +computer-vision,line detection, +computer-vision,foveation, +computer-vision,iris recognition, +computer-vision,class incremental learning,Incremental learning of a sequence of tasks when the task-ID is not available at test time. 
+computer-vision,handwritten digit image synthesis, +computer-vision,multiple object forecasting,"( Image credit: [Multiple Object Forecasting](https://github.com/olly-styles/Multiple-Object-Forecasting) )" +computer-vision,shape generation,Image: [Mo et al](https://arxiv.org/pdf/1908.00575v1.pdf) +computer-vision,stereo image super resolution, +computer-vision,human pose tracking, +computer-vision,landmark tracking, +computer-vision,inpainting,"**3D Inpainting** is the removal of unwanted objects +from a 3D scene, such that the replaced region is visually +plausible and consistent with its context." +computer-vision,event based vision,"An event camera, also known as a neuromorphic camera, silicon retina or dynamic vision sensor, is an imaging sensor that responds to local changes in brightness. Event cameras do not capture images using a shutter as conventional cameras do. Instead, each pixel inside an event camera operates independently and asynchronously, reporting changes in brightness as they occur and staying silent otherwise. Modern event cameras have microsecond temporal resolution, 120 dB dynamic range, and less under/overexposure and motion blur than frame cameras." +computer-vision,state change object detection, +computer-vision,document to image conversion, +computer-vision,driver attention monitoring,"Driver attention monitoring is the task of monitoring the attention of a driver. + +( Image credit: [Predicting Driver Attention in Critical Situations](https://arxiv.org/pdf/1711.06406v3.pdf) )" +computer-vision,few shot anomaly detection,Perform anomaly detection with Few shot settings +computer-vision,person centric visual grounding,"Person-centric visual grounding is the problem of linking between people named in a caption and people pictured in an image. Introduced in ""Who's Waldo? Linking People Across Text and Images"" (Cui et al, ICCV 2021)." 
+computer-vision,safety perception recognition,City safety perception recognition +computer-vision,car instance understanding,"3D Car Instance Understanding is the task of estimating properties (e.g. translation, rotation and shape) of a moving or parked vehicle on the road. + +( Image credit: [Occlusion-Net](http://openaccess.thecvf.com/content_CVPR_2019/papers/Reddy_Occlusion-Net_2D3D_Occluded_Keypoint_Localization_Using_Graph_Networks_CVPR_2019_paper.pdf) )" +computer-vision,font style transfer, +computer-vision,reference based video super resolution,"Reference-based video super-resolution (RefVSR) is an extension of reference-based super-resolution (RefSR) to video super-resolution (VSR). RefVSR inherits the objectives of both RefSR and VSR tasks and utilizes a Ref video for reconstructing an HR video from an LR video." +computer-vision,multi animal tracking with identification,Tracking all animals in a video maintaining their identities after touches or occlusions. +computer-vision,face quality assessement,Estimate the usability of a given face image for recognition +computer-vision,unsupervised image classification,"Models that learn to label each image (i.e. cluster the dataset into its ground truth classes) without seeing the ground truth labels. + + Image credit: ImageNet clustering results of [SCAN: Learning to Classify Images without Labels (ECCV 2020)](https://arxiv.org/abs/2005.12320) " +computer-vision,grayscale image denoising, +computer-vision,facial editing,Image source: [Stitch it in Time: GAN-Based Facial Editing of Real Videos](https://arxiv.org/pdf/2201.08361v2.pdf) +computer-vision,object proposal generation,"Object proposal generation is a preprocessing technique that has been widely used in current object detection pipelines to guide the search of objects and avoid exhaustive sliding window search across images. 
+ +( Image credit: [Multiscale Combinatorial Grouping +for Image Segmentation and Object Proposal Generation](https://arxiv.org/pdf/1503.00848v4.pdf) )" +computer-vision,gait identification, +computer-vision,neural radiance caching,"Involves the task of predicting photorealistic pixel colors from feature buffers. + +Image source: [Instant Neural Graphics Primitives with a Multiresolution Hash Encoding](https://arxiv.org/pdf/2201.05989v1.pdf)" +computer-vision,single image haze removal, +computer-vision,histopathological segmentation, +computer-vision,texture synthesis,"The fundamental goal of example-based **Texture Synthesis** is to generate a texture, usually larger than the input, that faithfully captures all the visual characteristics of the exemplar, yet is neither identical to it, nor exhibits obvious unnatural looking artifacts. + + +Source: [Non-Stationary Texture Synthesis by Adversarial Expansion ](https://arxiv.org/abs/1805.04487)" +computer-vision,chinese landscape painting generation, +computer-vision,weakly supervised action segmentation,Action Segmentation from weak (transcript) supervision. +computer-vision,unsupervised semantic segmentation,"Models that learn to segment each image (i.e. cluster the pixels into their ground truth classes) without seeing the ground truth labels. + +( Image credit: [SegSort: Segmentation by Discriminative Sorting of Segments](http://openaccess.thecvf.com/content_ICCV_2019/papers/Hwang_SegSort_Segmentation_by_Discriminative_Sorting_of_Segments_ICCV_2019_paper.pdf) )" +computer-vision,scene text recognition,See [Scene Text Detection](https://paperswithcode.com/task/scene-text-detection) for leaderboards in this task. +computer-vision,scene generation, +computer-vision,image instance retrieval,"**Image Instance Retrieval** is the problem of retrieving images from a database representing the same object or scene as the one depicted in a query image. 
+ + +Source: [Compression of Deep Neural Networks for Image Instance Retrieval ](https://arxiv.org/abs/1701.04923)" +computer-vision,photo to caricature translation,"Photo-to-caricature translation is the task of adapting a photo to a cartoon or sketch. + +( Image credit: [WarpGAN](https://arxiv.org/pdf/1811.10100v3.pdf) )" +computer-vision,handwriting recognition,Image source: [Handwriting Recognition of Historical Documents with few labeled data](https://arxiv.org/pdf/1811.07768v1.pdf) +computer-vision,image morphing, +computer-vision,defocus estimation, +computer-vision,jpeg decompression,Image credit: [Palette: Image-to-Image Diffusion Models](https://paperswithcode.com/paper/palette-image-to-image-diffusion-models) +computer-vision,box supervised instance segmentation,This task aims to achieve instance segmentation with weakly bounding box annotations. +computer-vision,instance search,"Visual **Instance Search** is the task of retrieving from a database of images the ones that contain an instance of a visual query. It is typically much more challenging than finding images from the database that contain objects belonging to the same category as the object in the query. If the visual query is an image of a shoe, visual Instance Search does not try to find images of shoes, which might differ from the query in shape, color or size, but tries to find images of the exact same shoe as the one in the query image. Visual Instance Search challenges image representations as the features extracted from the images must enable such fine-grained recognition despite variations in viewpoints, scale, position, illumination, etc. Whereas holistic image representations, where each image is mapped to a single high-dimensional vector, are sufficient for coarse-grained similarity retrieval, local features are needed for instance retrieval. 
+ + +Source: [Dynamicity and Durability in Scalable Visual Instance Search ](https://arxiv.org/abs/1805.10942)" +computer-vision,shape representation of point clouds, +computer-vision,image retargeting, +computer-vision,human object interaction motion tracking, +computer-vision,brain landmark detection, +computer-vision,patch matching, +computer-vision,hyperspectral image segmentation, +computer-vision,transparent object depth estimation,Estimating the 3D shape of transparent objects +computer-vision,stereoscopic image quality assessment, +computer-vision,single image desnowing, +computer-vision,unsupervised facial landmark detection,"Facial landmark detection in the unsupervised setting popularized by [1]. The evaluation occurs in two stages: +(1) Embeddings are first learned in an unsupervised manner (i.e. without labels); +(2) A simple regressor is trained to regress landmarks from the unsupervised embedding. + +[1] Thewlis, James, Hakan Bilen, and Andrea Vedaldi. ""Unsupervised learning of object landmarks by factorized spatial embeddings."" Proceedings of the IEEE International Conference on Computer Vision. 2017. + +( Image credit: [Unsupervised learning of object landmarks by factorized spatial embeddings](https://www.robots.ox.ac.uk/~vedaldi/assets/pubs/thewlis17unsupervised.pdf) )" +computer-vision,shape representation,Image: [MeshNet](https://arxiv.org/pdf/1811.11424v1.pdf) +computer-vision,image outpainting,"Predicting the visual context of an image beyond its boundary. + +Image credit: [NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis](https://paperswithcode.com/paper/nuwa-infinity-autoregressive-over?from=n35)" +computer-vision,zero shot transfer image classification, +computer-vision,indoor scene synthesis, +computer-vision,camera shot boundary detection,"The objective of camera shot boundary detection is to find the transitions between the camera shots in a video and classify the type of camera transition. 
This task is introduced in SoccerNet-v2, where 3 types of transitions are considered (abrupt, logo, smooth)." +computer-vision,face presentation attack detection, +computer-vision,fast vehicle detection,Fast vehicle detection is the task of detecting fast or speeding vehicles from video footage. +computer-vision,boundary captioning,"Provided with the timestamp of a boundary inside a video, the machine is required to generate sentences describing the status change at the boundary." +computer-vision,facial action unit detection,"Facial action unit detection is the task of detecting action units from a video of a face - for example, lip tightening and cheek raising. + +( Image credit: [Self-supervised Representation Learning from Videos for Facial Action Unit Detection](http://openaccess.thecvf.com/content_CVPR_2019/papers/Li_Self-Supervised_Representation_Learning_From_Videos_for_Facial_Action_Unit_Detection_CVPR_2019_paper.pdf) )" +computer-vision,highlight detection, +computer-vision,extreme video frame interpolation,"Type of Video Frame Interpolation (VFI) that interpolates an intermediate frame on X4K1000FPS dataset containing 4K videos of 1000 fps with the extreme motion. The dataset has a wide variety of textures, extremely large motions, zoomings and occlusions, which have never been seen in the previous VFI benchmark datasets." +computer-vision,seeing beyond the visible,"The objective of this challenge is to automate the process of estimating the soil parameters, specifically, potassium (K), phosphorus pentoxide (P2O5), magnesium (Mg) and pH, through extracting them from the airborne hyperspectral images captured over agricultural areas in Poland (the exact locations are not revealed). To make the solution applicable in real-life use cases, all the parameters should be estimated as precisely as possible."
+computer-vision,point cloud generation, +computer-vision,person reposing,Person reposing describes the task of changing the pose of a human in a given image to any desired target pose. +computer-vision,ood detection,Out of Distribution Detection: detecting instances that do not belong to the distribution the classifier has been trained on. +computer-vision,image imputation,"Image imputation is the task of creating plausible images from low-resolution images or images with missing data. + +( Image credit: [NASA](https://www.jpl.nasa.gov/edu/news/2019/4/19/how-scientists-captured-the-first-image-of-a-black-hole/) )" +computer-vision,surgical tool detection,Presence detection of various classes of surgical instruments in endoscopy videos. +computer-vision,dense shape correspondence,"Finding a meaningful correspondence between two or more shapes is one of the most fundamental shape analysis tasks. The problem can be generally stated as: given input shapes S1,S2,...,SN, find a meaningful relation (or mapping) between their elements. Under different contexts, the problem has also been referred to as registration, alignment, or simply, matching. Shape correspondence is a key algorithmic component in tasks such as 3D scan alignment and space-time reconstruction, as well as an indispensable prerequisite in diverse applications including attribute transfer, shape interpolation, and statistical modeling." +computer-vision,multi exposure image fusion, +computer-vision,self supervised learning,"**Self-Supervised Learning** is proposed for utilizing unlabeled data with the success of supervised learning. Producing a dataset with good labels is expensive, while unlabeled data is being generated all the time. The motivation of Self-Supervised Learning is to make use of the large amount of unlabeled data. 
The main idea of Self-Supervised Learning is to generate the labels from unlabeled data, according to the structure or characteristics of the data itself, and then train on this unsupervised data in a supervised manner. Self-Supervised Learning is widely used in representation learning to make a model learn the latent features of the data. This technique is often employed in computer vision, video processing and robot control. + + +Source: [Self-supervised Point Set Local Descriptors for Point Cloud Registration ](https://arxiv.org/abs/2003.05199) + +Image source: [LeCun](https://www.youtube.com/watch?v=7I0Qt7GALVk)" +computer-vision,keypoint detection,"Keypoint detection involves simultaneously detecting people and localizing their keypoints. Keypoints are the same thing as interest points. They are spatial locations, or points in the image that define what is interesting or what stands out in the image. They are invariant to image rotation, shrinkage, translation, distortion, and so on. + +( Image credit: [PifPaf: Composite Fields for Human Pose Estimation](https://github.com/vita-epfl/openpifpaf); ""Learning to surf"" by fotologic, license: CC-BY-2.0 )" +computer-vision,set matching, +computer-vision,optical flow estimation,"**Optical Flow Estimation** is the problem of finding pixel-wise motions between consecutive images. + +Approaches for optical flow estimation include correlation-based, block-matching, feature tracking, energy-based, and more recently gradient-based. 
+ +Further readings: + +- [Optical Flow Estimation](https://www.cs.toronto.edu/~fleet/research/Papers/flowChapter05.pdf) +- [Performance of Optical Flow Techniques](https://www.cs.toronto.edu/~fleet/research/Papers/ijcv-94.pdf) + +Definition source: [Devon: Deformable Volume Network for Learning Optical Flow ](https://arxiv.org/abs/1802.07351) + +Image credit: [Optical Flow Estimation](https://www.cs.toronto.edu/~fleet/research/Papers/flowChapter05.pdf)" +computer-vision,face recognition,"Facial recognition is the task of making a positive identification of a face in a photo or video image against a pre-existing database of faces. It begins with detection - distinguishing human faces from other objects in the image - and then works on identification of those detected faces. + +The state of the art tables for this task are contained mainly in the consistent parts of the task : the face verification and face identification tasks. + +( Image credit: [Face Verification](https://shuftipro.com/face-verification) )" +computer-vision,instance segmentation,"Instance segmentation is the task of detecting and delineating each distinct object of interest appearing in an image. + +Image Credit: [Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers, CVPR'21](https://github.com/lkeab/BCNet)" +computer-vision,single image deraining, +computer-vision,semi supervised medical image segmentation, +computer-vision,recognizing and localizing human actions, +computer-vision,table recognition, +computer-vision,geometry perception,Image: [Zhao et al](https://arxiv.org/pdf/1812.10775v2.pdf) +computer-vision,simultaneous localization and mapping,"Simultaneous localization and mapping (SLAM) is the task of constructing or updating a map of an unknown environment while simultaneously keeping track of an agent's location within it. 
+ +( Image credit: [ORB-SLAM2](https://arxiv.org/pdf/1610.06475v2.pdf) )" +computer-vision,unsupervised human pose estimation, +computer-vision,lossy compression artifact reduction, +computer-vision,camouflaged object segmentation,"Camouflaged object segmentation (COS) or Camouflaged object detection (COD), which was originally promoted by [T.-N. Le et al.](https://www.sciencedirect.com/science/article/abs/pii/S1077314219300608) (2017), aims to identify objects that conceal their texture into the surrounding environment. The high intrinsic similarities between the target object and the background make COS/COD far more challenging than the traditional object segmentation task. Also, refer to the online benchmarks on [CAMO dataset](https://sites.google.com/view/ltnghia/research/camo), [COD dataset](http://dpfan.net/Camouflage/), and [online demo](http://mc.nankai.edu.cn/cod). + + +( Image source: [Anabranch Network for Camouflaged Object Segmentation](https://www.sciencedirect.com/science/article/abs/pii/S1077314219300608) )" +computer-vision,visual reasoning,Ability to understand actions and reasoning associated with any visual images +computer-vision,loop closure detection,"Loop closure detection is the process of detecting whether an agent has returned to a previously visited location. + +( Image credit: [Backtracking Regression Forests for Accurate Camera Relocalization](https://github.com/LiliMeng/btrf) )" +computer-vision,superpixels, +computer-vision,image smoothing, +computer-vision,cross corpus, +computer-vision,human detection, +computer-vision,license plate recognition, +computer-vision,saliency prediction,A saliency map is a model that predicts eye fixations on a visual scene. +computer-vision,activity recognition,"Human **Activity Recognition** is the problem of identifying events performed by humans given a video input. It is formulated as a binary (or multiclass) classification problem of outputting activity class labels. 
Activity Recognition is an important problem with many societal applications including smart surveillance, video search/retrieval, intelligent robots, and other monitoring systems. + + +Source: [Learning Latent Sub-events in Activity Videos Using Temporal Attention Filters ](https://arxiv.org/abs/1605.08140)" +computer-vision,displaced people recognition,"Recognise displaced people from images. + +( Image credit: [DisplaceNet: Recognising Displaced People from Images by Exploiting Dominance Level](https://arxiv.org/pdf/1905.02025v1.pdf) )" +computer-vision,handwriting verification,The goal of handwriting verification is to find a measure of confidence whether the given handwritten samples are written by the same or different writer. +computer-vision,intubation support prediction,Prediction of need for Intubation support of Covid-19 patients. +computer-vision,mistake detection,"Mistakes are natural occurrences in many tasks and an opportunity for an AR assistant to provide help. Identifying such mistakes requires modelling procedural knowledge and retaining long-range sequence information. In its simplest form Mistake Detection aims to classify each coarse action segment into one of the three classes: {“correct”, “mistake”, “correction”}." +computer-vision,face reconstruction,"3D face reconstruction is the task of reconstructing a face from an image into a 3D form (or mesh). + +( Image credit: [3DDFA_V2](https://github.com/cleardusk/3DDFA_V2) )" +computer-vision,absolute human pose estimation,"This task aims to solve absolute (camera-centric not root-relative) 3D human pose estimation. + +( Image credit: [RootNet](https://github.com/mks0601/3DMPPE_ROOTNET_RELEASE) )" +computer-vision,action analysis, +computer-vision,multiview detection,Incorporating multiple camera views for detection in heavily occluded scenarios. 
+computer-vision,drone based object tracking,drone-based object tracking +computer-vision,video domain adapation,Unsupervised Domain Adaptation on Videos for the task of Action Recognition. +computer-vision,mutual gaze,Detect if two people are looking at each other +computer-vision,sar image despeckling,"Despeckling is the task of suppressing speckle from Synthetic Aperture Radar (SAR) acquisitions. + +Image credits: GRD Sentinel-1 SAR image despeckled with [SAR2SAR-GRD](https://arxiv.org/abs/2102.00692)" +computer-vision,skills assessment, +computer-vision,facial landmark detection,"Facial landmark detection is the task of detecting key landmarks on the face and tracking them (being robust to rigid and non-rigid facial deformations due to head movements and facial expressions). + +( Image credit: [Style Aggregated Network for Facial Landmark Detection](https://arxiv.org/pdf/1803.04108v4.pdf) )" +computer-vision,natural language moment retrieval, +computer-vision,attentive segmentation networks, +computer-vision,line segment detection, +computer-vision,motion estimation,"**Motion Estimation** is used to determine the block-wise or pixel-wise motion vectors between two frames. + + +Source: [MEMC-Net: Motion Estimation and Motion Compensation Driven Neural Network for Video Interpolation and Enhancement ](https://arxiv.org/abs/1810.08768)" +computer-vision,pose tracking,"**Pose Tracking** is the task of estimating multi-person human poses in videos and assigning unique instance IDs for each keypoint across frames. Accurate estimation of human keypoint-trajectories is useful for human action recognition, human interaction understanding, motion capture and animation. + + +Source: [LightTrack: A Generic Framework for Online Top-Down Human Pose Tracking ](https://arxiv.org/abs/1905.02822)" +computer-vision,network interpretation, +computer-vision,manufacturing quality control,AI for Quality control in manufacturing processes. 
+computer-vision,cbc test, +computer-vision,lightfield,Tasks related to the light-field imagery +computer-vision,factual visual question answering, +computer-vision,concurrent activity recognition, +computer-vision,face to face translation,"Given a video of a person speaking in a source language, generate a video of the same person speaking in a target language." +computer-vision,gan image forensics, +computer-vision,soil moisture estimation, +computer-vision,generalized few shot classification, +computer-vision,cyclops accuracy,"The WT2 dataset from the CYCLoPs database consists of 27,058 single-cell images of yeast cells. The task is to classify the subcellular localization of a fluoresced protein, given +two channels staining for the protein of interest and the cytosol." +computer-vision,few shot camera adaptive color constancy, +computer-vision,perpetual view generation,**Perpetual View Generation** is the task of generating long-range novel views by flying into a given image. +computer-vision,bbbc021 nsc accuracy,"BBBC021 is a dataset of fully imaged human cells. Cells are treated with one of 113 small molecules at 8 concentrations, and fluorescent images are captured staining for nucleus, actin and microtubules. The phenotypic profiling problem is presented, where the goal is to extract features containing meaningful information about the cellular phenotype exhibited. Each of 103 unique compound concentration treatment is labeled with a mechanism-of-action (MOA). The MOA is predicted for each unique treatment (averaging features over all treatment examples) by matching the MOA of the closest point excluding points of the same compound. The dataset and more information can be found at https://bbbc.broadinstitute.org/BBBC021." +computer-vision,image comprehension, +computer-vision,interactive segmentation, +computer-vision,image captioning,"**Image Captioning** is the task of describing the content of an image in words. 
This task lies at the intersection of computer vision and natural language processing. Most image captioning systems use an encoder-decoder framework, where an input image is encoded into an intermediate representation of the information in the image, and then decoded into a descriptive text sequence. The most popular benchmarks are nocaps and COCO, and models are typically evaluated according to a BLEU or CIDER metric. + +( Image credit: [Reflective Decoding Network for Image Captioning, ICCV'19](https://openaccess.thecvf.com/content_ICCV_2019/papers/Ke_Reflective_Decoding_Network_for_Image_Captioning_ICCV_2019_paper.pdf) )" +computer-vision,disjoint 15 1, +computer-vision,fine grained visual recognition, +computer-vision,polyp segmentation,The goal of the project is to develop a computer-aided detection and diagnosis system for automatic polyp segmentation and detection. +computer-vision,blood cell count, +computer-vision,human pose estimation,"What is Human Pose Estimation? +Human pose estimation is the process of estimating the configuration of the body (pose) from a single, typically monocular, image. Background. Human pose estimation is one of the key problems in computer vision that has been studied for well over 15 years. The reason for its importance is the +abundance of applications that can benefit from such a technology. For example, +human pose estimation allows for higher-level reasoning in the context of human-computer interaction and activity recognition; it is also one of the basic building blocks for marker-less motion capture (MoCap) technology. MoCap technology is useful for applications ranging from character animation to clinical analysis of gait pathologies." +computer-vision,gaze estimation,"**Gaze Estimation** is a task to predict where a person is looking at given the person’s full face. The task contains two directions: 3-D gaze vector and 2-D gaze position estimation. 
3-D gaze vector estimation is to predict the gaze vector, which is usually used in automotive safety. 2-D gaze position estimation is to predict the horizontal and vertical coordinates on a 2-D screen, which allows utilizing gaze point to control a cursor for human-machine interaction. + + +Source: [A Generalized and Robust Method Towards Practical Gaze Estimation on Smart Phone ](https://arxiv.org/abs/1910.07331)" +computer-vision,person recognition, +computer-vision,semantic part detection, +computer-vision,video question answering,"Video Question Answering (VideoQA) aims to answer natural language questions according to the +given videos. Given a video and a question in natural language, the model produces accurate answers according +to the content of the video." +computer-vision,depth completion,"The **Depth Completion** task is a sub-problem of depth estimation. In the sparse-to-dense depth completion problem, one wants to infer the dense depth map of a 3-D scene given an RGB image and its corresponding sparse reconstruction in the form of a sparse depth map obtained either from computational methods such as SfM (Structure-from-Motion) or active sensors such as lidar or structured light sensors. + +Source: [LiStereo: Generate Dense Depth Maps from LIDAR and Stereo Imagery ](https://arxiv.org/abs/1905.02744), +[Unsupervised Depth Completion from Visual Inertial Odometry](https://arxiv.org/abs/1905.08616)" +computer-vision,semi supervised video object segmentation,The semi-supervised scenario assumes the user inputs a full mask of the object(s) of interest in the first frame of a video sequence. Methods have to produce the segmentation mask for that object(s) in the subsequent frames. +computer-vision,video object segmentation,"Video object segmentation is a binary labeling problem aiming to separate foreground object(s) from the background region of a video. + +For leaderboards please refer to the different subtasks."
+computer-vision,face sketch synthesis,"Face sketch synthesis is the task of generating a sketch from an input face photo. + +( Image credit: [High-Quality Facial Photo-Sketch Synthesis Using Multi-Adversarial Networks](https://arxiv.org/pdf/1710.10182v2.pdf) )" +computer-vision,human dynamics,Image: [Zhang et al](https://openaccess.thecvf.com/content_ICCV_2019/papers/Zhang_Predicting_3D_Human_Dynamics_From_Video_ICCV_2019_paper.pdf) +computer-vision,personality trait recognition, +computer-vision,conditional image generation,"Conditional image generation is the task of generating new images from a dataset conditional on their class. + +( Image credit: [PixelCNN++](https://github.com/openai/pixel-cnn) )" +computer-vision,trajectory prediction,"**Trajectory Prediction** is the problem of predicting the short-term (1-3 seconds) and long-term (3-5 seconds) spatial coordinates of various road-agents such as cars, buses, pedestrians, rickshaws, and animals, etc. These road-agents have different dynamic behaviors that may correspond to aggressive or conservative driving styles. + + +Source: [Forecasting Trajectory and Behavior of Road-Agents Using Spectral Clustering in Graph-LSTMs ](https://arxiv.org/abs/1912.01118)" +computer-vision,small data,Supervised image classification with tens to hundreds of labeled training examples. +computer-vision,satellite image classification,"Satellite image classification is the most significant technique used in remote sensing for the computerized study and pattern recognition of satellite information, which is based on diversity structures of the image that involve rigorous validation of the training samples depending on the used classification algorithm." +computer-vision,action localization,"Action Localization is finding the spatial and temporal coordinates for an action in a video. An action localization model will identify the frames in which an action starts and ends in a video and return the x,y coordinates of the action. 
Further, the coordinates will change when the object performing the action undergoes a displacement." +computer-vision,point cloud reconstruction,"This task aims to solve inherent problems in raw point clouds: sparsity, noise, and irregularity." +computer-vision,video salient object detection,"Video salient object detection (VSOD) is significantly essential for understanding the underlying mechanism behind HVS during free-viewing in general and instrumental to a wide range of real-world applications, e.g., video segmentation, video captioning, video compression, autonomous driving, robotic interaction, weakly supervised attention. Besides its academic value and practical significance, VSOD presents great difficulties due to the challenges carried by video data (diverse motion patterns, occlusions, blur, large object deformations, etc.) and the inherent complexity of human visual attention behavior (i.e., selective attention allocation, attention shift) during dynamic scenes. Online benchmark: http://dpfan.net/davsod. + +( Image credit: [Shifting More Attention to Video Salient Object Detection, CVPR2019-Best Paper Finalist](https://openaccess.thecvf.com/content_CVPR_2019/papers/Fan_Shifting_More_Attention_to_Video_Salient_Object_Detection_CVPR_2019_paper.pdf) )" +computer-vision,affordance recognition,Affordance recognition from Human-Object Interaction +computer-vision,general action video anomaly detection,Detecting if an entire short clip of any action features an anomalous motion - another action class not seen during training. 
+computer-vision,zero shot action recognition, +computer-vision,generating point clouds, +computer-vision,sketch based image retrieval, +computer-vision,video propagation,Propagating information in processed frames to unprocessed frames +computer-vision,mental workload estimation, +computer-vision,novel class discovery,"The goal of Novel Class Discovery (NCD) is to identify new classes in unlabeled data, by exploiting prior knowledge from known classes. In this specific setup, the data is split in two sets. The first is a labeled set containing known classes and the second is an unlabeled set containing unknown classes that must be discovered." +computer-vision,lane detection,"Lane detection is the task of detecting lanes on a road from a camera. + +( Image credit: [End-to-end Lane Detection +](https://github.com/wvangansbeke/LaneDetection_End2End) )" +computer-vision,inverse tone mapping, +computer-vision,multi person pose estimation and tracking,"Joint multi-person pose estimation and tracking following the PoseTrack benchmark. +https://posetrack.net/ + +( Image credit: [PoseTrack](https://github.com/iqbalu/PoseTrack-CVPR2017) )" +computer-vision,one shot face stylization,"Image credit: [""JoJoGAN: One Shot Face Stylization""](https://arxiv.org/pdf/2112.11641v1.pdf)" +computer-vision,rf based visual tracking,"From mID: +https://doi.org/10.1109/DCOSS.2019.00028 + +""The key to offering personalised services in smart spaces is knowing where a particular person is with a high degree of accuracy. Visual tracking is one such solution, but concerns arise around the potential leakage of raw video information and many people are not comfortable accepting cameras in their homes or workplaces. We propose a human tracking and identification system (mID) based on millimeter wave radar which has a high tracking accuracy, without being visually compromising. 
Unlike competing techniques based on WiFi Channel State Information (CSI), it is capable of tracking and identifying multiple people simultaneously. Using a lowcost, commercial, off-the-shelf radar, we first obtain sparse point clouds and form temporally associated trajectories."" + +( Image credit: [mID: Tracking and Identifying People with Millimeter Wave Radar](http://www.cs.ox.ac.uk/files/10889/%5BDCOSS19%5DmID.pdf) )" +computer-vision,visual tracking,"**Visual Tracking** is an essential and actively researched problem in the field of computer vision with various real-world applications such as robotic services, smart surveillance systems, autonomous driving, and human-computer interaction. It refers to the automatic estimation of the trajectory of an arbitrary target object, usually specified by a bounding box in the first frame, as it moves around in subsequent video frames. + + +Source: [Learning Reinforced Attentional Representation for End-to-End Visual Tracking ](https://arxiv.org/abs/1908.10009)" +computer-vision,landmark based segmentation, +computer-vision,talking face generation,"Talking face generation aims to synthesize a sequence of face images that correspond to given speech semantics. + + +( Image credit: [Talking Face Generation by Adversarially Disentangled Audio-Visual Representation](https://github.com/Hangz-nju-cuhk/Talking-Face-Generation-DAVS) )" +computer-vision,zero shot object detection,"Zero-shot object detection (ZSD) is the task of object detection where no visual training data is available for some of the target object classes. + +( Image credit: [Zero-Shot Object Detection: Learning to Simultaneously Recognize and Localize Novel Concepts](https://github.com/salman-h-khan/ZSD_Release) )" +computer-vision,lidar semantic segmentation, +computer-vision,synthetic to real translation,"Synthetic-to-real translation is the task of domain adaptation from synthetic (or virtual) data to real data. 
+ +( Image credit: [CYCADA](https://arxiv.org/pdf/1711.03213v3.pdf) )" +computer-vision,unet quantization, +computer-vision,few shot object detection,"Target: To detect objects of novel categories with just a few training samples. + +A clear explanation of the few-shot object detection task and its differences with few-shot classification can be found in ""A Survey of Self-Supervised and Few-Shot Object Detection"": +https://gabrielhuang.github.io/fsod-survey/" +computer-vision,vehicle key point and orientation estimation, +computer-vision,video denoising, +computer-vision,video text retrieval,Video-Text retrieval requires understanding of both video and language together. Therefore it's different to video retrieval task. +computer-vision,multi label image classification,The Multi-Label Image Classification focuses on predicting labels for images in a multi-class classification problem where each image may belong to more than one class. +computer-vision,human reconstruction, +computer-vision,image enhancement,"**Image Enhancement** is basically improving the interpretability or perception of information in images for human viewers and providing ‘better’ input for other automated image processing techniques. The principal objective of Image Enhancement is to modify attributes of an image to make it more suitable for a given task and a specific observer. + + +Source: [A Comprehensive Review of Image Enhancement Techniques ](https://arxiv.org/abs/1003.4053)" +computer-vision,breast cancer histology image classification 1,Model's breast cancer histology image classification performance on BreakHis dataset with limited training data labels of 20%. +computer-vision,sensor modeling,"( Image credit: [LiDAR Sensor modeling and Data augmentation with GANs for Autonomous driving](https://arxiv.org/abs/1905.07290) )" +computer-vision,tumor segmentation,Tumor Segmentation is the task of identifying the spatial location of a tumor. 
It is a pixel-level prediction where each pixel is classified as a tumor or background. The most popular benchmark for this task is the BraTS dataset. The models are typically evaluated with the Dice Score metric. +computer-vision,blind image deblurring,"**Blind Image Deblurring** is a classical problem in image processing and computer vision, which aims to recover a latent image from a blurred input. + + +Source: [Learning a Discriminative Prior for Blind Image Deblurring ](https://arxiv.org/abs/1803.03363)" +computer-vision,multi view subspace clustering, +computer-vision,generalized zero shot learning, +computer-vision,salient object detection,"RGB Salient object detection is a task based on a visual attention mechanism, in which algorithms aim to explore objects or regions more attentive than the surrounding areas on the scene or RGB images. + +( Image credit: [Attentive Feedback Network for Boundary-Aware Salient Object Detection](http://openaccess.thecvf.com/content_CVPR_2019/papers/Feng_Attentive_Feedback_Network_for_Boundary-Aware_Salient_Object_Detection_CVPR_2019_paper.pdf) )" +computer-vision,artistic style classification,Classify the artistic style of an artwork image +computer-vision,landmine, +computer-vision,road damage detection,"Road damage detection is the task of detecting damage in roads. + +( Image credit: [Road Damage Detection And Classification In Smartphone Captured Images Using Mask R-CNN](https://arxiv.org/pdf/1811.04535v1.pdf) )" +computer-vision,compositional zero shot learning, +computer-vision,word spotting in handwritten documents, +computer-vision,human part segmentation, +computer-vision,steganographics, +computer-vision,video story qa,MCQ about clips from movies/tvshows/etc +computer-vision,heterogeneous face recognition,"Heterogeneous face recognition is the task of matching face images acquired from different sources (i.e., different sensors or different wavelengths) for identification or verification. 
+ +( Image credit: [Pose Agnostic Cross-spectral Hallucination via Disentangling Independent Factors](https://arxiv.org/pdf/1909.04365v1.pdf) )" +computer-vision,semi supervised person re identification, +computer-vision,lake ice detection, +computer-vision,camouflage segmentation, +computer-vision,scene recognition, +computer-vision,person retrieval, +computer-vision,dynamic texture recognition, +computer-vision,supervised anomaly detection,"In the training set, the number of abnormal samples is limited and significantly smaller than the number of normal samples, producing data distributions that lead to a naturally imbalanced learning problem." +computer-vision,video summarization,"**Video Summarization** aims to generate a short synopsis that summarizes the video content by selecting its most informative and important parts. The produced summary is usually composed of a set of representative video frames (a.k.a. *video key-frames*), or video fragments (a.k.a. *video key-fragments*) that have been stitched in chronological order to form a shorter video. The former type of a video summary is known as **video storyboard**, and the latter type is known as **video skim**. + +Source: [Video Summarization Using Deep Neural Networks: A Survey](https://arxiv.org/abs/2101.06072)
+Image credit: [iJRASET](https://www.ijraset.com/fileserve.php?FID=12932)" +computer-vision,video compression,"**Video Compression** is a process of reducing the size of an image or video file by exploiting spatial and temporal redundancies within an image or video frame and across multiple video frames. The ultimate goal of a successful Video Compression system is to reduce data volume while retaining the perceptual quality of the decompressed data. + + +Source: [Adversarial Video Compression Guided by Soft Edge Detection ](https://arxiv.org/abs/1811.10673)" +computer-vision,multiple action detection, +computer-vision,canonical hand pose estimation,Image: [Lin et al](https://arxiv.org/pdf/2006.01320v1.pdf) +computer-vision,motion segmentation,"**Motion Segmentation** is an essential task in many applications in Computer Vision and Robotics, such as surveillance, action recognition and scene understanding. The classic way to state the problem is the following: given a set of feature points that are tracked through a sequence of images, the goal is to cluster those trajectories according to the different motions they belong to. It is assumed that the scene contains multiple objects that are moving rigidly and independently in 3D-space. + + +Source: [Robust Motion Segmentation from Pairwise Matches ](https://arxiv.org/abs/1905.09043)" +computer-vision,autonomous flight dense forest,Number of interventions during autonomous flight under the forest canopy. +computer-vision,corpus video moment retrieval,The task extends the Single Video Moment Retrieval task to the Corpus setup where a single textual query is used to temporally localize relevant moments across all videos in the dataset. 
+computer-vision,lake detection, +computer-vision,spatio temporal action localization, +computer-vision,action understanding, +computer-vision,gait recognition,"( Image credit: [GaitSet: Regarding Gait as a Set for Cross-View Gait Recognition](https://github.com/AbnerHqC/GaitSet) )" +computer-vision,shape recognition,Image: [Wei et al](https://arxiv.org/pdf/1908.10098v1.pdf) +computer-vision,object discovery,"**Object Discovery** is the task of identifying previously unseen objects. + + +Source: [Unsupervised Object Discovery and Segmentation of RGBD-images ](https://arxiv.org/abs/1710.06929)" +computer-vision,single image based hdr reconstruction, +computer-vision,vehicle re identification,"Vehicle re-identification is the task of identifying the same vehicle across multiple cameras. + +( Image credit: [A Two-Stream Siamese Neural Network for Vehicle Re-Identification by Using Non-Overlapping Cameras](https://github.com/icarofua/siamese-two-stream) )" +computer-vision,blink estimation, +computer-vision,saliency prediction 1,"Saliency prediction aims to predict important locations in a visual scene. It is a per-pixel regression task with predicted values ranging from 0 to 1. + +Benefiting from deep learning research and large-scale datasets, saliency prediction has achieved significant success in the past decade. However, it still remains challenging to predict saliency maps on images in new domains that lack sufficient data for data-hungry models." +computer-vision,emotion classification,"Emotion classification, or emotion categorization, is the task of recognising emotions to classify them into the corresponding category. Given an input, classify it as 'neutral or no emotion' or as one, or more, of several given emotions that best represent the mental state of the subject's facial expression, words, and so on. Some example benchmarks include ROCStories, Many Faces of Anger (MFA), and GoEmotions. 
Models can be evaluated using metrics such as the Concordance Correlation Coefficient (CCC) and the Mean Squared Error (MSE)." +computer-vision,unconditional video generation, +computer-vision,image compression artifact reduction, +computer-vision,abnormal event detection in video,"**Abnormal Event Detection In Video** is a challenging task in computer vision, as the definition of what an abnormal event looks like depends very much on the context. For instance, a car driving by on the street is regarded as a normal event, but if the car enters a pedestrian area, this is regarded as an abnormal event. A person running on a sports court (normal event) versus running outside from a bank (abnormal event) is another example. Although what is considered abnormal depends on the context, we can generally agree that abnormal events should be unexpected events that occur less often than familiar (normal) events + + +Source: [Unmasking the abnormal events in video ](https://arxiv.org/abs/1705.08182) + +Image: [Ravanbakhsh et al](https://arxiv.org/pdf/1708.09644v1.pdf)" +computer-vision,video captioning,"**Video Captioning** is a task of automatic captioning a video by understanding the action and event in the video which can help in the retrieval of the video efficiently through text. + + +Source: [NITS-VC System for VATEX Video Captioning Challenge 2020 ](https://arxiv.org/abs/2006.04058)" +computer-vision,facial makeup transfer,Facial makeup transfer aims to translate the **makeup style** from a given *reference* makeup face image to another non-makeup one while *preserving face identity*. +computer-vision,image recognition, +computer-vision,scene classification,"**Scene Classification** is a task in which scenes from photographs are categorically classified. Unlike object classification, which focuses on classifying prominent objects in the foreground, Scene Classification uses the layout of objects within the scene, in addition to the ambient context, for classification. 
+ + +Source: [Scene classification with Convolutional Neural Networks ](http://cs231n.stanford.edu/reports/2017/pdfs/102.pdf)" +computer-vision,whole slide images, +computer-vision,interactive video object segmentation,"The interactive scenario assumes the user gives iterative refinement inputs to the algorithm, in our case in the form of a scribble, to segment the objects of interest. Methods have to produce a segmentation mask for that object in all the frames of a video sequence taking into account all the user interactions." +computer-vision,few shot video object detection,"Few-Shot Video Object Detection +(FSVOD): given only a few support images of the target +object in an unseen class, detect all the objects belonging to +the same class in a given query video." +computer-vision,genre classification,"Genre classification is the process of grouping objects together based on defined similarities such as shape, pixel, location, or intensity." +computer-vision,facial expression recognition,"3D facial expression recognition is the task of modelling facial expressions in 3D from an image or video. + +( Image credit: [Expression-Net](https://github.com/fengju514/Expression-Net) )" +computer-vision,video super resolution,"Video super-resolution is the task of upscaling a video from a low-resolution to a high-resolution. + +( Image credit: [Detail-revealing Deep Video Super-Resolution](https://github.com/jiangsutx/SPMC_VideoSR) )" +computer-vision,online clustering,"Models that learn to label each image (i.e. cluster the dataset into its ground truth classes) without seeing the ground truth labels. Under the online scenario, data is in the form of streams, i.e., the whole dataset could not be accessed at the same time and the model should be able to make cluster assignments for new data without accessing the former data. 
+ +Image Credit: [Online Clustering by Penalized Weighted GMM](https://arxiv.org/pdf/1902.02544v1.pdf)" +computer-vision,universal domain adaptation, +computer-vision,defect detection,For automatic detection of surface defects in various products +computer-vision,spectrum cartography, +computer-vision,nlp based person retrival, +computer-vision,grounded situation recognition,"Grounded Situation Recognition aims to produce the structured image summary which describes the primary activity (verb), its relevant entities (nouns), and their bounding-box groundings." +computer-vision,video object detection,"Video object detection is the task of detecting objects from a video as opposed to images. + +( Image credit: [Learning Motion Priors for Efficient Video Object Detection](https://arxiv.org/pdf/1911.05253v1.pdf) )" +computer-vision,dense captioning, +computer-vision,hybrid positioning,Hybrid Positioning using CV and dead reckoning +computer-vision,unsupervised image to image translation,"Unsupervised image-to-image translation is the task of doing image-to-image translation without ground truth image-to-image pairings. + +( Image credit: [Unpaired Image-to-Image Translation +using Cycle-Consistent Adversarial Networks](https://arxiv.org/pdf/1703.10593v6.pdf) )" +computer-vision,crop yield prediction, +computer-vision,semi supervised video classification, +computer-vision,grasp generation, +computer-vision,video enhancement, +computer-vision,generalized zero shot skeletal action,Generalized Zero Shot Learning for 3d Skeletal Action Recognition +computer-vision,dichotomous image segmentation,"Currently, existing image segmentation tasks mainly focus on segmenting objects with specific characteristics, e.g., salient, camouflaged, meticulous, or specific categories. Most of them have the same input/output formats, and barely use exclusive mechanisms designed for segmenting targets in their models, which means almost all tasks are dataset-dependent. 
Thus, it is very promising to formulate a category-agnostic DIS task for accurately segmenting objects with different structure complexities, regardless of their characteristics. Compared with semantic segmentation, the proposed DIS task usually focuses on images with a single or a few targets, from which getting richer accurate details of each target is more feasible." +computer-vision,plane detection,Image: [Liu et al](https://arxiv.org/pdf/1812.04072v2.pdf) +computer-vision,image manipulation detection, +computer-vision,referring image matting,"Extracting the meticulous alpha matte of the specific object from the image that can best match the given natural language description, e.g., a keyword or an expression." +computer-vision,hdr reconstruction, +computer-vision,disparity estimation,Disparity Estimation is the task of finding the pixels in the multiscopic views that correspond to the same 3D point in the scene. +computer-vision,face animation,Image: [Cudeiro et al](https://arxiv.org/pdf/1905.03079v1.pdf) +computer-vision,horizon line estimation, +computer-vision,zero shot skeletal action recognition,Zero-Shot Learning for 3D skeletal action recognition +computer-vision,sign language translation,"Given a video containing sign language, the task is to predict the translation into (written) spoken language. + +Image credit: [How2Sign](https://how2sign.github.io/)" +computer-vision,procedure learning,"Given a set of videos of the same task, the goal is to identify the key-steps required to perform the task." +computer-vision,monocular depth estimation,"**Monocular Depth Estimation** is the task of estimating the depth value (distance relative to the camera) of each pixel given a single (monocular) RGB image. This challenging task is a key prerequisite for determining scene understanding for applications such as 3D scene reconstruction, autonomous driving, and AR.
State-of-the-art methods usually fall into one of two categories: designing a complex network that is powerful enough to directly regress the depth map, or splitting the input into bins or windows to reduce computational complexity. The most popular benchmarks are the KITTI and NYUv2 datasets. Models are typically evaluated using RMSE or absolute relative error. + +Source: [Defocus Deblurring Using Dual-Pixel Data ](https://arxiv.org/abs/2005.00305)" +computer-vision,spectral estimation, +computer-vision,image similarity search,Image credit: [The 2021 Image Similarity Dataset and Challenge](https://paperswithcode.com/paper/the-2021-image-similarity-dataset-and) +computer-vision,video kinematic base workflow recognition, +computer-vision,pose transfer, +computer-vision,weakly supervised object localization, +computer-vision,image based localization, +computer-vision,rgb d reconstruction, +computer-vision,font recognition,"Font recognition (also called *visual font recognition* or *optical font recognition*) is the task of identifying the font family or families used in images containing text. Understanding which fonts are used in text may, for example, help designers find the right style, as well as help select an optical character recognition engine or model that is a better fit for certain texts." +computer-vision,motion compensation, +computer-vision,camera calibration, +computer-vision,pose retrieval,Retrieval of similar human poses from images or videos +computer-vision,depth image upsampling, +computer-vision,camera auto calibration, +computer-vision,scene understanding,"Scene Understanding is the task of understanding a scene. For instance, the iPhone has a function that helps visually impaired people take a photo by describing what the camera sees. This is an example of Scene Understanding."
+computer-vision,image super resolution,Super-resolution of images refers to augmenting and increasing the resolution of an image using classic and advanced super-resolution techniques. Often the term 'hallucinate' is used to refer to the process of creating data points. +computer-vision,monocular object localization, +computer-vision,data ablation,"Data Ablation is the study of change in data, and its effects in the performance of Neural Networks." +computer-vision,crowd counting,"**Crowd Counting** is a task to count people in image. It is mainly used in real-life for automated public monitoring such as surveillance and traffic control. Different from object detection, Crowd Counting aims at recognizing arbitrarily sized targets in various situations including sparse and cluttering scenes at the same time. + + +Source: [Deep Density-aware Count Regressor ](https://arxiv.org/abs/1908.03314)" +computer-vision,boundary grounding,"Provided with a description of a boundary inside a video, the machine is required to locate that boundary in the video." +computer-vision,real time object detection,"Real-time object detection is the task of doing object detection in real-time with fast inference while maintaining a base level of accuracy. + +( Image credit: [CenterNet](https://github.com/xingyizhou/CenterNet) )" +computer-vision,object slam,SLAM (Simultaneous Localisation and Mapping) at the level of object +computer-vision,multi label image retrieval, +computer-vision,hand pose estimation,"Hand pose estimation is the task of finding the joints of the hand from an image or set of video frames. + +( Image credit: [Pose-REN](https://github.com/xinghaochen/Pose-REN) )" +computer-vision,dense pixel correspondence estimation, +computer-vision,action anticipation,"Next action anticipation is defined as observing 1, ... , T frames and predicting the action that happens after a gap of T_a seconds. 
It is important to note that a new action starts after T_a seconds that is not seen in the observed frames. Here T_a=1 second." +computer-vision,anomaly classification, +computer-vision,pose prediction,Pose prediction is to predict future poses given a window of previous poses. +computer-vision,disguised face verification, +computer-vision,human action recognition,Image: [Rahmani et al](https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Rahmani_3D_Action_Recognition_CVPR_2016_paper.pdf) +computer-vision,caricature,"**Caricature** is a pictorial representation or description that deliberately exaggerates a person’s distinctive features or peculiarities to create an easily identifiable visual likeness with a comic effect. This vivid art form contains the concepts of abstraction, simplification and exaggeration. + + +Source: [Alive Caricature from 2D to 3D ](https://arxiv.org/abs/1803.06802)" +computer-vision,hand gesture recognition, +computer-vision,unsupervised object localization, +computer-vision,cross view image to image translation, +computer-vision,accident anticipation, +computer-vision,tone mapping, +computer-vision,rf based pose estimation,"Detect human actions through walls and occlusions, and in poor lighting conditions. Taking radio frequency (RF) signals as input (e.g. Wifi), generating 3D human skeletons as an intermediate representation, and recognizing actions and interactions. + +See e.g. 
RF-Pose from MIT for a good illustration of the approach +http://rfpose.csail.mit.edu/ + +( Image credit: [Making the Invisible Visible](https://arxiv.org/pdf/1909.09300v1.pdf) )" +computer-vision,logo recognition, +computer-vision,video polyp segmentation, +computer-vision,monocular cross view road scene parsing road, +computer-vision,shape reconstruction from videos, +computer-vision,multiple people tracking, +computer-vision,visual object tracking,"**Visual Object Tracking** is an important research topic in computer vision, image understanding and pattern recognition. Given the initial state (centre location and scale) of a target in the first frame of a video sequence, the aim of Visual Object Tracking is to automatically obtain the states of the object in the subsequent video frames. + + +Source: [Learning Adaptive Discriminative Correlation Filters via Temporal Consistency Preserving Spatial Feature Selection for Robust Visual Object Tracking ](https://arxiv.org/abs/1807.11348)" +computer-vision,talking head generation,"Talking head generation is the task of generating a talking face from a set of images of a person. + +( Image credit: [Few-Shot Adversarial Learning of Realistic Neural Talking Head Models](https://arxiv.org/pdf/1905.08233v2.pdf) )" +computer-vision,fine grained action detection, +computer-vision,multi hypotheses human pose estimation, +computer-vision,image compression,"**Image Compression** is an application of data compression for digital images to lower their storage and/or transmission requirements. + + +Source: [Variable Rate Deep Image Compression With a Conditional Autoencoder ](https://arxiv.org/abs/1909.04802)" +computer-vision,salt and pepper noise removal,"Salt-and-pepper noise is a form of noise sometimes seen on images. It is also known as impulse noise. This noise can be caused by sharp and sudden disturbances in the image signal. It presents itself as sparsely occurring white and black pixels. 
+ +( Image credit: [NAMF](https://arxiv.org/pdf/1910.07787v1.pdf) )" +computer-vision,spatial relation recognition, +computer-vision,hand detection,"As an important subject in the field of computer vision, hand detection plays an important role in many tasks such as human-computer interaction, automatic driving, virtual reality and so on." +computer-vision,spatio temporal semantic segmentation,Image: [Choy et al](https://paperswithcode.com/paper/4d-spatio-temporal-convnets-minkowski) +computer-vision,electron microscopy image segmentation, +computer-vision,face parsing,Classify pixels of a face image into different classes based on a given bounding box. +computer-vision,pansharpening, +computer-vision,object tracking, +computer-vision,semi supervised person bounding box detection, +computer-vision,volumetric reconstruction,Image: [Grinvald et al](https://arxiv.org/pdf/1903.00268.pdf) +computer-vision,video to video synthesis, +computer-vision,metamerism, +computer-vision,image level supervised instance segmentation,Weakly-Supervised Instance Segmentation using Image-level Labels +computer-vision,image relighting,Image relighting involves changing the illumination settings of an image. +computer-vision,image shadow removal, +computer-vision,image manipulation, +computer-vision,cross domain iris presentation attack, +computer-vision,art analysis, +computer-vision,smile recognition,Smile recognition is the task of recognising a smiling face in a photo or video. +computer-vision,image harmonization,Image harmonization aims to modify the color of the composited region with respect to the specific background. +computer-vision,observation completion, +computer-vision,semi supervised image classification,"Semi-supervised image classification leverages unlabelled data as well as labelled data to increase classification performance. 
+ +You may want to read some blog posts to get an overview before reading the papers and checking the leaderboards: + +- [An overview of proxy-label approaches for semi-supervised learning](https://ruder.io/semi-supervised/) - Sebastian Ruder +- [Semi-Supervised Learning in Computer Vision](https://amitness.com/2020/07/semi-supervised-learning/) - Amit Chaudhary + +( Image credit: [Self-Supervised Semi-Supervised Learning](https://arxiv.org/pdf/1905.03670v2.pdf) )" +computer-vision,fine grained action recognition, +computer-vision,semantic instance segmentation,Image: [3D-SIS](https://github.com/Sekunde/3D-SIS) +computer-vision,image animation,Image Animation is a field for image-animation of a source image by a driving video +computer-vision,face hallucination,"Face hallucination is the task of generating high-resolution (HR) facial images from low-resolution (LR) inputs. + +( Image credit: [Deep CNN Denoiser and Multi-layer Neighbor Component Embedding for Face Hallucination](https://arxiv.org/pdf/1806.10726v1.pdf) )" +computer-vision,segmenting flooded buildings, +computer-vision,interest point detection, +computer-vision,classification of hyperspectral images, +computer-vision,moving object detection, +computer-vision,surface generation,Image: [AtlasNet](https://arxiv.org/pdf/1802.05384v3.pdf) +computer-vision,gesture recognition,"**Gesture Recognition** is an active field of research with applications such as automatic recognition of sign language, interaction of humans and robots or for new ways of controlling video games. + + +Source: [Gesture Recognition in RGB Videos Using Human Body Keypoints and Dynamic Time Warping ](https://arxiv.org/abs/1906.12171)" +computer-vision,image to gps verification,"The image-to-GPS verification task asks whether a given image is taken at a claimed GPS location. 
+ +( Image credit: [Image-to-GPS Verification Through A Bottom-Up Pattern Matching Network](https://arxiv.org/pdf/1811.07288v1.pdf) )" +computer-vision,motion forecasting,Motion forecasting is the task of predicting the location of a tracked object in the future +computer-vision,video based workflow recognition, +computer-vision,egocentric activity recognition, +computer-vision,document layout analysis,"""**Document Layout Analysis** is performed to determine physical structure of a document, that is, to determine document components. These document components can consist of single connected components-regions [...] of +pixels that are adjacent to form single regions [...] , or group +of text lines. A text line is a group of characters, symbols, +and words that are adjacent, “relatively close” to each other +and through which a straight line can be drawn (usually with +horizontal or vertical orientation)."" L. O'Gorman, ""The document spectrum for page layout analysis,"" in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 15, no. 11, pp. 1162-1173, Nov. 1993. + +Image credit: [PubLayNet: largest dataset ever for document layout analysis](https://arxiv.org/pdf/1908.07836v1.pdf)" +computer-vision,cross view person re identification, +computer-vision,unbalanced segmentation, +computer-vision,gaze redirection, +computer-vision,fashion understanding, +computer-vision,underwater image restoration,Underwater image restoration aims to rectify the distorted colors and present the true colors of the underwater scene. +computer-vision,object super resolution,"3D object super-resolution is the task of up-sampling 3D objects. 
+ +( Image credit: [Multi-View Silhouette and Depth Decomposition for High Resolution 3D Object Representation](https://github.com/EdwardSmith1884/Multi-View-Silhouette-and-Depth-Decomposition-for-High-Resolution-3D-Object-Representation) )" +computer-vision,human parsing,"Human parsing is the task of segmenting a human image into different fine-grained semantic parts such as head, torso, arms and legs. + +( Image credit: [Multi-Human-Parsing (MHP) +](https://github.com/ZhaoJ9014/Multi-Human-Parsing) )" +computer-vision,face anti spoofing,"Facial anti-spoofing is the task of preventing false facial verification by using a photo, video, mask or a different substitute for an authorized person’s face. Some examples of attacks: + +- **Print attack**: The attacker uses someone’s photo. The image is printed or displayed on a digital device. + +- **Replay/video attack**: A more sophisticated way to trick the system, which usually requires a looped video of a victim’s face. This approach ensures behaviour and facial movements to look more ‘natural’ compared to holding someone’s photo. + +- **3D mask attack**: During this type of attack, a mask is used as the tool of choice for spoofing. It’s an even more sophisticated attack than playing a face video. In addition to natural facial movements, it enables ways to deceive some extra layers of protection such as depth sensors. + + +( Image credit: [Learning Generalizable and Identity-Discriminative Representations for Face Anti-Spoofing](https://github.com/XgTu/GFA-CNN) )" +computer-vision,unsupervised video summarization,"**Unsupervised video summarization** approaches overcome the need for ground-truth data (whose production requires time-demanding and laborious manual annotation procedures), based on learning mechanisms that require only an adequately large collection of original videos for their training. 
Specifically, the training is based on heuristic rules, like the sparsity, the representativeness, and the diversity of the utilized input features/characteristics." +computer-vision,shape from texture, +computer-vision,line art colorization, +computer-vision,rain removal, +computer-vision,metric learning,"The goal of **Metric Learning** is to learn a representation function that maps objects into an embedded space. The distance in the embedded space should preserve the objects’ similarity — similar objects get close and dissimilar objects get far away. Various loss functions have been developed for Metric Learning. For example, the **contrastive loss** guides the objects from the same class to be mapped to the same point and those from different classes to be mapped to different points whose distances are larger than a margin. **Triplet loss** is also popular, which requires the distance between the anchor sample and the positive sample to be smaller than the distance between the anchor sample and the negative sample. + + +Source: [Road Network Metric Learning for Estimated Time of Arrival ](https://arxiv.org/abs/2006.13477)" +computer-vision,body detection,Detection of the persons or the characters defined in the dataset. +computer-vision,unsupervised semantic segmentation with,A segmentation task which does not utilise any human-level supervision for semantic segmentation except for a backbone which is initialised with features pre-trained with image-level labels. +computer-vision,artist classification,Classification of the artist for artistic images +computer-vision,optical character recognition,"Optical character recognition or optical character reader (OCR) is the electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene-photo (for example the text on signs and billboards in a landscape photo, license plates in cars...) 
or from subtitle text superimposed on an image (for example: from a television broadcast)" +computer-vision,shadow detection and removal, +computer-vision,panoptic scene graph generation,"PSG task abstracts the given image with a scene graph, where nodes are grounded by panoptic segmentation" +computer-vision,motion synthesis,"Image source: [Multi-View Motion Synthesis via Applying Rotated Dual-Pixel Blur Kernels +](https://paperswithcode.com/paper/multi-view-motion-synthesis-via-applying)" +computer-vision,natural image orientation angle detection,"Image orientation angle detection is a pretty challenging task for a machine because the machine has to learn the features of an image in such a way so that it can detect the arbitrary angle by which the image is rotated. Though there are some modern cameras with features involving inertial sensors that can correct image orientation in steps of 90 degrees, those features are seldom used. In this paper, we propose a method to detect the orientation angle of a digitally captured image where the image may have been captured by a camera at a tilted angle (between 0\degree to 359\degree)." +computer-vision,image inpainting,"**Image Inpainting** is a task of reconstructing missing regions in an image. It is an important problem in computer vision and an essential functionality in many imaging and graphics applications, e.g. object removal, image restoration, manipulation, re-targeting, compositing, and image-based rendering. + +Source: [High-Resolution Image Inpainting with Iterative Confidence Feedback and Guided Upsampling ](https://arxiv.org/abs/2005.11742) + +Image source: [High-Resolution Image Inpainting with Iterative Confidence Feedback and Guided Upsampling](https://arxiv.org/pdf/2005.11742.pdf)" +computer-vision,multiview gait recognition, +computer-vision,object recognition,"Object recognition is a computer vision technique for detecting + classifying objects in images or videos. 
Since this is a combined task of object detection plus image classification, the state-of-the-art tables are recorded for each component task [here](https://www.paperswithcode.com/task/object-detection) and [here](https://www.paperswithcode.com/task/image-classification2). + +( Image credit: [Tensorflow Object Detection API +](https://github.com/tensorflow/models/tree/master/research/object_detection) )" +computer-vision,gender prediction, +computer-vision,spatio temporal video grounding, +computer-vision,head pose estimation,"Estimating the head pose of a person is a crucial problem that has a large amount of applications such as aiding in gaze estimation, modeling attention, fitting 3D models to video and performing face alignment. + +( Image credit: [FSA-Net: Learning Fine-Grained Structure Aggregation for Head Pose +Estimation from a Single Image](http://openaccess.thecvf.com/content_CVPR_2019/papers/Yang_FSA-Net_Learning_Fine-Grained_Structure_Aggregation_for_Head_Pose_Estimation_From_CVPR_2019_paper.pdf) )" +computer-vision,video forensics, +computer-vision,point supervised instance segmentation,Weakly-Supervised Instance Segmentation using Point Labels +computer-vision,rgb t tracking, +computer-vision,scene change detection,"Scene change detection (SCD) refers to the task of localizing changes and identifying change-categories given two scenes. A scene can be either an RGB (+D) image or a 3D reconstruction (point cloud). If the scene is an image, SCD is a form of pixel-level prediction because each pixel in the image is classified according to a category. On the other hand, if the scene is point cloud, SCD is a form of point-level prediction because each point in the cloud is classified according to a category. + +Some example benchmarks for this task are VL-CMU-CD, PCD, and CD2014. Recently, more complicated benchmarks such as ChangeSim, HDMap, and Mallscape are released. 
+ +Models are usually evaluated with the Mean Intersection-Over-Union (Mean IoU), Pixel Accuracy, or F1 metrics." +computer-vision,few shot temporal action localization,Detect Action using few labeled samples +computer-vision,action generation, +computer-vision,video synchronization, +computer-vision,landslide segmentation, +computer-vision,prediction of occupancy grid maps, +computer-vision,intelligent surveillance, +computer-vision,continuous object recognition,"Continuous object recognition is the task of performing object recognition on a data stream and learning continuously, trying to mitigate issues such as catastrophic forgetting. + +( Image credit: [CORe50 dataset](https://vlomonaco.github.io/core50/) )" +computer-vision,image stitching,"**Image Stitching** is a process of composing multiple images with narrow but overlapping fields of view to create a larger image with a wider field of view. + + +Source: [Single-Perspective Warps in Natural Image Stitching ](https://arxiv.org/abs/1802.04645) + +( Image credit: [Kornia](https://github.com/kornia/kornia) )" +computer-vision,visual localization,"**Visual Localization** is the problem of estimating the camera pose of a given image relative to a visual representation of a known scene. + + +Source: [Fine-Grained Segmentation Networks: Self-Supervised Segmentation for Improved Long-Term Visual Localization ](https://arxiv.org/abs/1908.06387)" +computer-vision,crack segmentation, +computer-vision,unsupervised image decomposition, +computer-vision,object reconstruction, +computer-vision,single object discovery, +computer-vision,semi supervised object detection, +computer-vision,style transfer,"Style transfer is the task of changing the style of an image in one domain to the style of an image in another domain. 
+ +( Image credit: [A Neural Algorithm of Artistic Style](https://arxiv.org/pdf/1508.06576v2.pdf) )" +computer-vision,hand joint reconstruction, +computer-vision,damaged building detection, +computer-vision,matching disparate images, +computer-vision,semantic correspondence,The task of semantic correspondence aims to establish reliable visual correspondence between different instances of the same object category. +computer-vision,population mapping, +computer-vision,text to face generation, +computer-vision,aesthetic image captioning, +computer-vision,human mesh recovery,Estimate 3D body mesh from images +computer-vision,gait recognition in the wild,"Gait Recognition in the Wild refers to gait recognition methods that operate in real-world scenes, i.e., unconstrained environments." +computer-vision,object detection,"2D object detection classifies the object category and estimates oriented 2D bounding boxes of physical objects from 3D sensor data. + +( Image credit: [AVOD](https://github.com/kujason/avod) )" +computer-vision,visual recognition, +computer-vision,object reconstruction,Image: [Choy et al](https://arxiv.org/pdf/1604.00449v1.pdf) +computer-vision,image declipping, +computer-vision,jpeg artifact correction,"Correction of visual artifacts caused by JPEG compression. These artifacts are usually grouped into three types: blocking, blurring, and ringing. They are caused by quantization and removal of high frequency DCT coefficients." +computer-vision,facial expression generation, +computer-vision,self driving cars,"Self-driving cars: the task of making a car that can drive itself without human guidance. + +( Image credit: [Learning a Driving Simulator](https://github.com/commaai/research) )" +computer-vision,multispectral object detection, +computer-vision,motion disentanglement,Disentangling irregular (anomalous) motion from regular motion.
+computer-vision,multi object tracking,Image: [Weng et al](https://arxiv.org/pdf/1907.03961v4.pdf) +computer-vision,reconstruction, +computer-vision,depth image estimation, +computer-vision,covid 19 image segmentation, +computer-vision,visual dialogue,"Visual Dialog requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a follow-up question about the image, the task is to answer the question." +computer-vision,key frame based video super resolution k 15,"Key-Frame-based Video Super-Resolution is a sub-task of [Video Super-Resolution](https://paperswithcode.com/task/video-super-resolution), where, in addition to the low-resolution frames, high-resolution ground-truth frames for every Kth input frame are also provided as inputs to the model. For example, if `[LR-frame-1, LR-frame-2, LR-frame-3, ..., LR-frame-100]` is the sequence of low-resolution frames to be upscaled, the Key-Frame-based Video Super-Resolution (K = 15) model is also provided with the high-resolution frames `[HR-frame-1, HR-frame-16, ..., HR-frame-91]` . Key-frames are excluded when measuring the evaluation metrics." +computer-vision,depiction invariant object recognition,"Depiction invariant object recognition is the task of recognising objects irrespective of how they are visually depicted (line drawing, realistic shaded drawing, photograph etc.). + +( Image credit: [SwiDeN](https://arxiv.org/pdf/1607.08764v1.pdf) )" +computer-vision,one shot instance segmentation,"( Image credit: [Siamese Mask R-CNN +](https://github.com/bethgelab/siamese-mask-rcnn) )" +computer-vision,visual question answering,"**Visual Question Answering** is a semantic task that aims to answer questions based on an image. 
+ +Image Source: [visualqa.org](https://visualqa.org/)" +computer-vision,contrastive learning, +computer-vision,saliency detection,"**Saliency Detection** is a preprocessing step in computer vision which aims at finding salient objects in an image. + + +Source: [An Unsupervised Game-Theoretic Approach to Saliency Detection ](https://arxiv.org/abs/1708.02476)" +computer-vision,image dehazing,"( Image credit: [Densely Connected Pyramid Dehazing Network](https://github.com/hezhangsprinter/DCPDN) )" +computer-vision,image steganography,"**Image Steganography** is a central topic in information hiding. The sender conceals a secret message in a cover image to produce a container image, called the stego image, and transmits the secret message over a public channel by sending the stego image. The receiver can then recover the secret message. Steganalysis is an attack on the steganography algorithm: a listener on the public channel intercepts the image and analyzes whether it contains secret information. + + +Source: [Invisible Steganography via Generative Adversarial Networks ](https://arxiv.org/abs/1807.08571)" +computer-vision,text to image, +computer-vision,point cloud completion, +computer-vision,temporal defect localization,"Closed-Circuit TeleVision (CCTV) is a popular method for pipe defect +inspection. Different from short QV videos, CCTV videos are much longer and record more comprehensive content in the very distant pipe. The main task is to discover temporal locations of pipe defects in such untrimmed videos. Clearly, manual inspection is expensive, based on hundreds of hours of CCTV videos. To fill this gap, we introduce this temporal localization task, which is to find the temporal locations of pipe defects and recognize their corresponding categories in +a long CCTV video."
+computer-vision,federated lifelong person reid, +computer-vision,kinematic based workflow recognition, +computer-vision,real time semantic segmentation, +computer-vision,image cropping,"**Image Cropping** is a common photo manipulation process, which improves the overall composition by removing unwanted regions. Image Cropping is widely used in photographic, film processing, graphic design, and printing businesses. + + +Source: [Listwise View Ranking for Image Cropping ](https://arxiv.org/abs/1905.05352)" +computer-vision,animal pose estimation,"Animal pose estimation is the task of identifying the pose of an animal. + +( Image credit: [Using DeepLabCut for 3D markerless pose estimation across species and behaviors](http://www.mousemotorlab.org/s/NathMathis2019.pdf) )" +computer-vision,holdout set, +computer-vision,semi supervised learning for image captioning, +computer-vision,denoising,"Denoising is the task of removing noise from an image. + +( Image credit: [Beyond a Gaussian Denoiser](https://arxiv.org/pdf/1608.03981v1.pdf) )" +computer-vision,lane detection, +computer-vision,referring image matting prompt based,"Prompt-based referring image matting, taking an image and a prompt word as the input." +computer-vision,moment retrieval,"Moment retrieval can be defined as the task of ""localizing moments in a video given a user query"". + +Description from: [QVHIGHLIGHTS: Detecting Moments and Highlights in Videos via Natural Language Queries](https://arxiv.org/pdf/2107.09609v1.pdf) + +Image credit: [QVHIGHLIGHTS: Detecting Moments and Highlights in Videos via Natural Language Queries](https://arxiv.org/pdf/2107.09609v1.pdf)" +computer-vision,video similarity, +computer-vision,occluded object symmetry detection, +computer-vision,shape modeling,Image: [Gkioxari et al](https://arxiv.org/pdf/1906.02739v2.pdf) +computer-vision,fake image attribution,Attribute the origin (model/architecture) of fake images.
+computer-vision,scene aware dialogue, +computer-vision,point interactive image colorization,"__Point-interactive colorization__ is a task of colorizing images given user-guided clicks containing colors (a.k.a color hints). +Unlike unconditional image colorization, which is an underdetermined problem by nature, point-interactive colorization aims to generate images containing specific colors given by the user. + +Point-interactive colorization is evaluated by providing simulated user hints from the groundtruth color image. +Following the [iColoriT protocol](https://arxiv.org/abs/2207.06831), user hints have a size of 2x2 pixels and color is given as the average color within the 2x2 pixels." +computer-vision,localization in video forgery, +computer-vision,age and gender classification,"Age and gender classification is a dual-task of identifying the age and gender of a person from an image or video. + +( Image credit: [Multi-Expert Gender Classification on Age Group by Integrating Deep Neural Networks](https://arxiv.org/pdf/1809.01990v2.pdf) )" +computer-vision,multi object discovery, +computer-vision,animal action recognition,"Cross-species (intra-class, inter-class) action recognition" +computer-vision,amodal panoptic segmentation,The goal of this task is to simultaneously predict the pixel-wise semantic segmentation labels of the visible regions of stuff classes and the instance segmentation labels of both the visible and occluded regions of thing classes. +computer-vision,demosaicking,"Most modern digital cameras acquire color images by measuring only one color channel per pixel, red, green, or blue, according to a specific pattern called the Bayer pattern. **Demosaicking** is the processing step that reconstruct a full color image given these incomplete measurements. 
+ + +Source: [Revisiting Non Local Sparse Models for Image Restoration ](https://arxiv.org/abs/1912.02456)" +computer-vision,cryogenic electron microscopy cryo em,"Analysis of images and videos from transmission electron microscopes, including single-particle cryogenic electron microscopy and cryogenic electron tomography (cryo-ET). +https://en.wikipedia.org/wiki/Cryogenic_electron_microscopy" +computer-vision,scene segmentation,"Scene segmentation is the task of splitting a scene into its various object components. + +Image adapted from [Temporally coherent 4D reconstruction of complex dynamic scenes](https://paperswithcode.com/paper/temporally-coherent-4d-reconstruction-of2)." +computer-vision,salient object detection 1, +computer-vision,unsupervised few shot learning,"In contrast to supervised few-shot learning, only the unlabeled dataset is available in the pre-training +or meta-training stage for unsupervised few-shot learning." +computer-vision,membership inference attack, +computer-vision,image deconvolution, +computer-vision,image retouching, +computer-vision,license plate detection,License Plate Recognition is an image-processing technology used to identify vehicles by their license plates. This technology is used in various security and traffic applications. +computer-vision,svbrdf estimation,SVBRDF Estimation +computer-vision,person identification, +computer-vision,lip to speech synthesis,"Given a silent video of a speaker, generate the corresponding speech that matches the lip movements." +computer-vision,photo geolocation estimation,**Photo geolocation estimation** is the task of estimating or classifying the geolocation of photos on a world map. +computer-vision,video to shop, +computer-vision,classifier calibration,Confidence calibration – the problem of predicting probability estimates representative of the true correctness likelihood – is important for classification models in many applications.
The two common calibration metrics are Expected Calibration Error (ECE) and Maximum Calibration Error (MCE). +computer-vision,story visualization, +computer-vision,real time semantic segmentation,"Real-time semantic segmentation is the task of achieving computationally efficient semantic segmentation (while maintaining a base level of accuracy). + +( Image credit: [TorchSeg](https://github.com/ycszen/TorchSeg) )" +computer-vision,opd single view openable part detection,Detect the openable parts and predict their motion parameters from single-view image +computer-vision,occlusion handling, +computer-vision,infrared image super resolution,Aims at upsampling the IR image and create the high resolution image with help of a low resolution image. +computer-vision,surface normals estimation from point clouds,Parent task: 3d Point Clouds Analysis +computer-vision,face modeling, +computer-vision,constrained diffeomorphic image registration, +computer-vision,video stabilization, +computer-vision,deep feature inversion, +computer-vision,object classification,"3D Object Classification is the task of predicting the class of a 3D object point cloud. It is a voxel level prediction where each voxel is classified into a category. The popular benchmark for this task is the ModelNet dataset. The models for this task are usually evaluated with the Classification Accuracy metric. + +Image: [Sedaghat et al](https://arxiv.org/pdf/1604.03351v2.pdf)" +computer-vision,activity detection,Detecting activities in extended videos. +computer-vision,few shot image segmentation,Few-shot semantic segmentation (FSS) learns to segment target objects in query image given few pixel-wise annotated support image. +computer-vision,stereo matching 1,"**Stereo Matching** is one of the core technologies in computer vision, which recovers 3D structures of real world from 2D images. It has been widely used in areas such as autonomous driving, augmented reality and robotics navigation. 
Given a pair of rectified stereo images, the goal of Stereo Matching is to compute the disparity for each pixel in the reference image, where disparity is defined as the horizontal displacement between a pair of corresponding pixels in the left and right images. + + +Source: [Adaptive Unimodal Cost Volume Filtering for Deep Stereo Matching ](https://arxiv.org/abs/1909.03751)" +computer-vision,gallbladder cancer detection, +computer-vision,video quality assessment, +computer-vision,aggregate xview3 metric,"The aggregate xView3 metric is the combination of five metrics: object detection F1 score, close-to-shore object detection F1 score, vessel/not vessel classification F1 score, fishing/not fishing classification F1 score, and vessel length estimation percent error regression." +computer-vision,motion retargeting, +computer-vision,color image compression artifact reduction, +computer-vision,document image classification,"Document image classification is the task of classifying documents based on images of their contents. + +( Image credit: [Real-Time Document Image Classification using Deep CNN and Extreme Learning Machines](https://arxiv.org/pdf/1711.05862v1.pdf) )" +computer-vision,action spotting, +computer-vision,unsupervised video clustering, +computer-vision,flare removal,"When a camera is pointed at a strong light source, the resulting photograph may contain lens flare artifacts. Flares appear in a wide variety of patterns (halos, streaks, color bleeding, haze, etc.) and this diversity in appearance makes flare removal challenging." +computer-vision,code search,"The goal of **Code Search** is to retrieve code fragments from a large code corpus that most closely match a developer’s intent, which is expressed in natural language. + + +Source: [When Deep Learning Met Code Search ](https://arxiv.org/abs/1905.03813)" +computer-vision,sperm morphology classification,Multi-class classification of sperm head morphology. 
+computer-vision,object counting,"The goal of **Object Counting** task is to count the number of object instances in a single image or video sequence. It has many real-world applications such as traffic flow monitoring, crowdedness estimation, and product counting. + + +Source: [Learning to Count Objects with Few Exemplar Annotations ](https://arxiv.org/abs/1905.07898)" +computer-vision,camera relocalization,"""Camera relocalization, or image-based localization is a fundamental problem in robotics and computer vision. It refers to the process of determining camera pose from the visual scene representation and it is essential for many applications such as navigation of autonomous vehicles, structure from motion (SfM), augmented reality (AR) and simultaneous localization and mapping (SLAM)."" ([Source](https://paperswithcode.com/paper/camera-relocalization-by-computing-pairwise))" +computer-vision,occlusion estimation, +computer-vision,robust face alignment,"Robust face alignment is the task of face alignment in unconstrained (non-artificial) conditions. + +( Image credit: [Deep Alignment Network](https://github.com/MarekKowalski/DeepAlignmentNetwork) )" +computer-vision,face reenactment,"**Face Reenactment** is an emerging conditional face synthesis task that aims at fulfilling two goals simultaneously: 1) transfer a source face shape to a target face; while 2) preserve the appearance and the identity of the target face. + + +Source: [One-shot Face Reenactment ](https://arxiv.org/abs/1908.03251)" +computer-vision,replay grounding,"Replay grounding is introduced in SoccerNet-v2 in the case of videos of soccer games. Given a replay shot of a soccer action, the objective is to retrieve when said action occurs within the whole live game." +computer-vision,person re identification,"Person re-identification is the task of associating images of the same person taken from different cameras or from the same camera in different occasions. 
+ +( Image credit: [PRID2011 dataset](https://www.tugraz.at/institute/icg/research/team-bischof/lrs/downloads/PRID11/) )" +computer-vision,unsupervised object segmentation,Image credit: [ClevrTex: A Texture-Rich Benchmark for Unsupervised Multi-Object Segmentation](https://paperswithcode.com/paper/clevrtex-a-texture-rich-benchmark-for) +computer-vision,persuasion strategies,Prediction of Persuasion Strategy in Advertisements +computer-vision,image generation from scene graphs, +computer-vision,low light pedestrian detection,Pedestrian Detection in low-light condition +computer-vision,adversarial attack detection,The detection of adversarial attacks. +computer-vision,gaze target estimation,Gaze Target Estimation refers to predicting the image 2D gaze location of a person in the image. +computer-vision,image quality estimation, +computer-vision,human pose and shape estimation,Estimate 3D human pose and shape (e.g. SMPL) from images +computer-vision,face detection,"Face detection is the task of detecting faces in a photo or video (and distinguishing them from other objects). + +( Image credit: [insightface](https://github.com/deepinsight/insightface) )" +computer-vision,sketch, +computer-vision,street scene parsing, +computer-vision,language based temporal localization, +computer-vision,image classification with dp,**Image Classification with Differential Privacy** is an improved version of the image classification task whereby the final classification output only describe the patterns of groups within the dataset while withholding information about individuals in the dataset. +computer-vision,deep attention, +computer-vision,point cloud pre training, +computer-vision,lighting estimation,Lighting Estimation analyzes given images to provide detailed information about the lighting in a scene. 
+computer-vision,thermal infrared pedestrian detection,Thermal Infrared Pedestrian Detection under low-light condition +computer-vision,stereo matching, +computer-vision,bokeh effect rendering, +computer-vision,active object localization, +computer-vision,image deep networks, +computer-vision,character animation from a single photo,Image: [Weng et al](https://arxiv.org/pdf/1812.02246v1.pdf) +computer-vision,human object interaction concept discovery,"Discovering the reasonable HOI concepts/categories from known categories and their instances. Actually, it is also a matrix (verb-object matrix) complementation problem." +computer-vision,mri reconstruction,"In its most basic form, MRI reconstruction consists in retrieving a complex-valued image from its under-sampled Fourier coefficients. +Besides, it can be addressed as a encoder-decoder task, in which the normative model in the latent space will only capture the relevant information without noise or corruptions. Then, we decode the latent space in order to have a reconstructed MRI." +computer-vision,incomplete multi view clustering, +computer-vision,pose estimation 1,Image: [Zeng et al](https://arxiv.org/pdf/1609.09475v3.pdf) +computer-vision,blind image quality assessment, +computer-vision,human action generation,"Yan et al. (2019) CSGN: + +""When the dancer is stepping, jumping and spinning on the +stage, attentions of all audiences are attracted by the streamof the fluent and graceful movements. Building a model that is capable of dancing is as fascinating a task as appreciating the performance itself. In this paper, we aim to generate long-duration human actions represented as skeleton sequences, e.g. 
those that cover the entirety of a dance, with hundreds of moves and countless possible combinations."" + + +( Image credit: [Convolutional Sequence Generation for Skeleton-Based Action Synthesis](http://www.dahualin.org/publications/dhl19_csgn.pdf) )" +computer-vision,bird view synthesis, +computer-vision,imagedocument clustering, +computer-vision,vehicle speed estimation,Vehicle speed estimation is the task of detecting and tracking vehicles whose real-world speeds are then estimated. The task is usually evaluated with recall and precision of the detected vehicle tracks as well as the mean or median errors of the estimated vehicle speeds. +computer-vision,semi supervised semantic segmentation, +computer-vision,group detection in crowds, +computer-vision,rice grain disease detection, +computer-vision,partially view aligned multi view learning,"In multi-view learning, Partially View-aligned Problem (PVP) refers to the case when only a portion of data is aligned, thus leading to data inconsistency." +computer-vision,transparency separation, +computer-vision,human dynamics, +computer-vision,house generation, +computer-vision,future prediction, +computer-vision,vehicle pose estimation,"Image Credit: [GSNet: Joint Vehicle Pose and Shape Reconstruction with Geometrical and Scene-aware Supervision, ECCV'20](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123600511.pdf)" +computer-vision,drivable area detection, +computer-vision,person search,"**Person Search** is a task which aims at matching a specific person among a great number of whole scene images. + + +Source: [Re-ID Driven Localization Refinement for Person Search ](https://arxiv.org/abs/1909.08580)" +computer-vision,occluded face detection, +computer-vision,semi supervised sketch based image retrieval,"Whilst the number of photos can be easily scaled, each corresponding sketch still needs to be individually produced for fine-grained sketch-based image retrieval. 
The objective is to mitigate such an upper-bound on sketch data, and study whether unlabelled photos alone (of which there are many) can be cultivated for performance gain." +computer-vision,disjoint 10 1, +computer-vision,face transfer,"**Face Transfer** is a method for mapping face performances of one individual to facial animations of another one. It uses facial expressions and head poses from the video of a source actor to generate a video of a target character. Face Transfer is a special case of image-to-image translation tasks. + + +Source: [Face Transfer with Generative Adversarial Network ](https://arxiv.org/abs/1710.06090)" +computer-vision,face swapping,"Face swapping refers to the task of swapping faces between images or in a video, while maintaining the rest of the body and environment context. + +( Image credit: [Swapped Face Detection using Deep Learning and Subjective Assessment](https://arxiv.org/pdf/1909.04217v1.pdf) )" +computer-vision,self knowledge distillation, +computer-vision,interspecies facial keypoint transfer,Find cross-domain semantic correspondence between faces of different species +computer-vision,shadow detection, +computer-vision,handwritten word generation, +computer-vision,multi target domain adaptation,The idea of Multi-target Domain Adaptation is to adapt a model from a single labelled source domain to multiple unlabelled target domains. +computer-vision,indoor monocular depth estimation, +computer-vision,pulmonary arteryvein classification, +computer-vision,video generation,"( Image credit: [Logacheva et al.](https://paperswithcode.com/paper/deeplandscape-adversarial-modeling-of-1) )" +computer-vision,activity prediction,Predict human activities in videos +computer-vision,relational captioning, +computer-vision,face generation,"Face generation is the task of generating (or interpolating) new faces from an existing dataset. + +The state-of-the-art results for this task are located in the Image Generation parent.
+ +( Image credit: [Progressive Growing of GANs for Improved Quality, Stability, and Variation +](https://arxiv.org/pdf/1710.10196v3.pdf) )" +computer-vision,medical image denoising,Image credit: [Learning Medical Image Denoising with Deep Dynamic Residual Attention Network](https://paperswithcode.com/paper/learning-medical-image-denoising-with-deep) +computer-vision,text to image generation,This task refers to image generation based on a given sentence or sequence of words. +computer-vision,audio visual active speaker detection,Determine if and when each visible person in the video is speaking. +computer-vision,generalizable person re identification,Generalizable person re-identification refers to methods trained on a source dataset but directly evaluated on a target dataset without domain adaptation or transfer learning. +computer-vision,fake image detection,"( Image credit: [FaceForensics++](https://github.com/ondyari/FaceForensics) )" +computer-vision,video compressive sensing, +computer-vision,affordance detection,"Affordance detection refers to identifying the potential action possibilities of objects in an image, which is an important ability for robot perception and manipulation. + +Image source: [Object-Based Affordances Detection with Convolutional Neural Networks and Dense Conditional Random Fields](https://dkanou.github.io/publ/P15__Nguyen_Kanoulas_Caldwell_Tsagarakis__2017__Object-Based_Affordances_Detection_with_Convolutional_Neural_Networks_and__Dense_Conditional_Random_Fields.pdf) + +Unlike other visual or physical properties that mainly describe the object alone, affordances indicate functional interactions of object parts with humans." +computer-vision,hand keypoint localization, +computer-vision,autonomous vehicles,"Autonomous vehicles is the task of making a vehicle that can guide itself without human conduction. 
+ +Many of the state-of-the-art results can be found at more general task pages such as [3D Object Detection](https://paperswithcode.com/task/3d-object-detection) and [Semantic Segmentation](https://paperswithcode.com/task/semantic-segmentation). + +( Image credit: [GSNet: Joint Vehicle Pose and Shape Reconstruction with Geometrical and Scene-aware Supervision](https://arxiv.org/abs/2007.13124) )" +computer-vision,disjoint 15 5, +computer-vision,image reconstruction, +computer-vision,hyperview challenge,"The objective of this challenge is to advance the state of the art for soil parameter retrieval from hyperspectral data in view of the upcoming Intuition-1 mission. A campaign took place in March 2021 over agricultural areas in Poland with extensive ground samplings collocated with airborne hyperspectral measurements from imagers mounted onboard an airplane. The hyperspectral data contains 150 contiguous hyperspectral bands (462-942 nm, with a spectral resolution of 3.2 nm), which reflects the spectral range of the hyperspectral imaging sensor deployed on-board Intuition-1." +computer-vision,video style transfer, +computer-vision,open world object detection,"Open World Object Detection is a computer vision problem where a model is tasked to: 1) identify objects that have not been introduced to it as `unknown', without explicit supervision to do so, and 2) incrementally learn these identified unknown categories without forgetting previously learned classes, when the corresponding labels are progressively received." 
+computer-vision,template matching, +computer-vision,material recognition, +computer-vision,face anonymization, +computer-vision,pose estimation using rgbd,Image: [Zeng et al](https://arxiv.org/pdf/1609.09475v3.pdf) +computer-vision,visual keyword spotting,Spot a given query keyword in a silent talking face video +computer-vision,video matting,Image credit: [https://arxiv.org/pdf/2012.07810v1.pdf](https://arxiv.org/pdf/2012.07810v1.pdf) +computer-vision,traffic sign recognition,"Traffic sign recognition is the task of recognising traffic signs in an image or video. + +( Image credit: [Novel Deep Learning Model for Traffic Sign Detection Using Capsule +Networks ](https://arxiv.org/pdf/1805.04424v1.pdf) )" +computer-vision,neural stylization, +computer-vision,super resolution,"Super resolution is the task of taking an input of a low resolution (LR) and upscaling it to that of a high resolution. + +You can find relevant leaderboards in the subtasks below. + +( Credit: [MemNet](https://github.com/tyshiwo/MemNet) )" +computer-vision,image matting,"**Image Matting** is the process of accurately estimating the foreground object in images and videos. It is a very important technique in image and video editing applications, particularly in film production for creating visual effects. In case of image segmentation, we segment the image into foreground and background by labeling the pixels. Image segmentation generates a binary image, in which a pixel either belongs to foreground or background. However, Image Matting is different from the image segmentation, wherein some pixels may belong to foreground as well as background, such pixels are called partial or mixed pixels. In order to fully separate the foreground from the background in an image, accurate estimation of the alpha values for partial or mixed pixels is necessary. 
+ + +Source: [Automatic Trimap Generation for Image Matting ](https://arxiv.org/abs/1707.00333) + +Image Source: [Real-Time High-Resolution Background Matting](https://arxiv.org/pdf/2012.07810v1.pdf)" +computer-vision,, +computer-vision,unsupervised few shot image classification,"In contrast to (supervised) few-shot image classification, only the unlabeled dataset is available in the pre-training or meta-training stage for unsupervised few-shot image classification." +computer-vision,single view reconstruction, +computer-vision,active observation completion, +graphs,anchor link prediction, +graphs,set to graph prediction 1, +graphs,role embedding, +graphs,knowledge base completion,"Knowledge base completion is the task which automatically infers missing facts by reasoning about the information already present in the knowledge base. A knowledge base is a collection of relational +facts, often represented in the form of ""subject"", ""relation"", ""object""-triples." +graphs,community search, +graphs,hypergraph embedding,Compute useful representations of hyperedges and vertices +graphs,root cause ranking,Detection of causal anomalous nodes in graphs +graphs,hypergraph matching, +graphs,graph partitioning,Graph Partitioning is generally the first step of distributed graph computing tasks. The targets are load-balance and minimizing the communication volume. +graphs,link prediction,"Link prediction is a task to estimate the probability of links between nodes in a graph. + +( Image credit: [Inductive Representation Learning on Large Graphs](https://arxiv.org/pdf/1706.02216v4.pdf) )" +graphs,topological data analysis, +graphs,triad prediction, +graphs,node classification,"The node classification task is one where the algorithm has to determine the labelling of samples (represented as nodes) by looking at the labels of their neighbours. + +Node classification models aim to predict non-existing node properties (known as the target propert) based on other node properties. 
Typical models used for node classification consist of a large family of graph neural networks. Model performance can be measured using benchmark datasets like [Cora](/dataset/cora), [Citeseer](/dataset/citeseer), and [Pubmed](/dataset/pubmed), among others, typically using Accuracy and F1. + +( Image credit: [Fast Graph Representation Learning With PyTorch Geometric](https://arxiv.org/pdf/1903.02428v3.pdf) )" +graphs,local community detection, +graphs,graph ranking, +graphs,network community partition, +graphs,graph construction, +graphs,jet tagging,"Jet tagging is the process of identifying the type of elementary particle that initiates a ""jet"", i.e., a collimated spray of outgoing particles. It is essentially a classification task that aims to distinguish jets arising from particles of interest, such as the Higgs boson or the top quark, from other less interesting types of jets." +graphs,inductive link prediction,"In inductive link prediction inference is performed on a new, unseen graph whereas classical transductive link prediction performs both training and inference on the same graph." +graphs,ancestor descendant prediction,"Given two entities, make a binary prediction if they have ancestor-descendant relationship, based on existing and missing hierarchical edges in the graph." +graphs,hand pose estimation,Image: [Zimmerman et al](https://arxiv.org/pdf/1705.01389v3.pdf) +graphs,knowledge graph embeddings, +graphs,graph similarity, +graphs,graph attention, +graphs,video inpainting,"The goal of **Video Inpainting** is to fill in missing regions of a given video sequence with contents that are both spatially and temporally coherent. Video Inpainting, also known as video completion, has many real-world applications such as undesired object removal and video restoration.
+ + +Source: [Deep Flow-Guided Video Inpainting ](https://arxiv.org/abs/1905.02884)" +graphs,nmr j coupling,https://github.com/larsbratholm/champs_kaggle +graphs,approximating betweenness centrality ranking,Betweenness-centrality is a popular measure in network analysis that aims to describe the importance of nodes in a graph. It accounts for the fraction of shortest paths passing through that node and is a key measure in many applications including community detection and network dismantling. +graphs,rubik s cube,Solving the Rubik's Cube is a pathfinding task on a massive implicit graph. +graphs,dynamic link prediction, +graphs,semantic segmentation,"Semantic segmentation, or image segmentation, is the task of clustering parts of an image together which belong to the same object class. It is a form of pixel-level prediction because each pixel in an image is classified according to a category. Some example benchmarks for this task are Cityscapes, PASCAL VOC and ADE20K. Models are usually evaluated with the Mean Intersection-Over-Union (Mean IoU) and Pixel Accuracy metrics. + +( Image credit: [CSAILVision](https://github.com/CSAILVision/semantic-segmentation-pytorch) )" +graphs,online community detection, +graphs,collaborative ranking, +graphs,graphon estimation, +graphs,graph reconstruction, +graphs,link property prediction, +graphs,community detection,"**Community Detection** is one of the fundamental problems in network analysis, where the goal is to find groups of nodes that are, in some sense, more similar to each other than to the other nodes. + + +Source: [Randomized Spectral Clustering in Large-Scale Stochastic Block Models ](https://arxiv.org/abs/2002.00839)" +graphs,knowledge graph embedding, +graphs,graph nonvolutional network, +graphs,inductive relation prediction,Inductive setting of the knowledge graph completion task. This requires a model to perform link prediction on an entirely new test graph with new set of entities. 
+graphs,spectral graph clustering, +graphs,graph sampling,Training GNNs or generating graph embeddings requires graph samples. +graphs,hyperedge prediction, +graphs,gene interaction prediction, +graphs,image outpainting,"Predicting the visual context of an image beyond its boundary. + +Image credit: [NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis](https://paperswithcode.com/paper/nuwa-infinity-autoregressive-over?from=n35)" +graphs,graph matching,"**Graph Matching** is the problem of finding correspondences between two sets of vertices while preserving complex relational information among them. Since the graph structure has a strong capacity to represent objects and robustness to severe deformation and outliers, it is frequently adopted to formulate various correspondence problems in the field of computer vision. Theoretically, the Graph Matching problem can be solved by exhaustively searching the entire solution space. However, this approach is infeasible in practice because the solution space expands exponentially as the size of input data increases. For that reason, previous studies have attempted to solve the problem by using various approximation techniques. + + +Source: [Consistent Multiple Graph Matching with Multi-layer Random Walks Synchronization ](https://arxiv.org/abs/1712.02575)" +graphs,graph regression,The regression task is similar to graph classification but using different loss function and performance metric. +graphs,link sign prediction, +graphs,tree map layout,Hierarchical tree visualisation by assigning sizes and positions of nodes. 
https://en.wikipedia.org/wiki/Treemapping +graphs,structural node embedding, +graphs,graph question answering, +graphs,dynamic community detection,community detection in dynamic networks +graphs,graph property prediction, +graphs,graph outlier detection, +graphs,heterogeneous node classification,"Node classification in heterogeneous graphs, where nodes and/or edges have multiple types." +graphs,learning to rank,"Learning to rank is the application of machine learning to build ranking models. Some common use cases for ranking models are information retrieval (e.g., web search) and news feeds application (think Twitter, Facebook, Instagram)." +graphs,graph learning, +graphs,graph mining, +graphs,graph structure learning,Semi-supervised node classification when a graph structure is not available. +graphs,graph embedding,"Graph embeddings learn a mapping from a network to a vector space, while preserving relevant network properties. + +( Image credit: [GAT](https://github.com/PetarV-/GAT) )" +graphs,graph generation,"**Graph Generation** is an important research area with significant applications in drug and material designs. + + +Source: [Graph Deconvolutional Generation ](https://arxiv.org/abs/2002.07087)" +graphs,hypergraph partitioning, +graphs,hyperedge classification, +graphs,image relighting,Image relighting involves changing the illumination settings of an image. +graphs,initial structure to relaxed energy is2re, +graphs,set to graph prediction, +graphs,graph clustering,"**Graph Clustering** is the process of grouping the nodes of the graph into clusters, taking into account the edge structure of the graph in such a way that there are several edges within each cluster and very few between clusters. Graph Clustering intends to partition the nodes in the graph into disjoint groups. 
+ + +Source: [Clustering for Graph Datasets via Gumbel Softmax ](https://arxiv.org/abs/2005.02372)" +graphs,calibration for link prediction, +graphs,connectivity estimation, +graphs,stochastic block model, +graphs,graph to graph translation, +graphs,graph classification,"( Image credit: [Hierarchical Graph Pooling with Structure Learning](https://github.com/cszhangzhen/HGP-SL) )" +graphs,structual feature correlation,Expressive Power of GNN to predict structural feature's correlation mutually. +graphs,clustering ensemble, +graphs,node classification on non homophilic,"There exists a non-trivial set of graphs where graph-aware models underperform their corresponding graph-agnostic models, e.g. SGC and GCN underperform MLP with 1 layer and 2 layers. Although still controversial, people believe the performance degradation results from heterophily, i.e. there exist much more inter-class edges than inner-class edges. This task aims to evaluate models designed for non-homophilic (heterophilic) datasets." +graphs,physics informed machine learning,Machine learning used to represent physics-based and/or engineering models +graphs,dynamic graph embedding, +knowledge-base,multi modal entity alignment, +knowledge-base,open knowledge graph embedding, +knowledge-base,inductive knowledge graph completion, +knowledge-base,knowledge graphs data curation, +knowledge-base,entity alignment,"**Entity Alignment** is the task of finding entities in two knowledge bases that refer to the same real-world object. It plays a vital role in automatically integrating multiple knowledge bases. +Note: results that have incorporated machine translated entity names (introduced in the RDGCN paper) or pre-alignment name embeddings are considered to have used **extra training labels** (both are marked with ""Extra Training Data"" in the leaderboard) and are **not adhere to a comparable setting** with others that have followed the original setting of the benchmark. 
+ +Source: [Cross-lingual Entity Alignment via Joint Attribute-Preserving Embedding ](https://arxiv.org/abs/1708.05045) + +The task of entity alignment is related to the task of [entity resolution](https://paperswithcode.com/task/entity-resolution) which focuses on matching structured entity descriptions in different contexts." +knowledge-base,relational pattern learning,Learning and extracting the hidden patterns among the relations in a Knowledge Graph. +knowledge-base,knowledge graph completion,"Knowledge graphs $G$ are represented as a collection of triples $\\{(h, r, t)\\}\subseteq E\times R\times E$, where $E$ and $R$ are the entity set and relation set. The task of **Knowledge Graph Completion** is to either predict unseen relations $r$ between two existing entities: $(h, ?, t)$ or predict the tail entity $t$ given the head entity and the query relation: $(h, r, ?)$. + + +Source: [One-Shot Relational Learning for Knowledge Graphs ](https://arxiv.org/abs/1808.09040)" +knowledge-base,complex query answering,"This task is concerned with answering complex queries over incomplete knowledge graphs. In the most simple case, the task is reduced to link prediction: a 1-hop query for predicting the existence of an edge between a pair of nodes. Complex queries are concerned with other structures between nodes, such as 2-hop and 3-paths, and intersecting paths with intermediate variables." +knowledge-base,multi hop question answering, +knowledge-base,video to video synthesis, +knowledge-base,rdf dataset discovery,Given a URI find the RDF datasets containing this URI. +knowledge-base,knowledge graphs, +knowledge-base,table annotation,"**Table annotation** is the task of annotating a table with terms/concepts from knowledge graph or database schema. Table annotation is typically broken down into the following five subtasks: + +1. Cell Entity Annotation ([CEA](https://paperswithcode.com/task/cell-entity-annotation)) +2. 
Column Type Annotation ([CTA](https://paperswithcode.com/task/column-type-annotation)) +3. Column Property Annotation ([CPA](https://paperswithcode.com/task/columns-property-annotation)) +4. [Table Type Detection](https://paperswithcode.com/task/table-type-detection) +5. [Row Annotation](https://paperswithcode.com/task/row-annotation) + +The [SemTab](http://www.cs.ox.ac.uk/isg/challenges/sem-tab/) challenge is closely related to the Table Annotation problem. It is a yearly challenge which focuses on the first three tasks of table annotation and its purpose is to benchmark different table annotation systems." +knowledge-base,commonsense knowledge base construction, +knowledge-base,data integration, +knowledge-base,manufacturing simulation,Simulation of manufacturing system for applying AI methods and big data analysis +knowledge-base,attribute type extraction, +knowledge-base,temporal knowledge graph completion, +knowledge-base,open knowledge graph canonicalization,"Open Information Extraction approaches lead to the creation of large knowledge bases (KBs) from the web. The problem with such methods is that their entities and relations are not canonicalized, which leads to storage of redundant and ambiguous facts. For example, an Open KB storing *\* and *\* doesn't know that *Barack Obama* and *Obama* mean the same entity. Similarly, *took birth in* and *was born in* also refer to the same relation. The problem of Open KB canonicalization involves identifying groups of equivalent entities and relations in the KB. + +( Image credit: [CESI: Canonicalizing Open Knowledge Bases using Embeddings and Side Information](https://github.com/malllabiisc/cesi) )" +knowledge-base,link prediction,"Link prediction is a task to estimate the probability of links between nodes in a graph.
+ +( Image credit: [Inductive Representation Learning on Large Graphs](https://arxiv.org/pdf/1706.02216v4.pdf) )" +knowledge-base,breast cancer detection, +knowledge-base,symbolic regression,"producing a mathematical expression (symbolic expression) +that fits a given tabular data." +knowledge-base,table to knowledge graph matching, +knowledge-base,causal discovery,"( Image credit: [TCDF](https://github.com/M-Nauta/TCDF) )" +knowledge-base,open knowledge base completion, +knowledge-base,multi modal knowledge graph, +knowledge-base,drought stress, +knowledge-base,cross lingual sememe prediction,Predict sememes for unannotated words in another language given sememe annotations of vocabulary in one language. +knowledge-base,knowledge base completion,"Knowledge base completion is the task which automatically infers missing facts by reasoning about the information already present in the knowledge base. A knowledge base is a collection of relational +facts, often represented in the form of ""subject"", ""relation"", ""object""-triples." +knowledge-base,face reconstruction,"3D face reconstruction is the task of reconstructing a face from an image into a 3D form (or mesh). 
+ +( Image credit: [3DDFA_V2](https://github.com/cleardusk/3DDFA_V2) )" +knowledge-base,non intrusive load monitoring, +knowledge-base,research knowledge graph population,Research on the association of social population relations based on knowledge graph +medical,multi label classification of biomedical, +medical,tomography, +medical,automatic sleep stage classification, +medical,classification of breast cancer histology, +medical,protein complex prediction, +medical,semi supervised medical image classification,Semi-supervised Medical Image Classification +medical,breast cancer histology image classification, +medical,mental arithmetic task, +medical,ecg risk stratification, +medical,bladder segmentation, +medical,molecular docking,"Predicting the binding structure of a small molecule ligand to a protein, which is critical to drug design. + +Description from: [DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking](https://paperswithcode.com/paper/diffdock-diffusion-steps-twists-and-turns-for)" +medical,automated pancreas segmentation, +medical,immune repertoire classification, +medical,magnetic resonance fingerprinting, +medical,volumetric medical image segmentation, +medical,seizure prediction, +medical,whole mammogram classification, +medical,population assignment, +medical,medical image segmentation,"Medical image segmentation is the task of segmenting objects of interest in a medical image. 
+ +( Image credit: [IVD-Net](https://github.com/josedolz/IVD-Net) )" +medical,respiratory rate estimation, +medical,quantum state tomography, +medical,predicting patient outcomes, +medical,diabetic retinopathy grading,Grading the severity of diabetic retinopathy from (ophthalmic) fundus images +medical,cancer, +medical,length of stay prediction, +medical,disease trajectory forecasting, +medical,discovery of integrative cancer subtypes, +medical,ecg classification, +medical,pneumonia detection, +medical,skin lesion segmentation, +medical,acute stroke lesion segmentation, +medical,readmission prediction, +medical,brain lesion segmentation from mri, +medical,knee cartilage defect assessment, +medical,mass segmentation from mammograms, +medical,brain image segmentation, +medical,lung disease classification, +medical,cell segmentation,"**Cell Segmentation** is a task of splitting a microscopic image domain into segments, which represent individual instances of cells. It is a fundamental step in many biomedical studies, and it is regarded as a cornerstone of image-based cellular research. Cellular morphology is an indicator of a physiological state of the cell, and a well-segmented image can capture biologically relevant morphological information. + + +Source: [Cell Segmentation by Combining Marker-controlled Watershed and Deep Learning ](https://arxiv.org/abs/2004.01607)" +medical,medical relation extraction,Biomedical relation extraction is the task of detecting and classifying semantic relationships from biomedical text. +medical,medical image retrieval, +medical,infant brain mri segmentation, +medical,medical diagnosis,"**Medical Diagnosis** is the process of identifying the disease a patient is affected by, based on the assessment of specific risk factors, signs, symptoms and results of exams. 
+ + +Source: [A probabilistic network for the diagnosis of acute cardiopulmonary diseases ](https://arxiv.org/abs/1609.06864)" +medical,medical imaging segmentation,"3D medical imaging segmentation is the task of segmenting medical objects of interest from 3D medical imaging. + +( Image credit: [Elastic Boundary Projection for 3D Medical Image Segmentation](https://github.com/twni2016/Elastic-Boundary-Projection) )" +medical,respiratory failure,Continuous prediction of onset of respiratory failure in the next 12h given the patient is not in failure now. +medical,pancreas segmentation,"Pancreas segmentation is the task of segmenting out the pancreas from medical imaging. + +Convolutional neural network" +medical,ecg denoising, +medical,electron tomography, +medical,semantic segmentation of orthoimagery, +medical,automatic liver and tumor segmentation, +medical,muscle force prediction, +medical,respiratory motion forecasting,Respiratory motion forecasting to compensate for the latency of the radiotherapy treatment systems and target more accurately chest tumors. +medical,liver segmentation, +medical,motion correction in multishot mri, +medical,sleep staging,Human Sleep Staging into W-R-N or W-R-L-D classes from multiple or single polysomnography signals +medical,participant intervention comparison outcome,"PICO recognition is an information extraction task for identifying Participant, Intervention, Comparator, and Outcome (PICO elements) information from clinical literature." +medical,circulatory failure,"Continuous prediction of onset of circulatory failure in the next 12h, given the patient is not in failure now." +medical,skin cancer segmentation, +medical,lung nodule classification, +medical,brain morphometry,Measurement of brain structures from neuroimaging (MRI). 
+medical,brain ventricle localization and segmentation, +medical,splenomegaly segmentation on multi modal mri, +medical,mammogram, +medical,alzheimer s disease detection, +medical,breast cancer detection, +medical,oral cancer classification, +medical,cervical spondylosis identification, +medical,multi diseases detection, +medical,surgical gesture recognition, +medical,electromyography emg, +medical,acoustic echo cancellation, +medical,medical waveform analysis,"Information extraction from medical waveforms such as the electrocardiogram (ECG), arterial blood pressure (ABP) central venous pressure (CVP), photoplethysmogram (PPG, Pleth)." +medical,lesion segmentation,"Lesion segmentation is the task of segmenting out lesions from other objects in medical based images. + +( Image credit: [D-UNet](https://arxiv.org/pdf/1908.05104v1.pdf) )" +medical,fovea detection, +medical,medical image generation,"Medical image generation is the task of synthesising new medical images. + +( Image credit: [Towards Adversarial Retinal Image Synthesis](https://arxiv.org/pdf/1701.08974v1.pdf) )" +medical,pulmonary nodules classification, +medical,patient outcomes, +medical,spindle detection, +medical,medical code prediction,"Context: Prediction of medical codes from clinical notes is both a practical and essential need for every healthcare delivery organization within current medical systems. Automating annotation will save significant time and excessive effort by human coders today. A new milestone will mark a meaningful step toward fully Autonomous Medical Coding in machines reaching parity with human coders' performance in medical code prediction. + +Question: What exactly is the medical code prediction problem? + +Answer: Clinical notes contain much information about what precisely happened during the patient's entire stay. 
These clinical notes (e.g., discharge summaries) are typically long and loosely structured, use medical domain language, and are sometimes riddled with spelling errors. It is therefore a highly multi-label classification problem, and the forthcoming ICD-11 standard will add more complexity. The medical code prediction problem is to annotate a clinical note with a subset of codes drawn from nearly 70K total codes (in the current ICD-10 system, for example)." +medical,spo2 estimation,SpO2 estimation +medical,surgical skills evaluation,The task is to classify surgical skills using data that is recorded during the surgical intervention. +medical,sequential diagnosis, +medical,protein secondary structure prediction, +medical,cervical cancer biopsy identification, +medical,single cell modeling,Single Cell RNA sequencing (scRNAseq) revolutionized our understanding of the fundamentals of the life sciences. The technology enables an unprecedented resolution to study heterogeneity in cell populations and their functionalities. +medical,placenta segmentation, +medical,mapping of lung nodules in low dose ct images, +medical,diabetic foot ulcer detection, +medical,text based de novo molecule generation, +medical,brain tumor segmentation,"Brain tumor segmentation is the task of segmenting tumors from other brain artefacts in MRI images of the brain. + + +( Image credit: [Brain Tumor Segmentation with Deep Neural Networks](https://github.com/naldeborgh7575/brain_segmentation) )" +medical,photoplethysmography ppg,"**Photoplethysmography (PPG)** is a non-invasive light-based method that has been used since the 1930s for monitoring cardiovascular activity.
+ + +Source: [Non-contact transmittance photoplethysmographic imaging (PPGI) for long-distance cardiovascular monitoring ](https://arxiv.org/abs/1503.06775)" +medical,prediction of cancer cell line sensitivity, +medical,transfer learning,"Transfer learning is a methodology where weights from a model trained on one task are taken and either used (a) to construct a fixed feature extractor, (b) as weight initialization and/or fine-tuning. + +( Image credit: [Subodh Malgonde](https://medium.com/@subodh.malgonde/transfer-learning-using-tensorflow-52a4f6bcde3e) )" +medical,phenotype classification, +medical,cerebrovascular network segmentation, +medical,ventricular fibrillation detection, +medical,white matter fiber tractography, +medical,molecule interpretation, +medical,diabetic retinopathy detection, +medical,congestive heart failure detection, +medical,medical procedure,Predicting medical procedures performed during a hospital admission. +medical,blood pressure estimation, +medical,malaria risk exposure prediction, +medical,drug response prediction, +medical,mitosis detection, +medical,brain segmentation,"( Image credit: [3D fully convolutional networks for subcortical segmentation in MRI: A large-scale study](https://github.com/josedolz/LiviaNET) )" +medical,radiologist binary classification,"This task measures a radiologist's performance on distinguishing between generated (e.g. with a GAN, VAE, etc.) and real images, ascribing to the high visual quality of the synthesized images, and to their potential use in advancing and facilitating downstream medical tasks." +medical,birl cima,"BIRL: Benchmark on Image Registration methods with Landmark validation, in particular, Biomedical image registration on WSI microscopy images of a multi-strain histology tissue sample." 
+medical,diffeomorphic medical image registration,"Diffeomorphic mapping is the underlying technology for mapping and analyzing information measured in human anatomical coordinate systems which have been measured via Medical imaging. Diffeomorphic mapping is a broad term that actually refers to a number of different algorithms, processes, and methods. It is attached to many operations and has many applications for analysis and visualization. Diffeomorphic mapping can be used to relate various sources of information which are indexed as a function of spatial position as the key index variable. Diffeomorphisms are by their Latin root structure preserving transformations, which are in turn differentiable and therefore smooth, allowing for the calculation of metric based quantities such as arc length and surface areas. Spatial location and extents in human anatomical coordinate systems can be recorded via a variety of Medical imaging modalities, generally termed multi-modal medical imagery, providing either scalar and or vector quantities at each spatial location. + + +( Image credit: [Quicksilver](https://arxiv.org/pdf/1703.10908.pdf) )" +medical,image outpainting,"Predicting the visual context of an image beyond its boundary. 
+ +Image credit: [NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis](https://paperswithcode.com/paper/nuwa-infinity-autoregressive-over?from=n35)" +medical,ecg wave delineation,"Delineation of the waveforms P, T and QRS complexes from ECG signals" +medical,sleep quality prediction,"( Image credit: [DeepSleep](https://github.com/GuanLab/DeepSleep) )" +medical,protein function prediction, +medical,muscle tendon junction identification, +medical,cardiac segmentation, +medical,medical image classification,Video capsule endoscopy image classification +medical,photoplethysmography ppg beat detection,Detecting heartbeats in the photoplethysmogram (PPG) signal +medical,optic cup segmentation,"Optic cup segmentation, concentric with optic disc, useful for glaucoma management (ophthalmology)" +medical,skin cancer classification, +medical,medical image registration,"Image registration, also known as image fusion or image matching, is the process of aligning two or more images based on image appearances. **Medical Image Registration** seeks to find an optimal spatial transformation that best aligns the underlying anatomical structures. Medical Image Registration is used in many clinical applications such as image guidance, motion tracking, segmentation, dose accumulation, image reconstruction and so on. Medical Image Registration is a broad topic which can be grouped from various perspectives. From input image point of view, registration methods can be divided into unimodal, multimodal, interpatient, intra-patient (e.g. same- or different-day) registration. From deformation model point of view, registration methods can be divided in to rigid, affine and deformable methods. From region of interest (ROI) perspective, registration methods can be grouped according to anatomical sites such as brain, lung registration and so on. From image pair dimension perspective, registration methods can be divided into 3D to 3D, 3D to 2D and 2D to 2D/3D. 
+ + +Source: [Deep Learning in Medical Image Registration: A Review ](https://arxiv.org/abs/1912.12318)" +medical,decision making under uncertainty, +medical,ecg qrs detection, +medical,eeg decoding,**EEG Decoding** - extracting useful information directly from EEG data. +medical,breast tumour classification, +medical,seizure detection,"**Seizure Detection** is a binary supervised classification problem with the aim of classifying between seizure and non-seizure states of a patient. + + +Source: [ResOT: Resource-Efficient Oblique Trees for Neural Signal Classification ](https://arxiv.org/abs/2006.07900)" +medical,knee osteoarthritis prediction, +medical,emg signal prediction, +medical,colon cancer detection in confocal laser, +medical,cervical nucleus detection, +medical,covid 19 detection,Covid-19 Diagnosis is the task of diagnosing the presence of COVID-19 in an individual with machine learning. +medical,classification of age related macular, +medical,lung nodule segmentation, +medical,metal artifact reduction,Metal artifact reduction aims to remove the artifacts introduced by metallic implants in CT images. +medical,finding pulmonary nodules in large scale ct, +medical,iris segmentation, +medical,cbct artifact reduction, +medical,breast tissue identification, +medical,epidemiology,"**Epidemiology** is a scientific discipline that provides reliable knowledge for clinical medicine focusing on prevention, diagnosis and treatment of diseases. Research in Epidemiology aims at characterizing risk factors for the outbreak of diseases and at evaluating the efficiency of certain treatment strategies, e.g., to compare a new treatment with an established gold standard. This research is strongly hypothesis-driven and statistical analysis is the major tool for epidemiologists so far. Correlations between genetic factors, environmental factors, life style-related parameters, age and diseases are analyzed. 
+ + +Source: [Visual Analytics of Image-Centric Cohort Studies in Epidemiology ](https://arxiv.org/abs/1501.04009)" +medical,kidney function,Continuous prediction of urine production in the next 2h as an average rate in ml/kg/h. The task is predicted at irregular intervals. +medical,atrial fibrillation, +medical,emg gesture recognition,Electromyographic Gesture Recognition +medical,skull stripping, +medical,bone suppression from dual energy chest x, +medical,icu mortality,Prediction of patient mortality in the Intensive Care Unit (ICU) given the first hours of the patient's Electronic Health Record (EHR). +medical,synthetic data generation,The generation of tabular data by any means possible. +medical,pain intensity regression, +medical,breast mass segmentation in whole mammograms, +medical,mortality prediction,"( Image credit: [Early hospital mortality prediction using vital signals](https://arxiv.org/pdf/1803.06589v2.pdf) )" +medical,medial knee jrf prediction, +medical,sleep stage detection, +medical,pulse wave simulation,Simulating arterial pulse waves +medical,remaining length of stay,Continuous prediction of the remaining ICU stay duration. +medical,joint vertebrae identification and, +medical,myocardial infarction detection, +medical,retinal vessel segmentation,"Retinal vessel segmentation is the task of segmenting vessels in retina imagery. + +( Image credit: [LadderNet](https://github.com/juntang-zhuang/LadderNet) )" +medical,blind docking, +medical,breast density classification, +medical,low dose x ray ct reconstruction, +medical,multi subject fmri data alignment, +medical,optic disc detection,Region proposal for optic disc +medical,heartbeat classification, +medical,sleep arousal detection,"A sleep arousal is a kind of EEG event that occurs during nocturnal sleep. Too many arousals contribute to health problems such as daytime sleepiness, memory loss and diabetes. Some research treats it as a kind of sleep deprivation."
+medical,multi tissue nucleus segmentation, +medical,skin lesion classification, +medical,sleep quality prediction 1,"( Image credit: [DeepSleep](https://github.com/GuanLab/DeepSleep) )" +medical,arrhythmia detection, +medical,genetic risk prediction,Polygenic Risk Scores (PRS) / Polygenic Scores (PGS) +medical,lifetime image denoising, +medical,nuclear segmentation, +medical,nuclei classification, +medical,heart rate estimation,RR interval detection and R peak detection from QRS complex +medical,sleep apnea detection, +medical,medical super resolution, +medical,synthesizing multi parameter magnetic, +medical,organ detection, +medical,cancer metastasis detection, +medical,drug discovery,"Drug discovery is the task of applying machine learning to discover new candidate drugs. + +( Image credit: [A Turing Test for Molecular Generators](https://pubs.acs.org/doi/10.1021/acs.jmedchem.0c01148) )" +medical,outcome prediction in multimodal mri, +medical,photoplethysmogram simulation,Simulating photoplethysmogram (PPG) signals +medical,deformable medical image registration, +medical,epilepsy prediction, +medical,ischemic stroke lesion segmentation, +medical,medical report generation,"Medical report generation (MRG) is a task that focuses on training AI to automatically generate professional reports from input image data. This can help clinicians make faster and more accurate decisions, since writing reports is both time-consuming and error-prone even for experienced doctors. + + + +Deep neural networks and transformer-based architectures are currently the most popular methods for this task; however, when pre-trained models are transferred into this domain, their performance degrades.
+ + + +The following are some of the reasons why RSG is hard for pre-trained models: + +* Language datasets in a particular domain can sometimes be quite different from the large number of datasets available on the Internet +* During the fine-tuning phase, datasets in the medical field are often unevenly distributed + + + +More recently, multi-modal learning and contrastive learning have shown some inspiring results in this field, but it's still challenging and requires further attention. + + + +Here are some additional readings to go deeper on the task: + +* On the Automatic Generation of Medical Imaging Reports + + [ https://doi.org/10.48550/arXiv.1711.08195](https://doi.org/10.48550/arXiv.1711.08195) + +* A scoping review of transfer learning research on medical image analysis using ImageNet + + [ https://arxiv.org/abs/2004.13175](https://arxiv.org/abs/2004.13175) + +* A Survey on Incorporating Domain Knowledge into Deep Learning for Medical Image Analysis + + [ https://arxiv.org/abs/2004.12150]( https://arxiv.org/abs/2004.12150) + + + + + +(Image credit : Transformers in Medical Imaging: A Survey)" +medical,sleep micro event detection, +medical,tumour classification, +medical,x ray, +medical,k complex detection, +medical,medical concept normalization, +medical,noise estimation, +medical,ultrasound, +medical,histopathological image classification, +medical,skin lesion identification, +medical,multiple sequence alignment, +medical,lung nodule detection, +medical,atrial fibrillation recurrence estimation, +medical,disease prediction, +medical,skin, +medical,patient phenotyping,"Classifying patients after 24h regarding their admission diagnosis, +using the APACHE group II and IV labels." +medical,brain decoding,"**Motor Brain Decoding** is fundamental task for building motor brain computer interfaces (BCI). + +Progress in predicting finger movements based on brain activity allows us to restore motor functions and improve rehabilitation process of patients." 
+medical,clinical concept extraction,"Automatic extraction of clinical named entities such as clinical problems, treatments, tests and anatomical parts from clinical notes. + +( [Source](https://arxiv.org/pdf/2012.04005v1.pdf) )" +medical,diabetes prediction, +medical,heart rate variability,Heart rate variability (HRV) is the physiological phenomenon of variation in the time interval between heartbeats. It is measured by the variation in the beat-to-beat interval. +medical,automated pulmonary nodule detection and, +medical,multi focus microscopical images fusion, +medical,muscular movement recognition, +medical,als detection, +medical,blood cell detection, +medical,tomographic reconstructions, +medical,computational phenotyping,"**Computational Phenotyping** is the process of transforming the noisy, massive Electronic Health Record (EHR) data into meaningful medical concepts that can be used to predict the risk of disease for an individual, or the response to drug therapy. + + +Source: [Privacy-Preserving Tensor Factorization for Collaborative Health Data Analysis ](https://arxiv.org/abs/1908.09888)" +medical,chemical reaction prediction, +medical,molecule captioning, +medical,colorectal gland segmentation, +medical,atrial fibrillation detection, +medical,anxiety detection,Detect anxiety distress of human beings / animals +medical,lung cancer diagnosis, +medical,qrs complex detection, +medical,shadow confidence maps in ultrasound imaging, +medical,medical x ray image segmentation, +medical,molecular dynamics, +medical,optic cup detection,Region proposal for optic cup +medical,pulmonary embolism detection, +medical,photoplethysmography ppg heart rate,Estimating heart rate from the photoplethysmogram (PPG) signal +medical,multimodal sleep stage detection,"Using multiple modalities such as EEG+EOG, EEG+HR instead of just relying on EEG (polysomnography)" +medical,registration of sparse clinical images, +methodology,distributed optimization,"The goal of **Distributed 
Optimization** is to optimize a certain objective defined over millions or billions of data points that are distributed over many machines by utilizing the computational power of these machines. + + +Source: [Analysis of Distributed Stochastic Dual Coordinate Ascent ](https://arxiv.org/abs/1312.1031)" +methodology,multi label classification,"**Multi-Label Classification** is the supervised learning problem where an instance may be associated with multiple labels. This is an extension of single-label classification (i.e., multi-class, or binary) where each instance is only associated with a single class label. + + +Source: [Deep Learning for Multi-label Classification ](https://arxiv.org/abs/1502.05988)" +methodology,interpretable machine learning,"The goal of **Interpretable Machine Learning** is to allow oversight and understanding of machine-learned decisions. Much of the work in Interpretable Machine Learning has come in the form of devising methods to better explain the predictions of machine learning models.
+ + +Source: [Assessing the Local Interpretability of Machine Learning Models ](https://arxiv.org/abs/1902.03501)" +methodology,online nonnegative cp decomposition, +methodology,learning representation on graph, +methodology,active learning,"**Active Learning** is a paradigm in supervised machine learning which uses fewer training examples to achieve better optimization by iteratively training a predictor, and using the predictor in each iteration to choose the training examples which will increase its chances of finding better configurations and at the same time improving the accuracy of the prediction model. + + +Source: [Polystore++: Accelerated Polystore System for Heterogeneous Workloads ](https://arxiv.org/abs/1905.10336)" +methodology,detection of higher order dependencies, +methodology,sentence embeddings, +methodology,partial domain adaptation,"**Partial Domain Adaptation** is a transfer learning paradigm, which manages to transfer relevant knowledge from a large-scale source domain to a small-scale target domain. + + +Source: [Deep Residual Correction Network for Partial Domain Adaptation ](https://arxiv.org/abs/2004.04914)" +methodology,depth rgb anomaly detection,Depth + RGB Anomaly Detection +methodology,rgb anomaly segmentation,3D + RGB Anomaly Segmentation +methodology,chatbot,"**Chatbot** or conversational AI is a language model designed and implemented to have conversations with humans. + + +Source: [Open Data Chatbot ](https://arxiv.org/abs/1909.03653) + +[Image source](https://arxiv.org/pdf/2006.16779v3.pdf)" +methodology,mutual information estimation,"To estimate mutual information from samples, especially for high-dimensional variables." +methodology,meta learning,"Meta-learning is a methodology concerned with ""learning to learn"" machine learning algorithms.
+ +( Image credit: [Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks](https://arxiv.org/pdf/1703.03400v3.pdf) )" +methodology,sentence embedding, +methodology,normalising flows, +methodology,probabilistic programming,"Probabilistic programming languages are designed to describe probabilistic models and then perform inference in those models. PPLs are closely related to graphical models and Bayesian networks, but are more expressive and flexible. + +( Image credit: [Michael Betancourt](https://betanalpha.github.io/writing/) )" +methodology,automl,"Automated Machine Learning (**AutoML**) is a general concept which covers diverse techniques for automated model learning including automatic data preprocessing, architecture search, and model selection. +Source: Evaluating recommender systems for AI-driven data science (1905.09205) + + +Source: [CHOPT : Automated Hyperparameter Optimization Framework for Cloud-Based Machine Learning Platforms ](https://arxiv.org/abs/1810.03527)" +methodology,stroke classification, +methodology,domain adaptation,"Domain adaptation is the task of adapting models across domains. This is motivated by the challenge where the test and training datasets fall from different data distributions due to some factor. Domain adaptation aims to build machine learning models that can be generalized into a target domain and dealing with the discrepancy across domain distributions. + +Further readings: + +- [A Brief Review of Domain Adaptation](https://paperswithcode.com/paper/a-brief-review-of-domain-adaptation) + +( Image credit: [Unsupervised Image-to-Image Translation Networks](https://arxiv.org/pdf/1703.00848v6.pdf) )" +methodology,privacy preserving deep learning,"The goal of privacy-preserving (deep) learning is to train a model while preserving privacy of the training dataset. Typically, it is understood that the trained model should be privacy-preserving (e.g., due to the training algorithm being differentially private)." 
+methodology,machine learning, +methodology,anomaly detection,"Humans are able to detect heterogeneous or unexpected patterns in a set of homogeneous natural images. This task is known as anomaly or novelty detection and has a large number of applications. Anomaly detection automation would enable constant quality control by avoiding reduced attention span and facilitating human operator work. + +Anomaly detection is a binary classification between the normal and the anomalous classes. However, it is not possible to train a model with full supervision for this task because we frequently lack anomalous examples, and, what is more, anomalies can have unexpected patterns. + +[Image source]: [GAN-based Anomaly Detection in Imbalance Problems](https://paperswithcode.com/paper/gan-based-anomaly-detection-in-imbalance)" +methodology,data free quantization,"**Data Free Quantization** is a technique to achieve a highly accurate quantized model without accessing any training data. + +Source: [Qimera: Data-free Quantization with Synthetic Boundary Supporting Samples](https://arxiv.org/abs/2111.02625)" +methodology,representation learning,"Representation learning is concerned with training machine learning algorithms to learn useful representations, e.g. those that are interpretable, have latent features, or can be used for transfer learning. + +Deep neural networks can be considered representation learning models that typically encode information which is projected into a different subspace. These representations are then usually passed on to a linear classifier to, for instance, train a classifier. + +Representation learning can be divided into: + +- **Supervised representation learning**: learning representations on task A using annotated data and used to solve task B +- **Unsupervised representation learning**: learning representations on a task in an unsupervised way (label-free data). 
These are then used to address downstream tasks, reducing the need for annotated data when learning new tasks. Powerful models like [GPT](/method/gpt) and [BERT](/method/bert) leverage unsupervised representation learning to tackle language tasks. + +More recently, [self-supervised learning (SSL)](/task/self-supervised-learning) has become one of the main drivers behind unsupervised representation learning in fields like computer vision and NLP. + +Here are some additional readings to go deeper on the task: + +- [Representation Learning: A Review and New Perspectives](/paper/representation-learning-a-review-and-new) - Bengio et al. (2012) +- [A Few Words on Representation Learning](https://sthalles.github.io/a-few-words-on-representation-learning/) - Thalles Silva + +( Image credit: [Visualizing and Understanding Convolutional Networks](https://arxiv.org/pdf/1311.2901.pdf) )" +methodology,unsupervised domain adaptation,"**Unsupervised Domain Adaptation** is a learning framework to transfer knowledge learned from source domains with a large number of annotated training examples to target domains with unlabeled data only. + + +Source: [Domain-Specific Batch Normalization for Unsupervised Domain Adaptation ](https://arxiv.org/abs/1906.03950)" +methodology,hypothesis testing,"In statistical hypothesis testing, a two-sample test is a test performed on the data of two random samples, each independently obtained from a different given population. The purpose of the test is to determine whether the difference between these two populations is statistically significant. The statistics used in two-sample tests can be used to solve many machine learning problems, such as domain adaptation, covariate shift and generative adversarial networks."
+methodology,quantum circuit mapping,Mapping quantum circuits to quantum devices +methodology,rgb depth anomaly detection and segmentation,RGB+Depth Anomaly Detection and Segmentation +methodology,depthanomaly detection,Depth-only Anomaly Detection +methodology,metaheuristic optimization,"In computer science and mathematical optimization, a metaheuristic is a higher-level procedure or heuristic designed to find, generate, or select a heuristic (partial search algorithm) that may provide a sufficiently good solution to an optimization problem. For some examples, you can visit https://aliasgharheidari.com/Publications.html" +methodology,learning representation of multi view data, +methodology,architecture search,"**Neural architecture search (NAS)** is a technique for automating the design of artificial neural networks (ANN), a widely used model in the field of machine learning. NAS essentially takes the process of a human manually tweaking a neural network and learning what works well, and automates this task to discover more complex architectures. + +Image Credit : [NAS with Reinforcement Learning](https://arxiv.org/abs/1611.01578)" +methodology,few shot imitation learning, +methodology,group anomaly detection, +methodology,outlier detection,"**Outlier Detection** is a task of identifying a subset of a given data set which are considered anomalous in that they are unusual from other instances. It is one of the core data mining tasks and is central to many applications. In the security field, it can be used to identify potentially threatening users, in the manufacturing field it can be used to identify parts that are likely to fail. + + +Source: [Coverage-based Outlier Explanation ](https://arxiv.org/abs/1911.02617)" +methodology,explanation fidelity evaluation,Evaluation of explanation fidelity with respect to the underlying model. 
+methodology,multimodal text and image classification,Classification with both source Image and Text +methodology,hyperparameter optimization,"**Hyperparameter Optimization** is the problem of choosing a set of optimal hyperparameters for a learning algorithm. Whether the algorithm is suitable for the data directly depends on hyperparameters, which directly influence overfitting or underfitting. Each model requires different assumptions, weights or training speeds for different types of data under the conditions of a given loss function. + + +Source: [Data-driven model for fracturing design optimization: focus on building digital database and production forecast ](https://arxiv.org/abs/1910.14499)" +methodology,learning semantic representations, +methodology,arbitrary conditional density estimation, +methodology,generalization bounds, +methodology,few shot relation classification,"**Few-Shot Relation Classification** is a particular relation classification task under minimum annotated data, where a model is required to classify a new incoming query instance given only few support instances (e.g., 1 or 5) during testing. + + +Source: [MICK: A Meta-Learning Framework for Few-shot Relation Classification with Little Training Data ](https://arxiv.org/abs/2004.14164)" +methodology,anomaly detection in surveillance videos, +methodology,extreme multi label classification,Extreme Multi-Label Classification is a supervised learning problem where an instance may be associated with multiple labels. The two main problems are the unbalanced labels in the dataset and the amount of different labels. +methodology,data visualization, +methodology,sparse learning, +methodology,personalized federated learning,"The federated learning setup presents numerous challenges including data heterogeneity (differences in data distribution), device heterogeneity (in terms of computation capabilities, network connection, etc.), and communication efficiency. 
+Especially data heterogeneity makes it hard to learn a single shared global model that applies to all clients. To overcome these issues, Personalized Federated Learning (PFL) aims to personalize the global model for each client in the federation." +methodology,unsupervised anomaly detection in sound, +methodology,density estimation,"The goal of **Density Estimation** is to give an accurate description of the underlying probabilistic density distribution of an observable data set with unknown density. + + +Source: [Contrastive Predictive Coding Based Feature for Automatic Speaker Verification ](https://arxiv.org/abs/1904.01575)" +methodology,low rank matrix completion,"**Low-Rank Matrix Completion** is an important problem with several applications in areas such as recommendation systems, sketching, and quantum tomography. The goal in matrix completion is to recover a low rank matrix, given a small number of entries of the matrix. + + +Source: [Universal Matrix Completion ](https://arxiv.org/abs/1402.2324)" +methodology,learning network representations, +methodology,ensemble learning, +methodology,knowledge graph embeddings, +methodology,computed tomography ct,"The term “computed tomography”, or CT, refers to a computerized x-ray imaging procedure in which a narrow beam of x-rays is aimed at a patient and quickly rotated around the body, producing signals that are processed by the machine's computer to generate cross-sectional images—or “slices”—of the body. + +( Image credit: [Liver Lesion Detection from Weakly-labeled Multi-phase CT Volumes with a Grouped Single Shot MultiBox Detector](https://github.com/L0SG/grouped-ssd-pytorch) )" +methodology,word embeddings,"Word embedding is the collective name for a set of language modeling and feature learning techniques in natural language processing (NLP) where words or phrases from the vocabulary are mapped to vectors of real numbers. 
+ +Techniques for learning word embeddings can include Word2Vec, GloVe, and other neural network-based approaches that train on an NLP task such as language modeling or document classification. + +( Image credit: [Dynamic Word Embedding for Evolving Semantic Discovery](https://arxiv.org/pdf/1703.00607v2.pdf) )" +methodology,inductive logic programming, +methodology,matrix completion,"**Matrix Completion** is a method for recovering lost information. It originates from machine learning and usually deals with highly sparse matrices. Missing or unknown data is estimated using the low-rank matrix of the known data. + + +Source: [A Fast Matrix-Completion-Based Approach for Recommendation Systems ](https://arxiv.org/abs/1912.00600)" +methodology,subdomain adaptation, +methodology,knowledge distillation,"Knowledge distillation is the process of transferring knowledge from a large model to a smaller one. While large models (such as very deep neural networks or ensembles of many models) have higher knowledge capacity than small models, this capacity might not be fully utilized." +methodology,few shot image classification,"Few-shot image classification is the task of doing image classification with only a few examples for each category (typically < 6 examples). + +( Image credit: [Learning Embedding Adaptation for Few-Shot Learning](https://github.com/Sha-Lab/FEAT) )" +methodology,zero shot learning,"**Zero-shot learning (ZSL)** is a model's ability to detect classes never seen during training. The condition is that the classes are not known during supervised learning. + +Earlier work in zero-shot learning use attributes in a two-step approach to infer unknown classes. In the computer vision context, more recent advances learn mappings from image feature space to semantic space. Other approaches learn non-linear multimodal embeddings. In the modern NLP context, language models can be evaluated on downstream tasks without fine tuning. 
+ +Benchmark datasets for zero-shot learning include [aPY](/dataset/apy), [AwA](/dataset/awa2-1), and [CUB](/dataset/cub-200-2011), among others. + +( Image credit: [Prototypical Networks for Few shot Learning in PyTorch +](https://github.com/orobix/Prototypical-Networks-for-Few-shot-Learning-PyTorch) ) + +Further readings: + +- [Zero-Shot Learning -- A Comprehensive Evaluation of the Good, the Bad and the Ugly](https://paperswithcode.com/paper/zero-shot-learning-a-comprehensive-evaluation) +- [Zero-Shot Learning in Modern NLP](https://joeddav.github.io/blog/2020/05/29/ZSL.html) +- [Zero-Shot Learning for Text Classification](https://amitness.com/2020/05/zero-shot-text-classification/)" +methodology,quantization,"**Quantization** is a promising technique to reduce the computation cost of neural network training, which can replace high-cost floating-point numbers (e.g., float32) with low-cost fixed-point numbers (e.g., int8/int16). + + +Source: [Adaptive Precision Training: Quantify Back Propagation in Neural Networks with Fixed-point Numbers ](https://arxiv.org/abs/1911.00361)" +methodology,thompson sampling,"Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief." +methodology,model extraction, +methodology,incremental learning,Incremental learning aims to develop artificially intelligent systems that can continuously learn to address new tasks from new data while preserving knowledge learned from previously learned tasks. +methodology,multi goal reinforcement learning, +methodology,bayesian optimisation,"Expensive black-box functions are a common problem in many disciplines, including tuning the parameters of machine learning algorithms, robotics, and other engineering design problems. 
**Bayesian Optimisation** is a principled and efficient technique for the global optimisation of these functions. The idea behind Bayesian Optimisation is to place a prior distribution over the target function and then update that prior with a set of “true” observations of the target function by expensively evaluating it in order to produce a posterior predictive distribution. The posterior then informs where to make the next observation of the target function through the use of an acquisition function, which balances the exploitation of regions known to have good performance with the exploration of regions where there is little information about the function’s response. + + +Source: [A Bayesian Approach for the Robust Optimisation of Expensive-to-Evaluate Functions ](https://arxiv.org/abs/1904.11416)" +methodology,statistical independence testing, +methodology,rgb anomaly detection,3D + RGB Anomaly Detection +methodology,multi agent reinforcement learning,"The target of **Multi-agent Reinforcement Learning** is to solve complex problems by integrating multiple agents that focus on different sub-tasks. In general, there are two types of multi-agent systems: independent and cooperative systems. + + +Source: [Show, Describe and Conclude: On Exploiting the Structure Information of Chest X-Ray Reports ](https://arxiv.org/abs/2004.12274)" +methodology,transfer learning,"Transfer learning is a methodology where weights from a model trained on one task are taken and either used (a) to construct a fixed feature extractor, (b) as weight initialization and/or fine-tuning. + +( Image credit: [Subodh Malgonde](https://medium.com/@subodh.malgonde/transfer-learning-using-tensorflow-52a4f6bcde3e) )" +methodology,gaussian processes,"**Gaussian Processes** is a powerful framework for several machine learning tasks such as regression, classification and inference. 
Given a finite set of input output training data that is generated out of a fixed (but possibly unknown) function, the framework models the unknown function as a stochastic process such that the training outputs are a finite number of jointly Gaussian random variables, whose properties can then be used to infer the statistics (the mean and variance) of the function at test values of input. + + +Source: [Sequential Randomized Matrix Factorization for Gaussian Processes: Efficient Predictions and Hyper-parameter Optimization ](https://arxiv.org/abs/1711.06989)" +methodology,transfer reinforcement learning, +methodology,knowledge graph embedding, +methodology,clustering algorithms evaluation, +methodology,outlier ensembles, +methodology,document embedding, +methodology,model selection,"Given a set of candidate models, the goal of **Model Selection** is to select the model that best approximates the observed data and captures its underlying regularities. Model Selection criteria are defined such that they strike a balance between the goodness of fit, and the generalizability or complexity of the models. + + +Source: [Kernel-based Information Criterion ](https://arxiv.org/abs/1408.5810)" +methodology,rgb anomaly detection and segmentation,RGB+3D Anomaly Detection and Segmentation +methodology,efficient exploration,"**Efficient Exploration** is one of the main obstacles in scaling up modern deep reinforcement learning algorithms. The main challenge in Efficient Exploration is the balance between exploiting current estimates, and gaining information about poorly understood states and actions. + + +Source: [Randomized Value Functions via Multiplicative Normalizing Flows ](https://arxiv.org/abs/1806.02315)" +methodology,image outpainting,"Predicting the visual context of an image beyond its boundary. 
+ +Image credit: [NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis](https://paperswithcode.com/paper/nuwa-infinity-autoregressive-over?from=n35)" +methodology,one shot learning,"One-shot learning is the task of learning information about object categories from a single training example. + +( Image credit: [Siamese Neural Networks for One-shot Image Recognition](https://www.cs.cmu.edu/~rsalakhu/papers/oneshot1.pdf) )" +methodology,imitation learning,"**Imitation Learning** is a framework for learning a behavior policy from demonstrations. Usually, demonstrations are presented in the form of state-action trajectories, with each pair indicating the action to take at the state being visited. In order to learn the behavior policy, the demonstrated actions are usually utilized in two ways. The first, known as Behavior Cloning (BC), treats the action as the target label for each state, and then learns a generalized mapping from states to actions in a supervised manner. Another way, known as Inverse Reinforcement Learning (IRL), views the demonstrated actions as a sequence of decisions, and aims at finding a reward/cost function under which the demonstrated decisions are optimal. + +Finally, a newer methodology, Inverse Q-Learning, aims at directly learning Q-functions from expert data, implicitly representing rewards, under which the optimal policy can be given as a Boltzmann distribution similar to soft Q-learning. + +Source: [Learning to Imitate ](https://ai.stanford.edu/blog/learning-to-imitate)" +methodology,influence approximation,Estimating the influence of training triples on the behavior of a machine learning model. +methodology,anomaly detection,3D-only Anomaly Detection +methodology,distributional reinforcement learning,"The value distribution is the distribution of the random return received by a reinforcement learning agent. It has been used for specific purposes such as implementing risk-aware behaviour.
+ +We have random return Z whose expectation is the value Q. This random return is also described by a recursive equation, but one of a distributional nature" +methodology,q learning,"The goal of Q-learning is to learn a policy, which tells an agent what action to take under what circumstances. + +( Image credit: [Playing Atari with Deep Reinforcement Learning](https://arxiv.org/pdf/1312.5602v1.pdf) )" +methodology,multiobjective optimization,"Multi-objective optimization (also known as multi-objective programming, vector optimization, multicriteria optimization, multiattribute optimization or Pareto optimization) is an area of multiple criteria decision making that is concerned with mathematical optimization problems involving more than one objective function to be optimized simultaneously. Multi-objective optimization has been applied in many fields of science, including engineering, economics and logistics where optimal decisions need to be taken in the presence of trade-offs between two or more conflicting objectives. Minimizing cost while maximizing comfort while buying a car, and maximizing performance whilst minimizing fuel consumption and emission of pollutants of a vehicle are examples of multi-objective optimization problems involving two and three objectives, respectively. In practical problems, there can be more than three objectives." +methodology,automatic machine learning model selection, +methodology,bilevel optimization,"**Bilevel Optimization** is a branch of optimization, which contains a nested optimization problem within the constraints of the outer optimization problem. The outer optimization task is usually referred as the upper level task, and the nested inner optimization task is referred as the lower level task. The lower level problem appears as a constraint, such that only an optimal solution to the lower level optimization problem is a possible feasible candidate to the upper level optimization problem. 
+ + +Source: [Efficient Evolutionary Algorithm for Single-Objective Bilevel Optimization ](https://arxiv.org/abs/1303.3901)" +methodology,few shot camera adaptive color constancy, +methodology,structured prediction,"**Structured Prediction** is an area of machine learning focusing on representations of spaces with combinatorial structure, and algorithms for inference and parameter estimation over these structures. Core methods include both tractable exact approaches like dynamic programming and spanning tree algorithms as well as heuristic techniques such as linear programming relaxations and greedy search. + + +Source: [Torch-Struct: Deep Structured Prediction Library ](https://arxiv.org/abs/2002.00876)" +methodology,stochastic optimization,"**Stochastic Optimization** is the task of optimizing certain objective functional by generating and using stochastic random variables. Usually the Stochastic Optimization is an iterative process of generating random variables that progressively finds out the minima or the maxima of the objective functional. Stochastic Optimization is usually applied in the non-convex functional spaces where the usual deterministic optimization such as linear or quadratic programming or their variants cannot be used. + + +Source: [ASOC: An Adaptive Parameter-free Stochastic Optimization Techinique for Continuous Variables ](https://arxiv.org/abs/1506.08004)" +methodology,multi label text classification,"According to Wikipedia ""In machine learning, multi-label classification and the strongly related problem of multi-output classification are variants of the classification problem where multiple labels may be assigned to each instance. 
Multi-label classification is a generalization of multiclass classification, which is the single-label problem of categorizing instances into precisely one of more than two classes; in the multi-label problem there is no constraint on how many of the classes the instance can be assigned to.""" +methodology,unsupervised domain expansion, +methodology,learning word embeddings, +methodology,depth anomaly detection and segmentation,Depth-only Anomaly Detection and Segmentation +methodology,disentanglement,"This is an approach to solve a diverse set of tasks in a data-efficient manner by disentangling (or isolating) the underlying structure of the main problem into disjoint parts of its representations. This disentanglement can be done by focusing on the ""transformation"" properties of the world (main problem)." +methodology,novel class discovery,"The goal of Novel Class Discovery (NCD) is to identify new classes in unlabeled data, by exploiting prior knowledge from known classes. In this specific setup, the data is split in two sets. The first is a labeled set containing known classes and the second is an unlabeled set containing unknown classes that must be discovered." +methodology,core set discovery,A core set in machine learning is defined as the minimal set of training samples that allows a supervised algorithm to deliver a result as good as the one obtained when the whole set is used. +methodology,few shot learning,"**Few-Shot Learning** is an example of meta-learning, where a learner is trained on several related tasks, during the meta-training phase, so that it can generalize well to unseen (but related) tasks with just a few examples, during the meta-testing phase. An effective approach to the Few-Shot Learning problem is to learn a common representation for various tasks and train task specific classifiers on top of this representation.
+ + +Source: [Penalty Method for Inversion-Free Deep Bilevel Optimization ](https://arxiv.org/abs/1911.03432)" +methodology,information plane,"To obtain the Information Plane (IP) of deep neural networks, which shows the trajectories of the hidden layers during training in a 2D plane using as coordinate axes the mutual information between the input and the hidden layer, and the mutual information between the output and the hidden layer." +methodology,generalized zero shot learning, +methodology,auxiliary learning,"Auxiliary learning aims to find or design auxiliary tasks which can improve the performance on one or some primary tasks. + +( Image credit: [Self-Supervised Generalisation with Meta Auxiliary Learning](https://arxiv.org/pdf/1901.08933v3.pdf) )" +methodology,dimensionality reduction,"Dimensionality reduction is the task of reducing the dimensionality of a dataset. + +( Image credit: [openTSNE](https://github.com/pavlin-policar/openTSNE) )" +methodology,abnormal event detection in video,"**Abnormal Event Detection In Video** is a challenging task in computer vision, as the definition of what an abnormal event looks like depends very much on the context. For instance, a car driving by on the street is regarded as a normal event, but if the car enters a pedestrian area, this is regarded as an abnormal event. A person running on a sports court (normal event) versus running outside from a bank (abnormal event) is another example. Although what is considered abnormal depends on the context, we can generally agree that abnormal events should be unexpected events that occur less often than familiar (normal) events + + +Source: [Unmasking the abnormal events in video ](https://arxiv.org/abs/1705.08182) + +Image: [Ravanbakhsh et al](https://arxiv.org/pdf/1708.09644v1.pdf)" +methodology,graph embedding,"Graph embeddings learn a mapping from a network to a vector space, while preserving relevant network properties. 
+ +( Image credit: [GAT](https://github.com/PetarV-/GAT) )" +methodology,multiple instance learning,"**Multiple Instance Learning** is a type of weakly supervised learning algorithm where training data is arranged in bags, where each bag contains a set of instances $X=\\{x_1,x_2, \ldots,x_M\\}$, and there is one single label $Y$ per bag, $Y\in\\{0, 1\\}$ in the case of a binary classification problem. It is assumed that individual labels $y_1, y_2,\ldots, y_M$ exist for the instances within a bag, but they are unknown during training. In the standard Multiple Instance assumption, a bag is considered negative if all its instances are negative. On the other hand, a bag is positive, if at least one instance in the bag is positive. + + +Source: [Monte-Carlo Sampling applied to Multiple Instance Learning for Histological Image Classification ](https://arxiv.org/abs/1812.11560)" +methodology,continual pretraining, +methodology,continual learning,"**Continual Learning** (also known as **Incremental Learning**, **Life-long Learning**) is a concept to learn a model for a large number of tasks sequentially without forgetting knowledge obtained from the preceding tasks, where the data in the old tasks are not available anymore during training new ones. +If not mentioned, the benchmarks here are **Task-CL**, where task-id is provided on validation. 
+ +Source: +[Continual Learning by Asymmetric Loss Approximation with Single-Side Overestimation](https://arxiv.org/abs/1908.02984) +[Three scenarios for continual learning](https://arxiv.org/abs/1904.07734) +[Lifelong Machine Learning](https://books.google.ca/books/about/Lifelong_Machine_Learning.html?id=JQ5pDwAAQBAJ&redir_esc=y) +[Continual lifelong learning with neural networks: A review](https://www.sciencedirect.com/science/article/pii/S0893608019300231)" +methodology,policy gradient methods, +methodology,combinatorial optimization,"**Combinatorial Optimization** is a category of problems which requires optimizing a function over a combination of discrete objects and the solutions are constrained. Examples include finding shortest paths in a graph, maximizing value in the Knapsack problem and finding boolean settings that satisfy a set of constraints. Many of these problems are NP-Hard, which means that no polynomial time solution can be developed for them. Instead, we can only produce approximations in polynomial time that are guaranteed to be some factor worse than the true optimal solution. + + +Source: [Recent Advances in Neural Program Synthesis ](https://arxiv.org/abs/1802.02353)" +methodology,domain generalization,"The idea of **Domain Generalization** is to learn from one or multiple training domains, to extract a domain-agnostic model which can be applied to an unseen domain + + +Source: [Diagram Image Retrieval using Sketch-Based Deep Learning and Transfer Learning ](https://arxiv.org/abs/2004.10780)" +methodology,experimental design, +methodology,multi task language understanding,"The test covers 57 tasks including elementary mathematics, US history, computer science, law, and more. https://arxiv.org/pdf/2009.03300.pdf" +methodology,long tail learning,"Long-tailed learning, one of the most challenging problems in visual recognition, aims to train well-performing models from a large number of images that follow a long-tailed class distribution." 
+methodology,network embedding,"**Network Embedding** is a collective term for techniques for mapping graph nodes to vectors of real numbers in a multidimensional space. To be useful, a good embedding should preserve the structure of the graph. The vectors can then be used as input to various network and graph analysis tasks, such as link prediction + + +Source: [Tutorial on NLP-Inspired Network Embedding ](https://arxiv.org/abs/1910.07212)" +methodology,graph representation learning,"The goal of **Graph Representation Learning** is to construct a set of features (‘embeddings’) representing the structure of the graph and the data thereon. We can distinguish among Node-wise embeddings, representing each node of the graph, Edge-wise embeddings, representing each edge in the graph, and Graph-wise embeddings representing the graph as a whole. + + +Source: [SIGN: Scalable Inception Graph Neural Networks ](https://arxiv.org/abs/2004.11198)" +methodology,model compression,"**Model Compression** is an actively pursued area of research over the last few years with the goal of deploying state-of-the-art deep networks in low-power and resource limited devices without significant drop in accuracy. Parameter pruning, low-rank factorization and weight quantization are some of the proposed methods to compress the size of deep networks. + + +Source: [KD-MRI: A knowledge distillation framework for image reconstruction and image restoration in MRI workflow ](https://arxiv.org/abs/2004.05319)" +methodology,long tail learning with class descriptors,"Long-tail learning by using class descriptors (like attributes, class embedding, etc) to learn tail classes as well as head classes." +methodology,multilingual word embeddings, +methodology,bayesian inference,Bayesian Inference is a methodology that employs Bayes Rule to estimate parameters (and their full posterior). 
+methodology,web credibility,Define the level of credibility of web sources +methodology,federated learning,"Federated Learning is a framework to train a centralized model for a task where the data is de-centralized across different devices/ silos. + +This helps preserve privacy of data on various devices as only the weight updates are shared with the centralized model so the data can remain on each device and we can still train a model using that data." +methodology,hard attention, +methodology,density ratio estimation,Estimating the ratio of one density function to the other. +methodology,metric learning,"The goal of **Metric Learning** is to learn a representation function that maps objects into an embedded space. The distance in the embedded space should preserve the objects’ similarity — similar objects get close and dissimilar objects get far away. Various loss functions have been developed for Metric Learning. For example, the **contrastive loss** guides the objects from the same class to be mapped to the same point and those from different classes to be mapped to different points whose distances are larger than a margin. **Triplet loss** is also popular, which requires the distance between the anchor sample and the positive sample to be smaller than the distance between the anchor sample and the negative sample. + + +Source: [Road Network Metric Learning for Estimated Time of Arrival ](https://arxiv.org/abs/2006.13477)" +methodology,data augmentation,"Data augmentation involves techniques used for increasing the amount of data, based on different modifications, to expand the amount of examples in the original dataset. Data augmentation not only helps to grow the dataset but it also increases the diversity of the dataset. When training machine learning models, data augmentation acts as a regularizer and helps to avoid overfitting. + +Data augmentation techniques have been found useful in domains like NLP and computer vision. 
In computer vision, transformations like cropping, flipping, and rotation are used. In NLP, data augmentation techniques can include swapping, deletion, random insertion, among others. + +Further readings: + +- [A Survey of Data Augmentation Approaches for NLP](https://paperswithcode.com/paper/a-survey-of-data-augmentation-approaches-for) +- [A survey on Image Data Augmentation for Deep Learning](https://journalofbigdata.springeropen.com/articles/10.1186/s40537-019-0197-0) + +( Image credit: [Albumentations](https://github.com/albumentations-team/albumentations) )" +methodology,entity embeddings,Entity Embeddings is a technique for applying deep learning to tabular data. It involves representing the categorical data of an information systems entity with multiple dimensions. +methodology,automated feature engineering,Automated feature engineering improves upon the traditional approach to feature engineering by automatically extracting useful and meaningful features from a set of related data tables with a framework that can be applied to any problem. +methodology,nonparametric deep clustering,Deep nonparametric clustering methods apply deep clustering when the number of clusters is not known a priori and needs to be inferred. +methodology,unsupervised anomaly detection,"The objective of **Unsupervised Anomaly Detection** is to detect previously unseen rare objects or events without any prior knowledge about these. The only information available is that the percentage of anomalies in the dataset is small, usually less than 1%. Since anomalies are rare and unknown to the user at training time, anomaly detection in most cases boils down to the problem of modelling the normal data distribution and defining a measurement in this space in order to classify samples as anomalous or normal. 
In high-dimensional data such as images, distances in the original space quickly lose descriptive power (curse of dimensionality) and a mapping to some more suitable space is required. + + +Source: [Unsupervised Learning of Anomaly Detection from Contaminated Image Data using Simultaneous Encoder Training ](https://arxiv.org/abs/1905.11034)" +methodology,partial label learning, +methodology,point processes, +methodology,feature importance, +methodology,tensor networks, +methodology,additive models, +methodology,eeg,"Electroencephalogram (EEG) is a method of recording brain activity using electrophysiological indexes. When the brain is active, a large number of postsynaptic potentials generated synchronously by neurons are formed after summation. It records the changes of electric waves during brain activity and is the overall reflection of the electrophysiological activities of brain nerve cells on the surface of cerebral cortex or scalp. Brain waves originate from the postsynaptic potential of the apical dendrites of pyramidal cells. The formation of synchronous rhythm of EEG is also related to the activity of nonspecific projection system of cortex and thalamus. EEG is the basic theoretical research of brain science. EEG monitoring is widely used in its clinical application." +methodology,eeg denoising, +methodology,generalized few shot learning, +methodology,detection of dependencies, +methodology,classification,Algorithms trying to solve the general task of classification. +methodology,network pruning,"**Network Pruning** is a popular approach to reduce a heavy network to obtain a light-weight form by removing redundancy in the heavy network. In this approach, a complex over-parameterized network is first trained, then pruned based on some criteria, and finally fine-tuned to achieve comparable performance with reduced parameters. 
+ + +Source: [Ensemble Knowledge Distillation for Learning Improved and Efficient Networks ](https://arxiv.org/abs/1909.08097)" +methodology,continuously indexed domain adaptation,"Continuously indexed domain adaptation adapts across continuously indexed domains, e.g., across patients of different ages, where 'age' is a continuous notion." +methodology,multi label learning, +methodology,electrocardiography ecg, +methodology,multi task learning,"Multi-task learning aims to learn multiple different tasks simultaneously while maximizing +performance on one or all of the tasks. + +( Image credit: [Cross-stitch Networks for Multi-task Learning](https://arxiv.org/pdf/1604.03539v1.pdf) )" +methodology,variable selection, +methodology,hierarchical reinforcement learning, +methodology,neural network compression, +methodology,unsupervised pre training,Pre-training a neural network using unsupervised (self-supervised) auxiliary tasks on unlabeled data. +methodology,sentence embeddings for biomedical texts, +methodology,quantum circuit equivalence checking,Equivalence Checking of Quantum Circuits +methodology,one class classifier, +methodology,anomaly detection and segmentation,3D-Only Anomaly Detection and Segmentation +methodology,depth rgb anomaly segmentation,Depth + RGB Anomaly Segmentation +methodology,depth anomaly segmentation,Anomaly Segmentation using depth information only +methodology,federated unsupervised learning,Federated unsupervised learning trains models from decentralized data that have no labels. +methodology,l2 regularization, +methodology,feature engineering,"Feature engineering is the process of taking a dataset and constructing explanatory variables — features — that can be used to train a machine learning model for a prediction problem. Often, data is spread across multiple tables and must be gathered into a single table with rows containing the observations and features in the columns. 
+ +The traditional approach to feature engineering is to build features one at a time using domain knowledge, a tedious, time-consuming, and error-prone process known as manual feature engineering. The code for manual feature engineering is problem-dependent and must be re-written for each new dataset." +methodology,unsupervised mnist,Accuracy on MNIST when training without any labels +methodology,similarity explanation, +methodology,ticket search, +methodology,dictionary learning,"**Dictionary Learning** is an important problem in multiple areas, ranging from computational neuroscience, machine learning, to computer vision and image processing. The general goal is to find a good basis for given data. More formally, in the Dictionary Learning problem, also known as sparse coding, we are given samples of a random vector $y\in\mathbb{R}^n$, of the form $y=Ax$ where $A$ is some unknown matrix in $\mathbb{R}^{n×m}$, called dictionary, and $x$ is sampled from an unknown distribution over sparse vectors. The goal is to approximately recover the dictionary $A$. + + +Source: [Polynomial-time tensor decompositions with sum-of-squares ](https://arxiv.org/abs/1610.01980)" +methodology,anomaly segmentation,Anomaly Segmentation using 3D information only +miscellaneous,computer security, +miscellaneous,air pollution prediction, +miscellaneous,session based recommendations,Recommendation based on a sequence of events. e.g. 
next item prediction +miscellaneous,product recommendation, +miscellaneous,pde surrogate modeling, +miscellaneous,insurance prediction, +miscellaneous,learning theory,Learning theory +miscellaneous,operator learning,Learn an operator between infinite dimensional Hilbert spaces or Banach spaces +miscellaneous,natural questions, +miscellaneous,marketing, +miscellaneous,professional medicine, +miscellaneous,high school chemistry, +miscellaneous,professional psychology, +miscellaneous,advertising, +miscellaneous,international law, +miscellaneous,college chemistry, +miscellaneous,table detection,Image credit:[Table Detection in the Wild: A Novel Diverse Table Detection Dataset and Method](https://paperswithcode.com/paper/table-detection-in-the-wild-a-novel-diverse) +miscellaneous,gravitational wave detection, +miscellaneous,security studies, +miscellaneous,virology, +miscellaneous,building change detection for remote sensing, +miscellaneous,sports understanding, +miscellaneous,multi armed bandits,"Multi-armed bandits refer to a task where a fixed amount of resources must be allocated among competing choices in a way that maximizes expected gain. Typically these problems involve an exploration/exploitation trade-off. + +( Image credit: [Microsoft Research](http://research.microsoft.com/en-us/projects/bandits/) )" +miscellaneous,machine learning, +miscellaneous,fever 2 way, +miscellaneous,high school world history, +miscellaneous,radio interferometry, +miscellaneous,high school statistics, +miscellaneous,pseudo label,A lightweight but very powerful technique for semi-supervised learning +miscellaneous,cross modal retrieval,"**Cross-Modal Retrieval** is used for implementing a retrieval task across different modalities, such as image-text, video-text, and audio-text retrieval. 
The main challenge of Cross-Modal Retrieval is the modality gap and the key solution of Cross-Modal Retrieval is to generate new representations from different modalities in the shared subspace, such that new generated features can be applied in the computation of distance metrics, such as cosine distance and Euclidean distance. + + +Source: [Deep Triplet Neural Networks with Cluster-CCA for Audio-Visual Cross-modal Retrieval ](https://arxiv.org/abs/1908.03737)" +miscellaneous,social media popularity prediction,"Social Media Popularity Prediction (SMPP) aims to predict the future popularity (e.g., clicks, views, likes, etc.) of online posts automatically via plenty of social media data from public platforms. It is a crucial problem for social media learning and forecasting and one of the most challenging problems in the field. With the ever-changing user interests and public attention on social media platforms, how to predict popularity accurately becomes more challenging than before. This task is valuable to content providers, marketers, or consumers in a range of real-world applications, including multimedia advertising, recommendation system, or trend analysis." +miscellaneous,gender bias detection, +miscellaneous,hypothesis testing,"In statistical hypothesis testing, a two-sample test is a test performed on the data of two random samples, each independently obtained from a different given population. The purpose of the test is to determine whether the difference between these two populations is statistically significant. The statistics used in two-sample tests can be used to solve many machine learning problems, such as domain adaptation, covariate shift and generative adversarial networks." +miscellaneous,jsoniq query execution,"Execute JSONiq query, typically on semi-structured JSON data" +miscellaneous,causal inference,"Causal inference is the task of drawing a conclusion about a causal connection based on the conditions of the occurrence of an effect. 
+ +( Image credit: [Recovery of non-linear cause-effect relationships from linearly mixed neuroimaging data](https://arxiv.org/pdf/1605.00391v2.pdf) )" +miscellaneous,modeling local geometric structure, +miscellaneous,protein interface prediction,"How and where proteins interface with one another can ultimately impact the proteins' functions along with a range of other biological processes. As such, precise computational methods for protein interface prediction (PIP) come highly sought after as they could yield significant advances in drug discovery and design as well as protein function analysis." +miscellaneous,moral scenarios, +miscellaneous,misconceptions,"Measures whether a model can discern popular misconceptions from the truth. + +Example: + +``` + input: The daddy longlegs spider is the most venomous spider in the world. + choice: T + choice: F + answer: F + + input: Karl Benz is correctly credited with the invention of the first modern automobile. + choice: T + choice: F + answer: T +``` + +Source: [BIG-bench](https://github.com/google/BIG-bench/tree/main/bigbench/benchmark_tasks/misconceptions)" +miscellaneous,fever 3 way, +miscellaneous,multi lingual image text classification, +miscellaneous,imputation,Substituting missing data with values according to some criteria. +miscellaneous,classification of variable stars, +miscellaneous,weather forecasting,"**Weather Forecasting** is the prediction of future weather conditions such as precipitation, temperature, pressure and wind. + + +Source: [MetNet: A Neural Weather Model for Precipitation Forecasting ](https://arxiv.org/abs/2003.12140)" +miscellaneous,business taxonomy construction, +miscellaneous,cyber attack detection, +miscellaneous,similarities abstraction, +miscellaneous,high school government and politics, +miscellaneous,movie recommendation,"Evaluates the ability of language models to propose relevant movie recommendations with collaborative filtering data. 
+ +Source: [BIG-bench](https://github.com/google/BIG-bench/tree/main/bigbench/benchmark_tasks/movie_recommendation)" +miscellaneous,age estimation,"Age Estimation is the task of estimating the age of a person from an image or some other kind of data. + +( Image credit: [BridgeNet](https://arxiv.org/pdf/1904.03358v1.pdf) )" +miscellaneous,human grasp contact prediction,Predict contact between object and hand (human or robot). +miscellaneous,interpretability techniques for deep learning, +miscellaneous,sociology, +miscellaneous,service composition,"Let T be the task that the service composition needs to accomplish. The task T can be granulated to T 1 , T 2 , T 3 , T 4 , … , T n . i.e. T = +{T 1 , T 2 , T 3 , T 4 , … , T n } . For each task T i , a set of service S i = S i 1 , S i 2 , S i 3 , … , S i m is discovered during the service discovery process such that all +services in a set S i perform the same function and have the same input and output parameters (See Figure 2). S 1 = {S 11 , S 12 , S 13 , … , S 1m } , S 2 = +{S 21 , S 22 , S 23 , … , S 2m } , S 3 = {S 31 , S 32 , S 33 , … , S 3m } , … , S n = {S n 1 , S n 2 , S n 3 , … , S n m } +We need to select one service from each set S i in order to compose the big service such that the overall QoS attributes of the big service +are optimal. The total number of the possible distinct service composition is n m . Let k be the number of QoS attributes. Then the total number +of comparisons required are kn m . We need at least kn m comparisons to find whether the solution is optimal, thus making the problem as +NP-Hard." +miscellaneous,deception detection in videos, +miscellaneous,anatomy, +miscellaneous,recommendation systems,"The recommendation systems task is to produce a list of recommendations for a user. The most common methods used in recommender systems are factor models (Koren et al., 2009; Weimer et al., 2007; Hidasi & Tikk, 2012) and neighborhood methods (Sarwar et al., 2001; Koren, 2008). 
+Factor models work by decomposing the sparse user-item interactions matrix to a set of d dimensional vectors one for each item and user in the dataset. Factor models are hard to apply in session-based recommendations due to the absence of a user profile. On the other hand, neighborhood methods, which rely on computing similarities between items (or users) are based on co-occurrences of items in sessions (or user profiles). Neighborhood methods have been used extensively in session-based recommendations. + +( Image credit: [CuMF_SGD](https://arxiv.org/pdf/1610.05838v3.pdf) )" +miscellaneous,classification with costly features,The task is to classify the dataset with costly features with different budget settings. The final metric is the normalized area under the cost-accuracy curve. +miscellaneous,data visualization, +miscellaneous,college computer science, +miscellaneous,time offset calibration, +miscellaneous,change detection for remote sensing images, +miscellaneous,denoising of radar micro doppler signatures, +miscellaneous,prediction intervals,"A prediction interval is an estimate of an interval in which a future observation will fall, with a certain probability, given what has already been observed. Prediction intervals are often used in regression analysis." +miscellaneous,conceptual physics, +miscellaneous,sentence ambiguity,"Asks models to identify the truth or falsehood of purposely ambiguous sentences. + +Source: [BIG-bench](https://github.com/google/BIG-bench/tree/main/bigbench/benchmark_tasks/sentence_ambiguity)" +miscellaneous,load forecasting, +miscellaneous,sequential quantile estimation, +miscellaneous,hindu knowledge, +miscellaneous,crowd flows prediction, +miscellaneous,open set learning,"Traditional supervised learning aims to train a classifier in the closed-set world, where training and test samples share the same label space. 
Open set learning (OSL) is a more challenging and realistic setting, where there exist test samples from the classes that are unseen during training. Open set recognition (OSR) is the sub-task of detecting test samples which do not come from the training." +miscellaneous,jurisprudence, +miscellaneous,oceanic eddy classification, +miscellaneous,parameter prediction, +miscellaneous,model discovery,discovering PDEs from spatiotemporal data +miscellaneous,logic grid puzzle, +miscellaneous,community question answering,"Community question answering is the task of answering questions on a Q&A forum or board, such as Stack Overflow or Quora." +miscellaneous,multilingual text classification, +miscellaneous,high school european history, +miscellaneous,high school biology, +miscellaneous,sequential correlation estimation, +miscellaneous,online ranker evaluation, +miscellaneous,auto debugging, +miscellaneous,making hiring decisions, +miscellaneous,gpr,Gaussian Process Regression +miscellaneous,non linear elasticity, +miscellaneous,fine grained urban flow inference,Fine-grained urban flow inference (FUFI) aims to infer the fine-grained urban flow map from the coarse-grained one. +miscellaneous,multi target regression, +miscellaneous,malware classification,"**Malware Classification** is the process of assigning a malware sample to a specific malware family. Malware within a family shares similar properties that can be used to create signatures for detection and classification. Signatures can be categorized as static or dynamic based on how they are extracted. A static signature can be based on a byte-code sequence, binary assembly instruction, or an imported Dynamic Link Library (DLL). Dynamic signatures can be based on file system activities, terminal commands, network communications, or function and system call sequences. 
+ + +Source: [Behavioral Malware Classification using Convolutional Recurrent Neural Networks ](https://arxiv.org/abs/1811.07842)" +miscellaneous,human aging, +miscellaneous,world religions, +miscellaneous,ecommerce, +miscellaneous,behavioral malware detection, +miscellaneous,photometric redshift estimation, +miscellaneous,fairness, +miscellaneous,high school microeconomics, +miscellaneous,ethics, +miscellaneous,image text classification, +miscellaneous,total energy, +miscellaneous,ancient tex restoration,"Image credit: [Restoring and attributing ancient texts using deep neural networks +](https://paperswithcode.com/paper/restoring-and-attributing-ancient-texts-using)" +miscellaneous,table extraction,Table extraction involves detecting and recognizing a table's logical structure and content from its unstructured presentation within a document +miscellaneous,robust design, +miscellaneous,formation energy,"On the QM9 dataset the numbers reported in the table are the mean absolute error in eV on the target variable U0 divided by U0's chemical accuracy, which is equal to 0.043." +miscellaneous,vulnerability detection, +miscellaneous,public relations, +miscellaneous,deception detection, +miscellaneous,home activity monitoring, +miscellaneous,seismic imaging, +miscellaneous,high school computer science, +miscellaneous,product categorization, +miscellaneous,problem decomposition, +miscellaneous,high school geography, +miscellaneous,philosophy, +miscellaneous,intrusion detection,"**Intrusion Detection** is the process of dynamically monitoring events occurring in a computer system or network, analyzing them for signs of possible incidents and often interdicting the unauthorized access. This is typically accomplished by automatically collecting information from a variety of systems and network sources, and then analyzing the information for possible security problems. 
+ + +Source: [Machine Learning Techniques for Intrusion Detection ](https://arxiv.org/abs/1312.2177)" +miscellaneous,fraud detection,"**Fraud Detection** is a vital topic that applies to many industries including the financial sectors, banking, government agencies, insurance, and law enforcement, and more. Fraud endeavors have detected a radical rise in current years, creating this topic more critical than ever. Despite struggles on the part of the troubled organizations, hundreds of millions of dollars are wasted to fraud each year. Because nearly a few samples confirm fraud in a vast community, locating these can be complex. Data mining and statistics help to predict and immediately distinguish fraud and take immediate action to minimize costs. + + +Source: [Applying support vector data description for fraud detection ](https://arxiv.org/abs/2006.00618)" +miscellaneous,survival analysis,"**Survival Analysis** is a branch of statistics focused on the study of time-to-event data, usually called survival times. This type of data appears in a wide range of applications such as failure times in mechanical systems, death times of patients in a clinical trial or duration of unemployment in a population. One of the main objectives of Survival Analysis is the estimation of the so-called survival function and the hazard function. If a random variable has density function $f$ and cumulative distribution function $F$, then its survival function $S$ is $1-F$, and its hazard $λ$ is $f/S$. + + +Source: [Gaussian Processes for Survival Analysis ](https://arxiv.org/abs/1611.00817) + +Image: [Kvamme et al.](https://arxiv.org/pdf/1910.06724v1.pdf)" +miscellaneous,image outpainting,"Predicting the visual context of an image beyond its boundary. 
+ +Image credit: [NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis](https://paperswithcode.com/paper/nuwa-infinity-autoregressive-over?from=n35)" +miscellaneous,vector quantization k means problem,Given a data set $X$ of d-dimensional numeric vectors and a number $k$ find a codebook $C$ of $k$ d-dimensional vectors such that the sum of square distances of each $x \in X$ to the respective nearest $c \in C$ is as small as possible. This is also known as the k-means problem and is known to be NP-hard. +miscellaneous,human organs senses multiple choice, +miscellaneous,us foreign policy, +miscellaneous,next basket recommendation, +miscellaneous,high school psychology, +miscellaneous,college biology, +miscellaneous,counterfactual explanation,"Returns a contrastive argument that permits to achieve the desired class, e.g., “to obtain this loan, you need XXX of annual +revenue instead of the current YYY”" +miscellaneous,numerical integration,Numerical integration is the task to calculate the numerical value of a definite integral or the numerical solution of differential equations. +miscellaneous,stress strain relation,"Data-driven techniques for finding stress-strain relation in non-linearly elastic bodies. + +( Image credit: [Data-driven Computing in Elasticity +via Chebyshev Approximation](https://arxiv.org/pdf/1904.10434.pdf) )" +miscellaneous,multi modal,"The problem of retrieving images from a database based on a multi-modal (image- text) query. Specifically, the query text prompts some modification in the query image and the task is to retrieve images with the desired modifications." +miscellaneous,fault detection, +miscellaneous,bin packing,"As a classic NP-hard problem, the bin packing problem (1D-BPP) seeks for an assignment of a collection of items with various weights to bins. The optimal assignment houses all the items with the fewest bins such that the total weight of items in a bin is below the bin’s capacity. 
In its 3D version (3D-BPP), an item has a 3D “weight” corresponding +to its length, width and height." +miscellaneous,econometrics, +miscellaneous,remote sensing, +miscellaneous,triviaqa, +miscellaneous,geophysics, +miscellaneous,android malware detection, +miscellaneous,twitter bot detection,"Academic studies estimate that up to 15% of Twitter users are automated bot accounts [1]. The prevalence of Twitter bots coupled with the ability of some bots to give seemingly human responses has enabled these non-human accounts to garner widespread influence. Hence, detecting non-human Twitter users or automated bot accounts using machine learning techniques has become an area of interest to researchers in the last few years. + +[1] https://aaai.org/ocs/index.php/ICWSM/ICWSM17/paper/view/15587" +miscellaneous,detect ground reflections,This task helps in detecting the significant ground reflections at mm-wave bands. The harvested ground reflections can help in overcoming transient blockages at mm-wave bands +miscellaneous,trajectory prediction,"**Trajectory Prediction** is the problem of predicting the short-term (1-3 seconds) and long-term (3-5 seconds) spatial coordinates of various road-agents such as cars, buses, pedestrians, rickshaws, and animals, etc. These road-agents have different dynamic behaviors that may correspond to aggressive or conservative driving styles. + + +Source: [Forecasting Trajectory and Behavior of Road-Agents Using Spectral Clustering in Graph-LSTMs ](https://arxiv.org/abs/1912.01118)" +miscellaneous,x ray diffraction,"Diffraction of X-ray patterns and images, with common applications for materials and images." +miscellaneous,contextual anomaly detection,"The objective of Unsupervised Anomaly Detection is to detect previously unseen rare objects or events. Contextual Anomaly Detection is formulated such that the data contains two types of attributes, behavioral and contextual attributes. 
Behavioral attributes are attributes that relate directly to the process of interest whereas contextual attributes relate to exogenous but highly affecting factors in relation to the process. Generally the behavioral attributes are conditional on the contextual attributes. +Source: [Unsupervised Contextual Anomaly Detection using Joint Deep Variational Generative Models](https://arxiv.org/pdf/1904.00548.pdf)" +miscellaneous,sequential distribution function estimation, +miscellaneous,artificial life, +miscellaneous,seismic interpretation, +miscellaneous,automatic cell counting, +miscellaneous,recipe generation, +miscellaneous,eeg emotion recognition,Emotion Recognition using EEG signals +miscellaneous,learning to rank,"Learning to rank is the application of machine learning to build ranking models. Some common use cases for ranking models are information retrieval (e.g., web search) and news feeds application (think Twitter, Facebook, Instagram)." +miscellaneous,network congestion control, +miscellaneous,moral permissibility, +miscellaneous,professional law, +miscellaneous,lake ice detection, +miscellaneous,seismic inversion, +miscellaneous,food recommendation, +miscellaneous,science technology, +miscellaneous,synthetic data generation,The generation of tabular data by any means possible. +miscellaneous,traffic classification,"**Traffic Classification** is a task of categorizing traffic flows into application-aware classes such as chats, streaming, VoIP, etc. Classification can be used for several purposes including policy enforcement and control or QoS management. + + +Source: [Classification of Traffic Using Neural Networks by Rejecting: a Novel Approach in Classifying VPN Traffic ](https://arxiv.org/abs/2001.03665)" +miscellaneous,physical simulations, +miscellaneous,network intrusion detection,**Network intrusion detection** is the task of monitoring network traffic to and from all devices on a network in order to detect computer attacks. 
+miscellaneous,business ethics, +miscellaneous,general knowledge,"This task aims to evaluate the ability of a model to answer general-knowledge questions. + +Source: [BIG-bench](https://github.com/google/BIG-bench/tree/main/bigbench/benchmark_tasks/general_knowledge)" +miscellaneous,protein structure prediction,Image credit: [FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours](https://arxiv.org/pdf/2203.00854v1.pdf) +miscellaneous,nutrition, +miscellaneous,crop yield prediction, +miscellaneous,continual learning,"**Continual Learning** (also known as **Incremental Learning**, **Life-long Learning**) is a concept to learn a model for a large number of tasks sequentially without forgetting knowledge obtained from the preceding tasks, where the data in the old tasks are not available anymore during training new ones. +If not mentioned, the benchmarks here are **Task-CL**, where task-id is provided on validation. + +Source: +[Continual Learning by Asymmetric Loss Approximation with Single-Side Overestimation](https://arxiv.org/abs/1908.02984) +[Three scenarios for continual learning](https://arxiv.org/abs/1904.07734) +[Lifelong Machine Learning](https://books.google.ca/books/about/Lifelong_Machine_Learning.html?id=JQ5pDwAAQBAJ&redir_esc=y) +[Continual lifelong learning with neural networks: A review](https://www.sciencedirect.com/science/article/pii/S0893608019300231)" +miscellaneous,logical fallacies, +miscellaneous,epistemic reasoning, +miscellaneous,cantilever beam, +miscellaneous,multi modal classification, +miscellaneous,air quality inference, +miscellaneous,cross modal information retrieval, +miscellaneous,semeval 2022 task 4 1 binary pcl detection, +miscellaneous,mathematical proofs, +miscellaneous,high school physics, +miscellaneous,click through rate prediction,"Click-through rate prediction is the task of predicting the likelihood that something on a website (such as an advertisement) will be clicked. 
+ +( Image credit: [Deep Spatio-Temporal Neural Networks for Click-Through Rate Prediction](https://arxiv.org/pdf/1906.03776v2.pdf) )" +miscellaneous,link quality estimation, +miscellaneous,cyber attack investigation, +miscellaneous,college medicine, +miscellaneous,mobile security, +miscellaneous,data summarization,"**Data Summarization** is a central problem in the area of machine learning, where we want to compute a small summary of the data. + + +Source: [How to Solve Fair k-Center in Massive Data Models ](https://arxiv.org/abs/2002.07682)" +miscellaneous,remote sensing image classification, +miscellaneous,knowledge tracing,"**Knowledge Tracing** is the task of modelling student knowledge over time so that we can accurately predict how students will perform on future interactions. Improvement on this task means that resources can be suggested to students based on their individual needs, and content which is predicted to be too easy or too hard can be skipped or delayed. + + +Source: [Deep Knowledge Tracing ](https://arxiv.org/abs/1506.05908)" +miscellaneous,misinformation, +miscellaneous,physics mc, +miscellaneous,college physics, +miscellaneous,clinical knowledge, +miscellaneous,dead reckoning prediction, +miscellaneous,human detection of deepfakes,"The task of detecting deepfake stimuli, as given to human participants in a statistical study. Methodologies should ideally include a-priori power analysis (e.g. using the GPower software) to calculate the sample size of human participants that would be sufficient to detect the presence of a main effect of a specified effect size." +miscellaneous,image inpainting,"**Image Inpainting** is a task of reconstructing missing regions in an image. It is an important problem in computer vision and an essential functionality in many imaging and graphics applications, e.g. object removal, image restoration, manipulation, re-targeting, compositing, and image-based rendering. 
+ +Source: [High-Resolution Image Inpainting with Iterative Confidence Feedback and Guided Upsampling ](https://arxiv.org/abs/2005.11742) + +Image source: [High-Resolution Image Inpainting with Iterative Confidence Feedback and Guided Upsampling](https://arxiv.org/pdf/2005.11742.pdf)" +miscellaneous,cryptanalysis, +miscellaneous,high school macroeconomics, +miscellaneous,science question answering,Image credit: [Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering](https://paperswithcode.com/paper/learn-to-explain-multimodal-reasoning-via) +miscellaneous,multimodal intent recognition,"Intent recognition on multimodal content. + +Image source: [MIntRec: A New Dataset for Multimodal Intent Recognition](https://paperswithcode.com/dataset/mintrec)" +miscellaneous,variational monte carlo,Variational methods for quantum physics +miscellaneous,medical genetics, +miscellaneous,moral disputes, +miscellaneous,unsupervised contextual anomaly detection,"The objective of Unsupervised Anomaly Detection is to detect previously unseen rare objects or events. Unsupervised Contextual Anomaly Detection is formulated such that the data contains two types of attributes, behavioral and contextual attributes with no pre-existing information which observations are anomalous. Behavioral attributes are attributes that relate directly to the process of interest whereas contextual attributes relate to exogenous but highly affecting factors in relation to the process. Generally the behavioral attributes are conditional on the contextual attributes. +Source: [Unsupervised Contextual Anomaly Detection using Joint Deep Variational Generative Models](https://arxiv.org/pdf/1904.00548.pdf)" +miscellaneous,electrical engineering, +miscellaneous,seismic detection,"When recording seismic ground motion in multiple sites using independent recording stations one needs to recognize the presence of the same parts of seismic waves arriving at these stations. 
This problem is known in seismology as seismic phase picking or, more generally, seismic detection." +miscellaneous,text to video generation,This task refers to video generation based on a given sentence or sequence of words. +miscellaneous,deep clustering, +miscellaneous,known unknowns,"Language models have a tendency to generate text containing false statements that are often referred to as ""Hallucinations."" The primary purpose of this task is to test for this failure case by probing whether a model can correctly identify that the answer to a question is unknown. A common failure mode would be to prefer a prediction of false on unknown truth over a prediction that the answer is unknown. + +Source: [BIG-bench](https://github.com/google/BIG-bench/tree/main/bigbench/benchmark_tasks/known_unknowns)" +miscellaneous,counterfactual inference, +miscellaneous,global facts, +miscellaneous,online review rating, +miscellaneous,collaborative filtering, +miscellaneous,multi modal person identification, +miscellaneous,segmentation of remote sensing imagery, +miscellaneous,classifier calibration,Confidence calibration – the problem of predicting probability estimates representative of the true correctness likelihood – is important for classification models in many applications. The two common calibration metrics are Expected Calibration Error (ECE) and Maximum Calibration Error (MCE). +miscellaneous,knowledge aware recommendation, +miscellaneous,crop classification, +miscellaneous,brain decoding,"**Motor Brain Decoding** is a fundamental task for building motor brain-computer interfaces (BCIs). + +Progress in predicting finger movements based on brain activity allows us to restore motor functions and improve the rehabilitation process of patients."
+miscellaneous,smart grid prediction, +miscellaneous,neural network security, +miscellaneous,human sexuality, +miscellaneous,molecular property prediction,Molecular property prediction is the task of predicting the properties of a molecule from its structure. +miscellaneous,detecting adverts, +miscellaneous,malware detection,"**Malware Detection** is a significant part of endpoint security including workstations, servers, cloud instances, and mobile devices. Malware Detection is used to detect and identify malicious activities caused by malware. With the increase in the variety of malware activities on CMS based websites such as [malicious malware redirects on WordPress site](https://secure.wphackedhelp.com/blog/wordpress-malware-redirect-hack-cleanup/) (Aka, WordPress Malware Redirect Hack) where the site redirects to spam, being the most widespread, the need for automatic detection and classifier amplifies as well. The signature-based Malware Detection system is commonly used for existing malware that has a signature but it is not suitable for unknown malware or zero-day malware + + +Source: [The Threat of Adversarial Attacks on Machine Learning in Network Security - A Survey ](https://arxiv.org/abs/1911.02621)" +miscellaneous,non intrusive load monitoring, +miscellaneous,pulsar prediction, +miscellaneous,management, +miscellaneous,high school us history, +miscellaneous,neural network compression, +miscellaneous,sequential recommendation, +miscellaneous,miscellaneous, +miscellaneous,imbalanced classification,learning classifier from class-imbalanced data +miscellaneous,dqn replay dataset, +miscellaneous,prehistory, +miscellaneous,behavioral malware classification, +miscellaneous,automated theorem proving,"The goal of **Automated Theorem Proving** is to automatically generate a proof, given a conjecture (the target theorem) and a knowledge base of known facts, all expressed in a formal language. 
Automated Theorem Proving is useful in a wide range of applications, including the verification and synthesis of software and hardware systems. + + +Source: [Learning to Prove Theorems by Learning to Generate Theorems ](https://arxiv.org/abs/2002.07019)" +miscellaneous,astronomy, +miscellaneous,crime prediction, +miscellaneous,facies classification, +miscellaneous,the semantic segmentation of remote sensing, +miscellaneous,self organized clustering,Clustering with Self-Organized Maps +miscellaneous,extracting buildings in remote sensing images, +miscellaneous,outdoor positioning,Outdoor Positioning (e.g. GPS) +music,music emotion recognition, +music,music auto tagging, +music,semeval 2022 task 4 1 binary pcl detection, +music,singer identification, +music,hand gesture recognition, +music,music texture transfer,"Texture is the collective temporal homogeneity of acoustic +events." +music,music genre recognition,"Recognizing the genre (e.g. rock, pop, jazz, etc.) of a piece of music." +music,detection of instrumentals musical tracks, +music,music source separation,"Music source separation is the task of decomposing music into its constitutive components, e. g., yielding separated stems for the vocals, bass, and drums. + +( Image credit: [SigSep](https://github.com/sigsep ) )" +music,vocal technique classification, +music,music modeling,"( Image credit: [R-Transformer](https://arxiv.org/pdf/1907.05572v1.pdf) )" +music,audio super resolution,AUDIO SUPER-RESOLUTION or speech bandwidth extension (Upsampling Ratio = 2) +music,cover song identification,"**Cover Song Identification** is the task of identifying an alternative version of a previous musical piece, even though it may differ substantially in timbre, tempo, structure, and even fundamental aspects relating to the harmony and melody of the song. 
The term “cover” is so wide that it ranges from acoustic renditions of a previous song, to Jimi Hendrix’ famous (and radical) reinterpretation of Bob Dylan’s “All Along the Watchtower”, to Rage Against the Machine essentially rewriting Bob Dylan’s “Maggie’s Farm”. Beyond its value for computational musicology and for enhancing music recommendation, Cover Song Identification is of interest because of its potential for benchmarking other music similarity and retrieval algorithms. Chord analysis, melody extraction and music similarity are all strongly connected to Cover Song Identification - another field of music analysis where AI has been applied. + + +Source: [Artificial Musical Intelligence: A Survey ](https://arxiv.org/abs/2006.10553)" +music,music generation,Music Generation is a task of automatically generating music. +music,recognizing seven different dastgahs of, +music,music classification, +music,piano music modeling, +music,drum transcription, +music,music transcription,"Music transcription is the task of converting an acoustic musical signal into some form of music notation. + +( Image credit: [ISMIR 2015 Tutorial - Automatic Music Transcription](http://c4dm.eecs.qmul.ac.uk/ismir15-amt-tutorial/AMT_tutorial_ISMIR_2015.pdf) )" +music,melody extraction, +music,music information retrieval, +natural-language-processing,cell entity annotation,"**Cell Entity Annotation** (CEA) is the task of annotating cells in a table with an entity from a knowledge base and is a subtask of [Table Annotation](https://paperswithcode.com/task/table-annotation). CEA problem labels are entities from knowledge bases such as DBpedia or WikiData. It usually is considered as a multi-class classification problem. + +CEA can also be referred to in different works as the problem of entity linking, as it links a cell in a table to an entity." 
+natural-language-processing,ucca parsing,"UCCA (Abend and Rappoport, 2013) is a semantic representation whose main design principles are ease of annotation, cross-linguistic applicability, and a modular architecture. UCCA represents the semantics of linguistic utterances as directed acyclic graphs (DAGs), where terminal (childless) nodes correspond to the text tokens, and non-terminal nodes to semantic units that participate in some super-ordinate relation. Edges are labeled, indicating the role of a child in the relation the parent represents. UCCA’s foundational layer mostly covers predicate-argument structure, semantic heads and inter-Scene relations. UCCA distinguishes primary edges, corresponding to explicit relations, from remote edges that allow for a unit to participate in several super-ordinate relations. Primary edges form a tree in each layer, whereas remote edges enable reentrancy, forming a DAG. + +Description from [NLP Progress](http://nlpprogress.com/english/semantic_parsing.html)" +natural-language-processing,riddle sense, +natural-language-processing,selection bias, +natural-language-processing,fine grained opinion analysis,"Fine-Grained Opinion Analysis aims to: (i) detect opinion expressions that convey attitudes such as sentiments, agreements, beliefs, or intentions, (ii) measure their intensity, (iii) identify their holders i.e. entities that express an attitude, (iv) identify their targets i.e. entities or propositions at which the attitude is directed, and (v) classify their target-dependent attitude. + +( Image credit: [SRL4ORL](https://arxiv.org/pdf/1711.00768v3.pdf) )" +natural-language-processing,semantic textual similarity,"Semantic textual similarity deals with determining how similar two pieces of texts are. +This can take the form of assigning a score from 1 to 5. Related tasks are paraphrase or duplicate identification. 
+ +Image source: [Learning Semantic Textual Similarity from Conversations](https://arxiv.org/pdf/1804.07754.pdf)" +natural-language-processing,chinese spell checking,Chinese Spell Checking (CSC) aims to detect and correct erroneous characters in user-generated Chinese text. +natural-language-processing,phrase relatedness, +natural-language-processing,text to video search, +natural-language-processing,humor detection,Humor detection is the task of identifying comical or amusing elements. +natural-language-processing,pretrained language models, +natural-language-processing,understanding fables, +natural-language-processing,gre reading comprehension, +natural-language-processing,word alignment,"**Word Alignment** is the task of finding the correspondence between source and target words in a pair of sentences that are translations of each other. + + +Source: [Neural Network-based Word Alignment through Score Aggregation ](https://arxiv.org/abs/1606.09560)" +natural-language-processing,active learning,"**Active Learning** is a paradigm in supervised machine learning which uses fewer training examples to achieve better optimization by iteratively training a predictor, and using the predictor in each iteration to choose the training examples which will increase its chances of finding better configurations while also improving the accuracy of the prediction model. + + +Source: [Polystore++: Accelerated Polystore System for Heterogeneous Workloads ](https://arxiv.org/abs/1905.10336)" +natural-language-processing,text anonymization, +natural-language-processing,binary relation extraction, +natural-language-processing,meme classification,Meme classification refers to the task of classifying internet memes. +natural-language-processing,sign language production,"Sign Language Production (SLP) is the automatic translation of spoken language sentences into sign language sequences.
Whilst Sign language Translation translates from sign to text, SLP is the opposite task from text to sign." +natural-language-processing,reverse dictionary, +natural-language-processing,intent recognition, +natural-language-processing,offline handwritten chinese character,Handwritten Chinese characters recognition is the task of detecting and interpreting the components of Chinese characters (i.e. radicals and two-dimensional structures). +natural-language-processing,lambada, +natural-language-processing,hellaswag, +natural-language-processing,cross lingual document classification,"Cross-lingual document classification refers to the task of using data and models available for one language for which ample such resources are available (e.g., English) to solve classification tasks in another, commonly low-resource, language." +natural-language-processing,dialogue understanding, +natural-language-processing,irony identification,"This task asks a model to identify whether a given sentence(s) is/are ironic or not. + +Source: [BIG-bench](https://github.com/google/BIG-bench/tree/main/bigbench/benchmark_tasks/irony_identification)" +natural-language-processing,grounded language learning,Acquire the meaning of language in situated environments. +natural-language-processing,transliteration,"**Transliteration** is a mechanism for converting a word in a source (foreign) language to a target language, and often adopts approaches from machine translation. In machine translation, the objective is to preserve the semantic meaning of the utterance as much as possible while following the syntactic structure in the target language. In Transliteration, the objective is to preserve the original pronunciation of the source word as much as possible while following the phonological structures of the target language. + +For example, the city’s name “Manchester” has become well known by people of languages other than English. 
These new words are often named entities that are important in cross-lingual information retrieval, information extraction, and machine translation, and often present out-of-vocabulary challenges to spoken language technologies such as automatic speech recognition, spoken keyword search, and text-to-speech. + + +Source: [Phonology-Augmented Statistical Framework for Machine Transliteration using Limited Linguistic Resources ](https://arxiv.org/abs/1810.03184)" +natural-language-processing,answer generation, +natural-language-processing,morpheme segmentaiton,Successful systems segment a given word or sentence into a sequence of morphemes. +natural-language-processing,sentence embeddings, +natural-language-processing,session search, +natural-language-processing,extractive document summarization,"Given a document, selecting a subset of the words or sentences which best represents a summary of the document." +natural-language-processing,chinese word segmentation,Chinese word segmentation is the task of splitting Chinese text (i.e. a sequence of Chinese characters) into words (Source: www.nlpprogress.com). +natural-language-processing,multimodal text prediction, +natural-language-processing,graph to sequence,Mapping an input graph to a sequence of vectors. +natural-language-processing,machine translation,"Machine translation is the task of translating a sentence in a source language to a different target language. + +Approaches for machine translation can range from rule-based to statistical to neural-based. More recently, encoder-decoder attention-based architectures like the Transformer have attained major improvements in machine translation. + +One of the most popular datasets used to benchmark machine translation systems is the WMT family of datasets. Some of the most commonly used evaluation metrics for machine translation systems include BLEU, METEOR, and NIST.
+ +( Image credit: [Google seq2seq](https://github.com/google/seq2seq) )" +natural-language-processing,paraphrase generation,"Paraphrase Generation involves transforming a natural language sentence into a new sentence that has the same semantic meaning but a different syntactic or lexical surface form." +natural-language-processing,few shot text classification, +natural-language-processing,constituency parsing,"Constituency parsing aims to extract a constituency-based parse tree from a sentence that +represents its syntactic structure according to a [phrase structure grammar](https://en.wikipedia.org/wiki/Phrase_structure_grammar). + +Example: + + Sentence (S) + | + +-------------+------------+ + | | + Noun (N) Verb Phrase (VP) + | | + John +-------+--------+ + | | + Verb (V) Noun (N) + | | + sees Bill + +[Recent approaches](https://papers.nips.cc/paper/5635-grammar-as-a-foreign-language.pdf) +convert the parse tree into a sequence following a depth-first traversal in order to +be able to apply sequence-to-sequence models to it. The linearized version of the +above parse tree looks as follows: (S (N) (VP V N))." +natural-language-processing,kg to text,"Knowledge-graph-to-text (KG-to-text) generation aims to generate high-quality texts which are consistent with input graphs. + +Description from: [JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs](https://arxiv.org/pdf/2106.10502v1.pdf)" +natural-language-processing,learning with noisy labels,"In learning with noisy labels, ""noisy labels"" means that an adversary has intentionally corrupted the labels, which would otherwise have come from a ""clean"" distribution. This setting can also be used to cast learning from only positive and unlabeled data."
+natural-language-processing,complaint comment classification, +natural-language-processing,intent discovery,"Given a set of labelled and unlabelled utterances, the idea is to identify existing (known) intents and potential (new) intents. This method can be utilised in a conversational system setting." +natural-language-processing,prosody prediction,"Predicting prosodic prominence from text. This is a 2-way classification task, assigning each word in a sentence a label 1 (prominent) or 0 (non-prominent). + +( Image credit: [Helsinki Prosody Corpus](https://github.com/Helsinki-NLP/prosody) )" +natural-language-processing,dialogue evaluation, +natural-language-processing,multiple choice qa,"A multiple-choice question (MCQ) is composed of two parts: a stem that identifies the question or problem, and a set of alternatives or possible answers that contain a key that is the best answer to the question, and a number of distractors that are plausible but incorrect answers to the question. + +In a k-way MCQA task, a model is provided with a question q, a set of candidate options O = {O1, . . . , Ok}, and a supporting context for each option C = {C1, . . . , Ck}. The model needs to predict the correct answer option that is best supported by the given contexts." +natural-language-processing,chatbot,"**Chatbot** or conversational AI is a language model designed and implemented to have conversations with humans. + + +Source: [Open Data Chatbot ](https://arxiv.org/abs/1909.03653) + +[Image source](https://arxiv.org/pdf/2006.16779v3.pdf)" +natural-language-processing,tweet reply sentiment analysis,"To predict the predominant sentiment among (potential) first-order replies to a given tweet, in a Message-level Polarity Classification paradigm."
+natural-language-processing,fake news detection,Fake news detection is the task of detecting forms of news consisting of deliberate disinformation or hoaxes spread via traditional news media (print and broadcast) or online social media (Source: Adapted from Wikipedia). +natural-language-processing,bias detection,"Bias detection is the task of detecting and measuring racism, sexism and otherwise discriminatory behavior in a model (Source: https://stereoset.mit.edu/)" +natural-language-processing,open relation modeling, +natural-language-processing,pronunciation dictionary creation,Create a pronunciation dictionary +natural-language-processing,hope speech detection for malayalam,Detecting Hopespeech in the Malayalam language +natural-language-processing,emotion cause pair extraction, +natural-language-processing,sequential pattern mining,"**Sequential Pattern Mining** is the process that discovers relevant patterns between data examples where the values are delivered in a sequence. + + +Source: [Big Data Analytics for Large Scale Wireless Networks: Challenges and Opportunities ](https://arxiv.org/abs/1909.08069)" +natural-language-processing,multlingual neural machine translation, +natural-language-processing,temporal relation extraction,"Temporal relation extraction systems aim to identify and classify the temporal relation between a pair of entities provided in a text. For instance, in the sentence ""Bob sent a message to Alice while she was leaving her birthday party."" one can infer that the actions ""sent"" and ""leaving"" entails a temporal relation that can be described as ""simultaneous""." +natural-language-processing,word translation, +natural-language-processing,hate speech detection,"Hate speech detection is the task of detecting if communication such as text, audio, and so on contains hatred and or encourages violence towards a person or a group of people. 
This is usually based on prejudice against 'protected characteristics' such as their ethnicity, gender, sexual orientation, religion, age, etc. Some example benchmarks are ETHOS and HateXplain. Models can be evaluated with metrics like the F-score or F-measure." +natural-language-processing,vietnamese sentiment analysis, +natural-language-processing,sentence embedding, +natural-language-processing,propaganda span identification, +natural-language-processing,metric type identification, +natural-language-processing,turkish text diacritization,Addition of diacritics for undiacritized Turkish Wikipedia texts. +natural-language-processing,concept to text generation,"Generating natural language text from a conceptualized representation, such as an ontology." +natural-language-processing,multilingual machine comprehension,"Multilingual Machine Comprehension (MMC) is a Question-Answering (QA) sub-task that involves quoting the answer for a question from a given snippet, where the question and the snippet can be in different languages. Results are reported on an extended version of the recently released XQuAD dataset, which is proposed as the evaluation benchmark for future research." +natural-language-processing,context query reformulation, +natural-language-processing,twitter event detection,"Detection of worldwide events from categories like Sports, Politics, Entertainment, Science & Technology, etc. by analyzing Twitter Tweets." +natural-language-processing,temporal tagging,"Identification of the extent of a temporal expression (timex) in a text. The temporal expressions can be explicit (e.g. ""October, 27"") or implicit (e.g. ""last month"").
+ +Other names: Timex Extraction; Timex Identification; Timex Detection" +natural-language-processing,temporal casual qa, +natural-language-processing,grammatical error correction,"Grammatical Error Correction (GEC) is the task of correcting different kinds of errors in text such as spelling, punctuation, grammatical, and word choice errors. + +GEC is typically formulated as a sentence correction task. A GEC system takes a potentially erroneous sentence as input and is expected to transform it to its corrected version. See the example given below: + +| Input (Erroneous) | Output (Corrected) | +| ------------------------- | ---------------------- | +|She see Tom is catched by policeman in park at last night. | She saw Tom caught by a policeman in the park last night.|" +natural-language-processing,cross lingual natural language inference,"Using data and models available for one language for which ample such resources are available (e.g., English) to solve a natural language inference task in another, commonly more low-resource, language." +natural-language-processing,task completion dialogue policy learning, +natural-language-processing,privacy preserving deep learning,"The goal of privacy-preserving (deep) learning is to train a model while preserving privacy of the training dataset. Typically, it is understood that the trained model should be privacy-preserving (e.g., due to the training algorithm being differentially private)." +natural-language-processing,sonnet generation,Generating poetry in the form of a sonnet. +natural-language-processing,mathematical question answering,Building systems that automatically answer mathematical questions. +natural-language-processing,lexical analysis,Lexical analysis is the process of converting a sequence of characters into a sequence of tokens (strings with an assigned and thus identified meaning).
(Source: Adapted from Wikipedia) +natural-language-processing,pesona dialogue in story,Building persona dialogue in a story +natural-language-processing,word attribute transfer,"Changing a word's attribute, such as its gender." +natural-language-processing,diachronic word embeddings, +natural-language-processing,logical reasoning question ansering,"Introduced by ReClor (ICLR 2020), this task evaluates the logical reasoning ability of question answering models." +natural-language-processing,ccg supertagging,"Combinatory Categorial Grammar (CCG; [Steedman, 2000](http://www.citeulike.org/group/14833/article/8971002)) is a +highly lexicalized formalism. The standard parsing model of [Clark and Curran (2007)](https://www.mitpressjournals.org/doi/abs/10.1162/coli.2007.33.4.493) +uses over 400 lexical categories (or _supertags_), compared to about 50 part-of-speech tags for typical parsers. + +Example: + +| Vinken | , | 61 | years | old | +| --- | ---| --- | --- | --- | +| N| , | N/N | N | (S[adj]\ NP)\ NP |" +natural-language-processing,pcl detection, +natural-language-processing,link prediction on dh kgs, +natural-language-processing,hurtful sentence completion,Measure hurtful sentence completions in language models (HONEST) +natural-language-processing,gender bias detection, +natural-language-processing,literature mining,The task of mining knowledge from publication texts using NLP +natural-language-processing,question answer generation, +natural-language-processing,joint entity and relation extraction,Scores reported from systems which jointly extract entities and relations. +natural-language-processing,vqa, +natural-language-processing,ruin names, +natural-language-processing,word sense induction,"Word sense induction (WSI) is widely known as the “unsupervised version” of WSD.
The problem is stated as: Given a target word (e.g., “cold”) and a collection of sentences (e.g., “I caught a cold”, “The weather is cold”) that use the word, cluster the sentences according to their different senses/meanings. We do not need to know the sense/meaning of each cluster, but sentences inside a cluster should have used the target words with the same sense. + +Description from [NLP Progress](http://nlpprogress.com/english/word_sense_disambiguation.html)" +natural-language-processing,table annotation,"**Table annotation** is the task of annotating a table with terms/concepts from a knowledge graph or database schema. Table annotation is typically broken down into the following five subtasks: + +1. Cell Entity Annotation ([CEA](https://paperswithcode.com/task/cell-entity-annotation)) +2. Column Type Annotation ([CTA](https://paperswithcode.com/task/column-type-annotation)) +3. Column Property Annotation ([CPA](https://paperswithcode.com/task/columns-property-annotation)) +4. [Table Type Detection](https://paperswithcode.com/task/table-type-detection) +5. [Row Annotation](https://paperswithcode.com/task/row-annotation) + +The [SemTab](http://www.cs.ox.ac.uk/isg/challenges/sem-tab/) challenge is closely related to the Table Annotation problem. It is a yearly challenge which focuses on the first three tasks of table annotation and its purpose is to benchmark different table annotation systems." +natural-language-processing,commonsense causal reasoning,"""Commonsense Causal Reasoning is the process of capturing and understanding the causal dependencies amongst events and actions."" Luo, Zhiyi, et al. ""Commonsense causal reasoning between short texts."" Fifteenth International Conference on the Principles of Knowledge Representation and Reasoning. 2016."
+natural-language-processing,opinion mining,"Identifying and categorizing opinions expressed in a piece of text, especially in order to determine whether the writer's attitude towards a particular topic, product, etc. is positive, negative, or neutral (Source: Oxford Languages) + +Image Source: [Deep learning for sentiment analysis: A survey](https://onlinelibrary.wiley.com/doi/abs/10.1002/widm.1253)" +natural-language-processing,cross lingual information retrieval,"Cross-Lingual Information Retrieval (CLIR) is a retrieval task in which search queries and candidate documents are written in different languages. CLIR can be very useful in some scenarios. For example, a reporter may want to search foreign language news to obtain different perspectives for her story; an inventor may explore the patents in another country to understand prior art." +natural-language-processing,abstractive sentence summarization,Generating a summary of a given sentence. +natural-language-processing,unsupervised kg to text generation, +natural-language-processing,timedial, +natural-language-processing,crowdsourced text aggregation,"One of the most important parts of processing responses from crowd workers is **aggregation**: given several conflicting opinions, a method should extract the truth. This problem is also known as *truth-inference* in crowdsourcing. Text aggregation problem is dedicated to extracting the correct information from crowd workers' responses for a crowdsourcing task where the output is a *text*: audio transcription, translation, character recognition, etc." +natural-language-processing,fantasy reasoning, +natural-language-processing,table search, +natural-language-processing,cloze test,The cloze task refers to infilling individual words. +natural-language-processing,weakly supervised data denoising, +natural-language-processing,topic coverage,"A prevalent use case of topic models is that of topic discovery. 
+However, most of the topic model evaluation methods rely on abstract metrics such as perplexity or topic coherence. The topic coverage approach is to measure the models' performance by matching model-generated topics to a fixed set of reference topics - topics discovered by humans and represented in a machine-readable format. This way, the models are evaluated in the context of their use, by essentially simulating topic modeling in a fixed setting defined by a text collection and a set of reference topics. +Reference topics represent a ground truth that can be used to evaluate both topic models and other measures of model performance. This coverage approach enables large-scale automatic evaluation of existing and future topic models." +natural-language-processing,long range modeling,"A new task for testing the long-sequence modeling capabilities and efficiency of language models. + +Image credit: [SCROLLS: Standardized CompaRison Over Long Language Sequences](https://arxiv.org/pdf/2201.03533v1.pdf)" +natural-language-processing,satire detection,Satire detection consists in detecting when a text is written in a satirical tone and its content shouldn't be interpreted literally. +natural-language-processing,semi supervised formality style transfer,Semi-Supervised Formality Style Transfer +natural-language-processing,spanish text diacritization,Addition of diacritics for undiacritized Spanish Wikipedia texts. +natural-language-processing,zero shot cross lingual transfer, +natural-language-processing,sentiment dependency learning, +natural-language-processing,protein folding, +natural-language-processing,text matching,Matching a target text to a source text based on their meaning. +natural-language-processing,named entity recognition in vietnamese, +natural-language-processing,document summarization,"Automatic **Document Summarization** is the task of rewriting a document into its shorter form while still retaining its important content. 
The most popular two paradigms are extractive approaches and abstractive approaches. Extractive approaches generate summaries by extracting parts of the original document (usually sentences), while abstractive methods may generate new words or phrases which are not in the original document. + + +Source: [HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization ](https://arxiv.org/abs/1905.06566)" +natural-language-processing,document classification,"**Document Classification** is a procedure of assigning one or more labels to a document from a predetermined set of labels. + + +Source: [Long-length Legal Document Classification ](https://arxiv.org/abs/1912.06905)" +natural-language-processing,face selection,A task where an agent should select at most two sentences from the paper as argumentative facts. +natural-language-processing,prepositional phrase attachment, +natural-language-processing,language identification,Language identification is the task of determining the language of a text. +natural-language-processing,lemmatization,"**Lemmatization** is a process of determining a base or dictionary form (lemma) for a given surface form. Especially for languages with rich morphology it is important to be able to normalize words into their base forms to better support for example search engines and linguistic studies. Main difficulties in Lemmatization arise from encountering previously unseen words during inference time as well as disambiguating ambiguous surface forms which can be inflected variants of several different base forms depending on the context. 
+ + +Source: [Universal Lemmatizer: A Sequence to Sequence Model for Lemmatizing Universal Dependencies Treebanks ](https://arxiv.org/abs/1902.00972)" +natural-language-processing,chinese zero pronoun resolution,Chinese zero pronoun resolution refers to the task of resolving a so-called zero segment of a Chinese text that is not written but supplies information for interpreting the text. +natural-language-processing,reader aware summarization,Using reader comments to improve summarization performance. +natural-language-processing,stance detection,"Stance detection is the extraction of a subject's reaction to a claim made by a primary actor. It is a core part of a set of approaches to fake news assessment. + +Example: + +* Source: ""Apples are the most delicious fruit in existence"" +* Reply: ""Obviously not, because that is a reuben from Katz's"" +* Stance: deny" +natural-language-processing,xlm r,XLM-R +natural-language-processing,emergent communications on relations,Emergent communications in the context of relations. +natural-language-processing,unsupervised part of speech tagging,Marking up a word in a text (corpus) as corresponding to a particular part of speech based on both its definition and its context by using an untagged corpus for training and producing the tagset by induction (Source: Wikipedia). +natural-language-processing,fact selection,A task where an agent should select at most two sentences from the paper as argumentative facts. +natural-language-processing,keyword extraction,Keyword extraction is tasked with the automatic identification of terms that best describe the subject of a document (Source: Wikipedia). +natural-language-processing,implicit discourse relation classification,"Parsing a text into a set of discourse relations between two adjacent or non-adjacent discourse units in the absence of explicit connectives, such as 'but' or 'however', and classifying those relations. 
(Source: Adapted from https://www.cs.brandeis.edu/~clp/conll15st/intro.html)" +natural-language-processing,chunking,"Chunking, also known as shallow parsing, identifies continuous spans of tokens that form syntactic units such as noun phrases or verb phrases. + +Example: + +| Vinken | , | 61 | years | old | +| --- | ---| --- | --- | --- | +| B-NP | I-NP | I-NP | I-NP | I-NP |" +natural-language-processing,continual relation extraction,"Compared with traditional relation extraction, CRE aims to help the model learn new relations while maintaining accurate classification of old ones." +natural-language-processing,continual named entity recognition,Continual learning for named entity recognition +natural-language-processing,logical reasoning reading comprehension,"Logical reasoning reading comprehension is a task proposed by the paper ReClor (ICLR 2020), which is to evaluate the logical reasoning ability of machine reading comprehension models. ReClor is the first dataset for logical reasoning reading comprehension." +natural-language-processing,cross lingual transfer,"Cross-lingual transfer refers to transfer learning using data and models available for one language for which ample such resources are available (e.g., English) to solve tasks in another, commonly more low-resource, language." +natural-language-processing,text variation,Generate variations of the input text +natural-language-processing,word sense disambiguation,"The task of Word Sense Disambiguation (WSD) consists of associating words in context with their most suitable entry in a pre-defined sense inventory. The de-facto sense inventory for English in WSD is [WordNet](https://wordnet.princeton.edu). 
+For example, given the word “mouse” and the following sentence: + +“A mouse consists of an object held in one's hand, with one or more buttons.” + +we would assign “mouse” with its electronic device sense ([the 4th sense in the WordNet sense inventory](http://wordnetweb.princeton.edu/perl/webwn?c=8&sub=Change&o2=&o0=1&o8=1&o1=1&o7=&o5=&o9=&o6=&o3=&o4=&i=-1&h=000000&s=mouse))." +natural-language-processing,multi modal dialogue generation,Image credit: [OpenViDial](https://github.com/ShannonAI/OpenViDial) +natural-language-processing,language modelling,"Language modeling is the task of predicting the next word or character in a document. This technique can be used to train language models that can further be applied to a wide range of natural language tasks like text generation, text classification, and question answering. + +The common types of language modeling techniques involve: + +- N-gram Language Models +- Neural Language Models + +A model's language modeling capability is measured using cross-entropy and perplexity. Some datasets to evaluate language modeling are WikiText-103, One Billion Word, Text8, C4, among others. + +One of the most recent popular benchmarks to evaluate language modeling capabilities is called SuperGLUE. + +Some popular and notable state-of-the-art language models include: + +- [GPT-3](/method/gpt-3) +- Megatron-LM +- [BERT](/method/bert) + +Check below for all state-of-the-art models. + +Here are some additional readings to go deeper on the task: + +- [Language Modeling](https://lena-voita.github.io/nlp_course/language_modeling.html) - Lena Voita + +( Image credit: [Exploring the Limits of Language Modeling](https://arxiv.org/pdf/1602.02410v2.pdf) )" +natural-language-processing,aspect based sentiment analysis,"Aspect-based sentiment analysis is the task of identifying fine-grained opinion polarity towards a specific aspect associated with a given target. 
+ +( Image credit: [Utilizing BERT for Aspect-Based Sentiment Analysis +via Constructing Auxiliary Sentence](https://arxiv.org/pdf/1903.09588v1.pdf) )" +natural-language-processing,text clustering,Grouping a set of texts in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). (Source: Adapted from Wikipedia) +natural-language-processing,snarks, +natural-language-processing,prompt engineering, +natural-language-processing,speculation scope resolution,Identify the scope of a speculation cue that indicates uncertainty in a given text. +natural-language-processing,scientific results extraction,"Scientific results extraction is the task of extracting relevant result information (e.g., in the case of Machine learning performance results: task, dataset, metric name, metric value) from the scientific literature." +natural-language-processing,natural language transduction,Converting one sequence into another +natural-language-processing,taxonomy learning,"Taxonomy learning is the task of hierarchically classifying concepts in an automatic manner from text corpora. The process of building taxonomies is usually divided into two main steps: (1) extracting hypernyms for concepts, which may constitute a field of research in itself (see Hypernym Discovery below) and (2) refining the structure into a taxonomy. + +Description from [NLP Progress](http://nlpprogress.com/english/taxonomy_learning.html)" +natural-language-processing,cause effect relation classification,"Classifying pairs of entities (e.g., events) into causal or non-causal or predicting Cause and Effect in a causal relation" +natural-language-processing,vietnamese word segmentation, +natural-language-processing,text effects transfer,"Text effects transfer refers to the task of transferring typography styles (e.g., color, texture) to an input image of a text element." 
+natural-language-processing,entity typing,"**Entity Typing** is an important task in text analysis. Assigning types (e.g., person, location, organization) to mentions of entities in documents enables effective structured analysis of unstructured text corpora. The extracted type information can be used in a wide range of ways (e.g., serving as primitives for information extraction and knowledge base (KB) completion, and assisting question answering). Traditional Entity Typing systems focus on a small set of coarse types (typically fewer than 10). Recent studies work on a much larger set of fine-grained types which form a tree-structured hierarchy (e.g., actor as a subtype of artist, and artist is a subtype of person). + + +Source: [Label Noise Reduction in Entity Typing by Heterogeneous Partial-Label Embedding ](https://arxiv.org/abs/1602.05307) + +Image Credit: [Label Noise Reduction in Entity Typing by Heterogeneous Partial-Label Embedding ](https://arxiv.org/abs/1602.05307)" +natural-language-processing,handwritten chinese text recognition,"Handwritten Chinese text recognition is the task of interpreting handwritten Chinese input, e.g., from images of documents or scans." +natural-language-processing,piqa, +natural-language-processing,drug drug interaction extraction,"Automatic extraction of Drug-drug interaction (DDI) information from the biomedical literature. 
+ +( Image credit: [Using Drug Descriptions and Molecular Structures for Drug-Drug Interaction Extraction from Literature](https://watermark.silverchair.com/btaa907.pdf?token=AQECAHi208BE49Ooan9kkhW_Ercy7Dm3ZL_9Cf3qfKAc485ysgAAAuAwggLcBgkqhkiG9w0BBwagggLNMIICyQIBADCCAsIGCSqGSIb3DQEHATAeBglghkgBZQMEAS4wEQQMb9BaAetiYVbvf0_9AgEQgIICk0-IUccCHnqrDDtbnyBSTvPnrWXc4F2vfEMKkxzGRA3-WbynJbw0HptyHyjawXr3c4TeC9ZhIta1szzhc5t1JOhdh8rOo5CTtGk_JPfq14TkMkIISDCvdsVL76fvCn-3KhtAJhHPcyYTDqMaGSb3ltXJLfR6PXoCXnINprcZ3pO6ktuNLA8KF0_CHTITnGNcX1G1n6ZERyGTPwGjodH9Qq9UcYzJCx4N1KKgxOMAj5oxIoLPQi97oNJ3eCoYBoKDuSW-Zza_ULcBDXkkTvt3A460O32vfRAqnnPYlVSuvEiASc3lX8O6Qb28GXc99EIzwDQJEO4znl7haAGOxubuUm5Of9p22xRfc7KSuCh41cJxW_31bsnTuf8Sb2z6O6bFo3aNlxF3SrLfNTHfJH6Tst0WvaaRJ1gQ3JxcFwu--hsZMF9bW3_yzFwP1ZnVIPVtE0bqm3QZt-_nHaE4o9KgCCNY4t70h7U7yD9ZrZMvFnNieSvgL35t--l5PerE9uZgP6v9LUKUyAI1TDHHMameO5794Z7WII8v_MKG0jmUIbr564ENnyKJYunNowu3EbvUMxBv3DSUkepU1kP66tJjSflal3LlRd8LUbj4m2Tin3LteS-09Gje4pyMaeO5RywX_tSPSFGK8QGaZqpgbIU2y23YnCMAnIDOSS3_L3LQRGu50YK5OwEvMY7azpEMbR3kaaHWu_cmEN2Vm61UdG6uLql5pMc9zKfGAQ3E1VXGuhdEalRtuLbtBZ9UFj-vYePfrRGWRXjkg-11SGgKqRaJcLK32yuvhG11KqacCoY169A4G29_GfRC3rbDsnqyMRZ9ESe3FB1NnWwZ4HPNA7ju3yJ7cfZSfYgdygT6oTpBQjaweH5U) )" +natural-language-processing,semeval 2022 task 4 2 multi label pcl, +natural-language-processing,sentence compression,"As per one of the sponsors, namely, Sen. Claro M. Recto, the supposed reading of the late national hero’s works instills an intrinsic love for the nation, the Philippines and its people. Moreover, Recto asserted that reading the masterpieces written down by Jose Rizal will further increase youth patriotism and develop their sense of Filipino identity ando add concreteness to his statement, he even fought vehemently for the deliberate study of the country’s hero’s life, works, and writings to be required of all students in all public and private schools, colleges, and institutions in order to achieve the goal. Senator Jose P. 
Laurel, on the other hand, also shared the same fiery passion to stress out the essence of reading the late hero’s writings as a way of resonating how patriotism in the past could also be of importance in the present day situation. He even introduced SB 438 on the 17th of April 1956, with the title Act to Make Noli Me Tangere and El Filibusterismo Compulsory Reading Matters in All Public and Private Schools, Colleges, and Universities, and for Other Purposes. They both emphasized that this is a fundamental cog and gear for all students of the country to be informed of Rizal’s patriotism and how he fought for the country through the help of his beloved quill and ink.original sentence." +natural-language-processing,problem solving deliberation, +natural-language-processing,dialect identification,Dialectal Arabic Identification +natural-language-processing,news generation,Generation of larger segments of text with consistent topic and evolving story. +natural-language-processing,emotion cause pair extraction 1,Emotion-Cause Pair Extraction (ECPE) aims to extract the potential pairs of emotions and corresponding causes in a document. +natural-language-processing,multi labeled relation extraction, +natural-language-processing,few shot relation classification,"**Few-Shot Relation Classification** is a particular relation classification task under minimum annotated data, where a model is required to classify a new incoming query instance given only few support instances (e.g., 1 or 5) during testing. + + +Source: [MICK: A Meta-Learning Framework for Few-shot Relation Classification with Little Training Data ](https://arxiv.org/abs/2004.14164)" +natural-language-processing,hope speech detection,"Detecting speech associated with positive, uplifting, +promise, potential, support, reassurance, suggestions, or inspiration." 
+natural-language-processing,speech to text translation,"Translate audio signals of speech in one language into text in a foreign language, either in an end-to-end or cascade manner." +natural-language-processing,intent detection,"**Intent Detection** is a vital component of any task-oriented conversational system. In order to understand the user’s current goal, the system must leverage its intent detector to classify the user’s utterance (provided in varied natural language) into one of several predefined classes, that is, intents. However, the performance of intent detection has been hindered by the data scarcity issue, as it is non-trivial to collect sufficient examples for new intents. How to effectively identify user intents in few-shot learning has become a popular research question. + + +Source: [Few-shot Intent Detection Datasets, Baselines and Results ](https://github.com/jianguoz/Few-Shot-Intent-Detection) + + +Source: [Are Pretrained Transformers Robust in Intent Classification? A Missing Ingredient in Evaluation of Out-of-Scope Intent Detection ](https://github.com/jianguoz/Few-Shot-Intent-Detection) + + +Source: [Efficient Intent Detection with Dual Sentence Encoders](https://github.com/PolyAI-LDN/polyai-models)" +natural-language-processing,machine reading comprehension,"**Machine Reading Comprehension** is one of the key problems in Natural Language Understanding, where the task is to read and comprehend a given text passage, and then answer questions based on it. + +Source: [Making Neural Machine Reading Comprehension Faster ](https://arxiv.org/abs/1904.00796)" +natural-language-processing,lexical normalization,"Lexical normalization is the task of translating/transforming a non-standard text to a standard register. + +Example: + +``` +new pix comming tomoroe +new pictures coming tomorrow +``` + +Datasets usually consist of tweets, since these naturally contain a fair amount of +these phenomena. 
+ +For lexical normalization, only replacements on the word-level are annotated. +Some corpora include annotation for 1-N and N-1 replacements. However, word +insertion/deletion and reordering are not part of the task." +natural-language-processing,multimodal machine translation,"Multimodal machine translation is the task of doing machine translation with multiple data sources - for example, translating ""a bird is flying over water"" + an image of a bird over water to German text. + +( Image credit: [Findings of the Third Shared Task on Multimodal Machine Translation](https://www.aclweb.org/anthology/W18-6402.pdf) )" +natural-language-processing,entity disambiguation,"**Entity Disambiguation** is the task of linking mentions of ambiguous entities to their referent entities in a knowledge base such as Wikipedia. + + +Source: [Leveraging Deep Neural Networks and Knowledge Graphs for Entity Disambiguation ](https://arxiv.org/abs/1504.07678)" +natural-language-processing,aspect term extraction and sentiment,Extracting the aspect terms as well as the corresponding sentiment polarities simultaneously. +natural-language-processing,entity alignment,"**Entity Alignment** is the task of finding entities in two knowledge bases that refer to the same real-world object. It plays a vital role in automatically integrating multiple knowledge bases. +Note: results that have incorporated machine translated entity names (introduced in the RDGCN paper) or pre-alignment name embeddings are considered to have used **extra training labels** (both are marked with ""Extra Training Data"" in the leaderboard) and do **not adhere to a comparable setting** with others that have followed the original setting of the benchmark. 
+ +Source: [Cross-lingual Entity Alignment via Joint Attribute-Preserving Embedding ](https://arxiv.org/abs/1708.05045) + +The task of entity alignment is related to the task of [entity resolution](https://paperswithcode.com/task/entity-resolution) which focuses on matching structured entity descriptions in different contexts." +natural-language-processing,hypernym discovery,"Given a corpus and a target term (hyponym), the task of hypernym discovery consists of extracting a set of its most appropriate hypernyms from the corpus. For example, for the input word “dog”, some valid hypernyms would be “canine”, “mammal” or “animal”." +natural-language-processing,vietnamese text diacritization,Addition of diacritics for undiacritized Vietnamese Wikipedia texts. +natural-language-processing,language acquisition,Language acquisition refers to tasks related to the learning of a second language. +natural-language-processing,bridging anaphora resolution, +natural-language-processing,context specific spam detection, +natural-language-processing,latent aspect detection, +natural-language-processing,answer selection,"**Answer Selection** is the task of identifying the correct answer to a question from a pool of candidate answers. This task can be formulated as a classification or a ranking problem. + + +Source: [Learning Analogy-Preserving Sentence Embeddings for Answer Selection ](https://arxiv.org/abs/1910.05315)" +natural-language-processing,image sentence alignment,Predict the alignment (score) between an image and a sentence. +natural-language-processing,medical named entity recognition, +natural-language-processing,lexical complexity prediction,Predicting the complexity of a word/multi-word expression in a sentence. +natural-language-processing,misogynistic aggression identification,"Develop a binary classifier for classifying the text as ‘gendered’ or ‘non-gendered’. 
For this, the TRAC-2 dataset of 5,000 annotated data from social media each in Bangla (in both Roman and Bangla script), Hindi (in both Roman and Devanagari script) and English for training and validation is to be used." +natural-language-processing,column type annotation,"**Column type annotation** (CTA) refers to the task of predicting the semantic type of a table column and is a subtask of [Table Annotation](https://paperswithcode.com/task/table-annotation). The labels that are usually used in a CTA problem are semantic types from vocabularies like DBpedia, Schema.org or WikiData. Some examples are: *Book*, *Country*, *LocalBusiness* etc. + +CTA can be either treated as a multi-class classification problem where a column is annotated by only one semantic type or as multi-label classification problem where a column can be annotated using multiple semantic types." +natural-language-processing,pico,"The proliferation of healthcare data has contributed to the widespread usage of the PICO paradigm for creating specific clinical questions from RCT. + +PICO is a mnemonic that stands for: + +Population/Problem: Addresses the characteristics of populations involved and the specific characteristics of the disease or disorder. +Intervention: Addresses the primary intervention (including treatments, procedures, or diagnostic tests) along with any risk factors. +Comparison: Compares the efficacy of any new interventions with the primary intervention. +Outcome: Measures the results of the intervention, including improvements or side effects. +PICO is an essential tool that aids evidence-based practitioners in creating precise clinical questions and searchable keywords to address those issues. It calls for a high level of technical competence and medical domain knowledge, but it’s also frequently very time-consuming. 
+ +Automatically identifying PICO elements from this large sea of data can be made easier with the aid of machine learning (ML) and natural language processing (NLP). This facilitates the development of precise research questions by evidence-based practitioners more quickly and precisely. + +Empirical studies have shown that the use of PICO frames improves the specificity and conceptual clarity of clinical problems, elicits more information during pre-search reference interviews, leads to more complex search strategies, and yields more precise search results." +natural-language-processing,morphological analysis,"**Morphological Analysis** is a central task in language processing that can take a word as input and detect the various morphological entities in the word and provide a morphological representation of it. + + +Source: [Towards Finite-State Morphology of Kurdish ](https://arxiv.org/abs/2005.10652)" +natural-language-processing,bangla spelling error correction,Bangla spell checker which improves the quality of suggestions for misspelled words. +natural-language-processing,story generation,"Story generation is the task of automatically generating a coherent narrative, often from a set of premises or a brief summary." +natural-language-processing,hidden aspect detection, +natural-language-processing,toponym resolution,The goal is to find a mapping from a toponym (a location mention) in the text to a spatial footprint. +natural-language-processing,open domain dialog, +natural-language-processing,formality style transfer,Formality Style Transfer +natural-language-processing,predicate detection,Detecting predicates in sentences. Semantic frames are defined with respect to predicates. This task is a prerequisite to semantic role labeling. +natural-language-processing,document dating,"Document Dating is the problem of automatically predicting the date of a document based on its content. 
Date of a document, also referred to as the Document Creation Time (DCT), is at the core of many important tasks, such as, information retrieval, temporal reasoning, text summarization, event detection, and analysis of historical text, among others. + +For example, in the following document, the correct creation year is 1999. This can be inferred by the presence of terms 1995 and Four years after. + +Swiss adopted that form of taxation in 1995. The concession was approved by the govt last September. Four years after, the IOC…. + +Description from [NLP Progress](http://nlpprogress.com/english/temporal_processing.html)" +natural-language-processing,aspect category opinion sentiment quadruple,Aspect-Category-Opinion-Sentiment (ACOS) Quadruple Extraction is the task with the goal to extract all aspect-category-opinion-sentiment quadruples in a review sentence. ( and provide full support for aspect-level sentiment analysis with implicit aspects and opinions if possible ) +natural-language-processing,movie dialog same or different, +natural-language-processing,page stream segmentation,page stream segmentation (PSS) is the task to automatically separate a stream of scanned images into coherent multi-page documents +natural-language-processing,morphological tagging,"Morphological tagging is the task of assigning labels to a sequence of tokens that describe them morphologically. As compared to Part-of-speech tagging, morphological tagging also considers morphological features, such as case, gender or the tense of verbs." +natural-language-processing,propaganda technique identification, +natural-language-processing,review generation, +natural-language-processing,conversation disentanglement,"Automatic disentanglement could be used to provide more interpretable results when searching over chat logs, and to help users understand what is happening when they join a channel. 
+ +Source: [Kummerfeld et al.](https://arxiv.org/pdf/1810.11118v2.pdf)" +natural-language-processing,empirical judgments,"Drawing inspiration from Immanuel Kant, this task measures a model’s ability to distinguish between two kinds of empirical judgments: judgments that assert a correlative relation between empirical events, and judgments that assert a causal relation. + +Source: [BIG-bench](https://github.com/google/BIG-bench/tree/main/bigbench/benchmark_tasks/empirical_judgments)" +natural-language-processing,amr parsing,"Each AMR is a single rooted, directed graph. AMRs include PropBank semantic roles, within-sentence coreference, named entities and types, modality, negation, questions, quantities, and so on. [See](https://amr.isi.edu/index.html)." +natural-language-processing,open intent discovery,Open intent discovery aims to leverage limited prior knowledge of known intents to find fine-grained known and open intent-wise clusters. +natural-language-processing,recognizing emotion cause in conversations,"Given an utterance U, labeled with emotion E, the task is to extract the causal spans S from the conversational history H (including utterance U) that sufficiently represent the causes of emotion E." +natural-language-processing,dialogue safety prediction,Determine the safety of a given dialogue context. +natural-language-processing,multimodal lexical translation,"Translate a given word in a source language to a word in the target language, given the source sentence and one or more images illustrating the word." +natural-language-processing,connective detection, +natural-language-processing,pretrained multilingual language models, +natural-language-processing,parallel corpus mining,Mining a corpus of bilingual sentence pairs that are translations of each other. 
+natural-language-processing,morphological inflection,"**Morphological Inflection** is the task of generating a target (inflected form) word from a source word (base form), given a morphological attribute, e.g. number, tense, and person etc. It is useful for alleviating data sparsity issues in translating morphologically rich languages. The transformation from a base form to an inflected form usually includes concatenating the base form with a prefix or a suffix and substituting some characters. For example, the inflected form of a Finnish stem eläkeikä (retirement age) is eläkeiittä when the case is abessive and the number is plural. + + +Source: [Tackling Sequence to Sequence Mapping Problems with Neural Networks ](https://arxiv.org/abs/1810.10802)" +natural-language-processing,decipherment, +natural-language-processing,lay summarization,"Summarizing a technical or scientific document in simple, non-technical language that is comprehensible to a lay person (non-expert)." +natural-language-processing,vietnamese aspect based sentiment analysis,"UIT-ViSFD: A Vietnamese Smartphone Feedback Dataset for Aspect-Based Sentiment Analysis + + +In this paper, we present a process of building a social listening system based on aspect-based sentiment analysis in Vietnamese from creating a dataset to building a real application. Firstly, we create UIT-ViSFD, a Vietnamese Smartphone Feedback Dataset as a new benchmark corpus built based on strict annotation schemes for evaluating aspect-based sentiment analysis, consisting of 11,122 human-annotated comments for mobile e-commerce, which is freely available for research purposes. We also present a proposed approach based on the Bi-LSTM architecture with the fastText word embeddings for the Vietnamese aspect-based sentiment task. 
Our experiments show that our approach achieves the best performance, with F1-scores of 84.48% for the aspect task and 63.06% for the sentiment task, which outperforms several conventional machine learning and deep learning systems. Last but not least, we build SA2SL, a social listening system based on the best-performing model on our dataset, which will inspire more social listening systems in the future. Dataset download: https://www.facebook.com/ViDataset + +Paper: Phan, Luong Luc, Phuc Huynh Pham, Kim Thi-Thanh Nguyen, Tham Thi Nguyen, Sieu Khai Huynh, Luan Thanh Nguyen, Tin Van Huynh, and Kiet Van Nguyen. ""SA2SL: From Aspect-Based Sentiment Analysis to Social Listening System for Business Intelligence."" arXiv preprint arXiv:2105.15079 (2021)." +natural-language-processing,arabic text diacritization,Addition of diacritics for undiacritized Arabic texts for word disambiguation. +natural-language-processing,semantic composition,Understanding the meaning of text by composing the meanings of the individual words in the text (Source: https://arxiv.org/pdf/1405.7908.pdf) +natural-language-processing,task oriented dialogue systems,Achieving a pre-defined task through a dialog. +natural-language-processing,document ranking,"Sort documents according to some criterion so that the ""best"" results appear early in the result list displayed to the user (Source: Wikipedia)." +natural-language-processing,extractive tags summarization,"The goal of Extractive Tags Summarization (ETS) task is to shorten the list of tags corresponding to a digital image while keeping the representativity; i.e., is to extract important tags from the context lying in an image and its corresponding tags." +natural-language-processing,community question answering,"Community question answering is the task of answering questions on a Q&A forum or board, such as Stack Overflow or Quora." 
+natural-language-processing,chinese,Chinese language processing is the task of applying natural language processing to the Chinese language. +natural-language-processing,argument mining,"**Argument Mining** is a field of corpus-based discourse analysis that involves the automatic identification of argumentative structures in text. + + +Source: [AMPERSAND: Argument Mining for PERSuAsive oNline Discussions ](https://arxiv.org/abs/2004.14677)" +natural-language-processing,cognate prediction, +natural-language-processing,unsupervised sentence summarization,Generating a summary of a given sentence without supervision. +natural-language-processing,relation classification,"**Relation Classification** is the task of identifying the semantic relation holding between two nominal entities in text. + + +Source: [Structure Regularized Neural Network for Entity Relation Classification for Chinese Literature Text ](https://arxiv.org/abs/1803.05662)" +natural-language-processing,named entity recognition ner,"Named entity recognition (NER) is the task of tagging entities in text with their corresponding type. +Approaches typically use BIO notation, which differentiates the beginning (B) and the inside (I) of entities. +O is used for non-entity tokens. + +Example: + +| Mark | Watney | visited | Mars | +| --- | ---| --- | --- | +| B-PER | I-PER | O | B-LOC | + +( Image credit: [Zalando](https://research.zalando.com/welcome/mission/research-projects/flair-nlp/) )" +natural-language-processing,4 ary relation extraction, +natural-language-processing,formal fallacies syllogisms negation, +natural-language-processing,semantic dependency parsing,Identify semantic relationships between words in a text using a graph representation. +natural-language-processing,text infilling,"**Text Infilling** is the task of predicting missing spans of text which are consistent with the preceding and subsequent text. 
Text Infilling is a generalization of the cloze task—cloze historically refers to infilling individual words. + + +Source: [Enabling Language Models to Fill in the Blanks ](https://arxiv.org/abs/2005.05339)" +natural-language-processing,definition modelling, +natural-language-processing,passage re ranking,Passage re-ranking is the task of scoring and re-ranking a collection of retrieved documents based on an input query. +natural-language-processing,slot filling,"The goal of **Slot Filling** is to identify from a running dialog different slots, which correspond to different parameters of the user’s query. For instance, when a user queries for nearby restaurants, key slots for location and preferred food are required for a dialog system to retrieve the appropriate information. Thus, the main challenge in the slot-filling task is to extract the target entity. + + +Source: [Real-time On-Demand Crowd-powered Entity Extraction ](https://arxiv.org/abs/1704.03627) + +Image credit: [Robust Retrieval Augmented Generation for Zero-shot Slot Filling](https://arxiv.org/pdf/2108.13934.pdf)" +natural-language-processing,question generation,"The goal of **Question Generation** is to generate a valid and fluent question according to a given passage and the target answer. Question Generation can be used in many scenarios, such as automatic tutoring systems, improving the performance of Question Answering models and enabling chatbots to lead a conversation. + +Source: [Generating Highly Relevant Questions ](https://arxiv.org/abs/1910.03401)" +natural-language-processing,summarization,"Summarization is the task of producing a shorter version of one or several documents that preserves most of the +input's meaning." +natural-language-processing,causal emotion entailment,"The Causal Emotion Entailment is a simpler version of the span extraction task. 
In this task, given a +target utterance (U) with emotion E, the goal is to predict which particular utterances in the conversation +history H(U) are responsible for the +emotion E in the target utterance." +natural-language-processing,relationship extraction distant supervised,"Relationship extraction is the task of extracting semantic relationships from a text. Extracted relationships usually +occur between two or more entities of a certain type (e.g. Person, Organisation, Location) and fall into a number of +semantic categories (e.g. married to, employed by, lives in)." +natural-language-processing,aggression identification,"Develop a classifier that could make a 3-way classification in-between ‘Overtly Aggressive’, ‘Covertly Aggressive’ and ‘Non-aggressive’ text data. For this, TRAC-2 dataset of 5,000 aggression-annotated data from social media each in Bangla (in both Roman and Bangla script), Hindi (in both Roman and Devanagari script) and English for training and validation is to be used." +natural-language-processing,component classification,Classification of argumentative components inside a document +natural-language-processing,hope speech detection for tamil,Detecting Hope Speech in the Tamil language +natural-language-processing,unsupervised opinion summarization, +natural-language-processing,authorship verification,"Authorship verification (**AV**) is a research subject in the field of digital text forensics that concerns itself with the question, whether two documents have been written by the same person. + +Definition taken from the paper **Assessing the Applicability of Authorship Verification Methods**, +available at: " +natural-language-processing,implicatures, +natural-language-processing,long form question answering,Long-form question answering is a task requiring elaborate and in-depth answers to open-ended questions. 
+natural-language-processing,blackout poetry generation,"Blackout poetry is a form of poetry in which words in a passage are masked, except for a few which, when combined, convey some meaning." +natural-language-processing,news annotation,Assigning the appropriate labels to a news text based on a set of pre-defined labels. +natural-language-processing,personality recognition in conversation,"Given a speaker's conversation with others, it is required to recognize the speaker's personality traits through the conversation record, which includes two scenarios, (1) $1-1$ conversations: the robot recognizes the personality traits of the speaker through the conversation between them (e.g., psychological counseling), (2) $1-N$ conversations: the robot listens to the speaker's conversations with other $N$ people and then recognizes the speaker's personality traits (e.g., group chatbot, home service robot). Since $1-N$ includes the case of $1-1$, we only discuss PRC in $1-N$ conversations. The task of PRC in $1-N$ conversations can be formulated as: + +$Per_i = argmax_{Per'_i}P(Per'_i | C_{i,j}, \cdots, C_{i,N})$ + +where $Per_i=[Neu, Ext, Ope, Agr, Con]$ is a 5-dimensional vector representing Neuroticism, Extraversion, Openness, Agreeableness, and Conscientiousness. $C_{i,j}$ is the conversations between $Speaker_i$ and $Speaker_j$ ($1 \leq j \leq N$)." +natural-language-processing,scientific article summarization, +natural-language-processing,zero shot slot filling, +natural-language-processing,definition extraction, +natural-language-processing,abstract argumentation,Identifying argumentative statements from natural language dialogs. +natural-language-processing,unsupervised abstractive sentence compression,"Producing a shorter sentence by removing redundant information, preserving the grammaticality and the important content of the original sentence without supervision. 
(Source: nlpprogress.com)" +natural-language-processing,relation extraction,"**Relation Extraction** is the task of predicting attributes and relations for entities in a sentence. For example, given a sentence “Barack Obama was born in Honolulu, Hawaii.”, a relation classifier aims at predicting the relation of “bornInCity”. Relation Extraction is the key component for building relation knowledge graphs, and it is of crucial significance to natural language processing applications such as structured search, sentiment analysis, question answering, and summarization. + + +Source: [Deep Residual Learning for Weakly-Supervised Relation Extraction ](https://arxiv.org/abs/1707.08866)" +natural-language-processing,keyphrase extraction,"A classic task to extract salient phrases that best summarize a document, which essentially has two stages: candidate generation and keyphrase ranking." +natural-language-processing,dependency grammar induction,"Also known as ""unsupervised dependency parsing""" +natural-language-processing,scientific concept extraction,Identification of scientific concepts in research articles. +natural-language-processing,open intent detection,"Open intent detection aims to identify n-class known intents, and detect one-class open intent." +natural-language-processing,textual analogy parsing,"Textual Analogy Parsing (TAP) is the task of identifying analogy frames from text. + +( Image credit: [Textual Analogy Parsing: What’s Shared and +What’s Compared among Analogous Facts](https://arxiv.org/pdf/1809.02700v1.pdf) )" +natural-language-processing,emotion recognition in conversation,"Given the transcript of a conversation along with speaker information of each constituent utterance, the ERC task aims to identify the emotion of each utterance from several pre-defined emotions. Formally, given the input sequence of N number of utterances [(u1, p1), (u2, p2), . . . , (uN , pN )], where each utterance ui = [ui,1, ui,2, . . . 
, ui,T ] consists of T words ui,j and spoken by party pi, the task is to predict the emotion label ei of each utterance ui. +." +natural-language-processing,multiview contextual commonsense inference,"Depending on the situation, multiple different reasonings are possible each leading to various unique inferences. For a given context, identifying all valid inferences require commonsense reasoning and as such, this task is called Multiview Contextual Commonsense Inference." +natural-language-processing,hate speech normalization, +natural-language-processing,sstod, +natural-language-processing,discourse marker prediction, +natural-language-processing,job classification, +natural-language-processing,subjectivity analysis,A related task to sentiment analysis is the subjectivity analysis with the goal of labeling an opinion as either subjective or objective. +natural-language-processing,race h, +natural-language-processing,probing language models, +natural-language-processing,extractive summarization, +natural-language-processing,question answering,"Question Answering is the task of answering questions (typically reading comprehension questions), but abstaining when presented with a question that cannot be answered based on the provided context. + +Question answering can be segmented into domain-specific tasks like community question answering and knowledge-base question answering. Popular benchmark datasets for evaluation question answering systems include [SQuAD](/dataset/squad), [HotPotQA](/dataset/hotpotqa), [bAbI](/dataset/babi-1), [TriviaQA](/dataset/triviaqa), [WikiQA](/dataset/wikiqa), and many others. Models for question answering are typically evaluated on metrics like EM and F1. Some recent top performing models are T5 and XLNet. 
+ +( Image credit: [SQuAD](https://rajpurkar.github.io/mlx/qa-and-squad/) )" +natural-language-processing,abusive language, +natural-language-processing,automatic post editing,Automatic post-editing (APE) is used to correct errors in the translation made by the machine translation systems. +natural-language-processing,emotion cause extraction, +natural-language-processing,row annotation,"**Row Annotation** is the task of linking a row to a real-world entity and is a subtask of [Table Annotation](https://paperswithcode.com/task/table-annotation). It differs from [Cell Entity Annotation](https://paperswithcode.com/task/cell-entity-annotation) because CEA considers the linking of *cells* to entities while Row Annotation assumes that there is only an entity described in a row which usually is located in the main/entity column. + +A row is annotated using entities from different knowledge bases such as DBpedia or WikiData." +natural-language-processing,reading comprehension,"Most current question answering datasets frame the task as reading comprehension where the question is about a paragraph or document and the answer often is a span in the document. + +Some specific tasks of reading comprehension include multi-modal machine reading comprehension and textual machine reading comprehension, among others. In the literature, machine reading comprehension can be divide into four categories: **cloze style**, **multiple choice**, **span prediction**, and **free-form answer**. Read more about each category [here](https://paperswithcode.com/paper/a-survey-on-machine-reading-comprehension-1). + +Benchmark datasets used for testing a model's reading comprehension abilities include [MovieQA](/dataset/movieqa), [ReCoRD](dataset/record), and [RACE](/dataset/race), among others. + +The Machine Reading group at UCL also provides an [overview of reading comprehension tasks](https://uclnlp.github.io/ai4exams/data.html). 
+ +Figure source: [A Survey on Machine Reading Comprehension: Tasks, Evaluation Metrics and Benchmark Datasets](https://arxiv.org/pdf/2006.11880.pdf)" +natural-language-processing,sentiment analysis,"Sentiment analysis is the task of classifying the polarity of a given text. For instance, a text-based tweet can be categorized into either ""positive"", ""negative"", or ""neutral"". Given the text and accompanying labels, a model can be trained to predict the correct sentiment. + +Sentiment analysis techniques can be categorized into machine learning approaches, lexicon-based approaches, and even hybrid methods. Some subcategories of research in sentiment analysis include: multimodal sentiment analysis, aspect-based sentiment analysis, fine-grained opinion analysis, language specific sentiment analysis. + +More recently, deep learning techniques, such as RoBERTa and T5, are used to train high-performing sentiment classifiers that are evaluated using metrics like F1, recall, and precision. To evaluate sentiment analysis systems, benchmark datasets like SST, GLUE, and IMDB movie reviews are used. + +Further readings: + +- [Sentiment Analysis Based on Deep Learning: A Comparative Study](https://paperswithcode.com/paper/sentiment-analysis-based-on-deep-learning-a)" +natural-language-processing,zero shot relation triplet extraction,"Given an input sentence, the task is to extract triplets consisting of the head entity, relation label, and tail entity where the relation label is not seen at the training stage." +natural-language-processing,toxic spans detection,Given a sentence identify the toxic spans present in it. +natural-language-processing,self learning, +natural-language-processing,question selection, +natural-language-processing,cross lingual word embeddings, +natural-language-processing,semantic similarity,"The main objective **Semantic Similarity** is to measure the distance between the semantic meanings of a pair of words, phrases, sentences, or documents. 
For example, the word “car” is more similar to “bus” than it is to “cat”. The two main approaches to measuring Semantic Similarity are knowledge-based approaches and corpus-based, distributional methods. + + +Source: [Visual and Semantic Knowledge Transfer for Large Scale Semi-supervised Object Detection ](https://arxiv.org/abs/1801.03145)" +natural-language-processing,speculation detection,Identifying information in text that is speculative as opposed to factual information. +natural-language-processing,topic models,"A topic model is a type of statistical model for discovering the abstract ""topics"" that occur in a collection of documents. Topic modeling is a frequently used text-mining tool for the discovery of hidden semantic structures in a text body." +natural-language-processing,text generation,"Text generation is the task of generating text with the goal of appearing indistinguishable to human-written text. This task if more formally known as ""natural language generation"" in the literature. + +Text generation can be addressed with Markov processes or deep generative models like LSTMs. Recently, some of the most advanced methods for text generation include [BART](/method/bart), [GPT](/method/gpt) and other [GAN-based approaches](/method/gan). Text generation systems are evaluated either through human ratings or automatic evaluation metrics like METEOR, ROUGE, and BLEU. + +Further readings: + +- [The survey: Text generation models in deep learning](https://www.sciencedirect.com/science/article/pii/S1319157820303360) +- [Modern Methods for Text Generation](https://arxiv.org/abs/2009.04968) + +( Image credit: [Adversarial Ranking for Language Generation](https://arxiv.org/abs/1705.11001) )" +natural-language-processing,multimodal sentiment analysis,"Multimodal sentiment analysis is the task of performing sentiment analysis with multiple data sources - e.g. a camera feed of someone's face and their recorded speech. 
+ +( Image credit: [ICON: Interactive Conversational Memory Network +for Multimodal Emotion Detection](https://www.aclweb.org/anthology/D18-1280.pdf) )" +natural-language-processing,text classification,"Text classification is the task of assigning a sentence or document an appropriate category. The categories depend on the chosen dataset and can range from topics. + +Text classification problems include emotion classification, news classification, citation intent classification, among others. Benchmark datasets for evaluating text classification capabilities include GLUE, AGNews, among others. + +In recent years, deep learning techniques like XLNet and RoBERTa have attained some of the biggest performance jumps for text classification problems. + +( Image credit: [Text Classification Algorithms: A Survey](https://arxiv.org/pdf/1904.08067v4.pdf) )" +natural-language-processing,conversational response selection,Conversational response selection refers to the task of identifying the most relevant response to a given input sentence from a collection of sentences. +natural-language-processing,multi hop reading comprehension, +natural-language-processing,commonsense rl,Commonsense reasoning for Reinforcement Learning agents +natural-language-processing,automatic writing,Generating text based on internal machine representations. +natural-language-processing,timeline summarization,Identifying key dates of major events and providing short descriptions of what happened on these dates. (Source: https://www.aclweb.org/anthology/D19-5403/) +natural-language-processing,multi agent integration, +natural-language-processing,zero shot out of domain detection,Tasks for simultaneously learning few-shot In-Domain text classification and zero-shot Out-of-Domain detection on sentiment classification and intent classification. 
+natural-language-processing,semantic entity labeling,"- One of Form Understanding task (Word grouping, Semantic entity labeling, Entity linking) +- Classifying entities into one of four pre-defined categories: question, answer, header and, other. + +cited from + +G. Jaume, H. K. Ekenel, J. Thiran ""FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents,"" 2019" +natural-language-processing,phrase ranking,This task aims to evaluate the “global” rank list of phrases that a method finds from the input corpus. +natural-language-processing,knowledge graph embedding, +natural-language-processing,incongruity detection,Incongruity detection is the task of identifying statements in a text that are inconsistent with each other. +natural-language-processing,word similarity,Calculate a numerical score for the semantic similarity between two words. +natural-language-processing,poem meters classification, +natural-language-processing,table type detection,"Table Type Detection is the task of predicting the semantic type of a table and is a subtask of [Table Annotation](https://paperswithcode.com/task/table-annotation). The labels used for annotation in this task are types from vocabularies like DBpedia, Schema.org etc. like *Music* or *Hotel*. The semantic type of a table can indicate the content of a table. For example, if the semantic type of a table is *Music* it indicates that the table consists of music records/entities." +natural-language-processing,cross domain text classification,Learning an accurate model for the new unlabeled target domain given labeled data from multiple source domains where all domains have (possibly) different label sets. (Source: https://www.aclweb.org/anthology/P16-1155.pdf) +natural-language-processing,semantic role labeling,"Semantic role labeling aims to model the predicate-argument structure of a sentence +and is often described as answering ""Who did what to whom"". BIO notation is typically +used for semantic role labeling. 
+ +Example: + +| Housing | starts | are | expected | to | quicken | a | bit | from | August’s | pace | +| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | +| B-ARG1 | I-ARG1 | O | O | O | V | B-ARG2 | I-ARG2 | B-ARG3 | I-ARG3 | I-ARG3 |" +natural-language-processing,code documentation generation,"Code Documentation Generation is a supervised task where a code function is the input to the model, and the model generates the documentation for this function. + +Description from: [CodeTrans: Towards Cracking the Language of Silicone's Code Through Self-Supervised Deep Learning and High Performance Computing](https://arxiv.org/pdf/2104.02443.pdf)" +natural-language-processing,coherence evaluation,Evaluating the overall coherence of text as measured by its readability and flow through ideas. +natural-language-processing,natural language understanding,"**Natural Language Understanding** is an important field of Natural Language Processing which contains various tasks such as text classification, natural language inference and story comprehension. Applications enabled by natural language understanding range from question answering to automated reasoning. + + + +Source: [Find a Reasonable Ending for Stories: Does Logic Relation Help the Story Cloze Test? ](https://arxiv.org/abs/1812.05411)" +natural-language-processing,cross lingual entity linking,"Cross-lingual entity linking is the task of using data and models available for one language for which ample such resources are available (e.g., English) to solve entity linking tasks (i.e., assigning a unique identity to entities in a text) in another, commonly low-resource, language. + +Image Source: [Towards Zero-resource Cross-lingual Entity Linking](https://www.aclweb.org/anthology/D19-6127.pdf)" +natural-language-processing,negation and speculation scope resolution, +natural-language-processing,french text diacritization,Addition of diacritics for undiacritized French Wikipedia texts. 
+natural-language-processing,de identification,"De-identification is the task of detecting privacy-related entities in text, such as person names, emails and contact data." +natural-language-processing,cg,"The named entity recognition (NER) involves identification of key information in the text and classification into a set of predefined categories. This includes standard entities in the text like Part of Speech (PoS) and entities like places, names etc..." +natural-language-processing,spam detection, +natural-language-processing,extract aspect,"Aspect extraction is the task of identifying and extracting terms relevant for opinion mining and sentiment analysis, for example terms for product attributes or features." +natural-language-processing,automated writing evaluation,"Automated writing evaluation refers to the task of analysing and measuring written text based on features, such as syntax, text complexity and vocabulary range." +natural-language-processing,unsupervised semantic parsing, +natural-language-processing,distractor generation,"Given a passage, a question, and an answer phrase, the goal of distractor generation (DG) is to generate context-related wrong options (i.e., distractor) for multiple-choice questions (MCQ)." +natural-language-processing,polyphone disambiguation,A part of the TTS-front end framework which serves to predict the correct pronunciation for the input polyphone characters. +natural-language-processing,cross lingual,"Cross-lingual natural language processing is the task of using data and models available for one language for which ample such resources are available (e.g., English) to solve tasks in another, commonly more low-resource, language." +natural-language-processing,question rewriting, +natural-language-processing,fg 1 pg 1,"The model learns one entity in the first step (no continuous learning is required), and learns one new entity in each subsequent continuous learning step." 
+natural-language-processing,hate intensity prediction, +natural-language-processing,email thread summarization,Image credit: [EmailSum: Abstractive Email Thread Summarization](https://paperswithcode.com/paper/emailsum-abstractive-email-thread) +natural-language-processing,conditional text generation,The task of generating text according to some pre-specified conditioning (e.g. topic or sentiment or constraint) +natural-language-processing,dialogue interpretation,Interpreting the meaning of a dialog. +natural-language-processing,multi word expression sememe prediction,Predict sememes for unannotated multi-word expressions. +natural-language-processing,multimodal abstractive text summarization,Abstractive text summarization by utilizing information from multiple modalities. +natural-language-processing,argument pair extraction ape,Argument pair extraction (APE) aims to extract interactive argument pairs from two passages of a discussion. +natural-language-processing,dialogue rewriting, +natural-language-processing,stereotypical bias analysis, +natural-language-processing,claim extraction with stance classification,"Since claims stand at a clear position towards a given topic, the sentences with clear stances should have a higher possibility to be the claims. Hence, identifying the stances of the claims is supposed to benefit the claim extraction task. By combining the claim extraction and stance classification subtasks, we define this integrated task as: given a specific topic and relevant articles, extract the claims from the articles and also identify the stance of the claims towards the topic." +natural-language-processing,coreference resolution,"Coreference resolution is the task of clustering mentions in text that refer to the same underlying real world entities. + +Example: + +``` + +-----------+ + | | +I voted for Obama because he was most aligned with my values"", she said. 
+ | | | + +-------------------------------------------------+------------+ +``` + +""I"", ""my"", and ""she"" belong to the same cluster and ""Obama"" and ""he"" belong to the same cluster." +natural-language-processing,aspect sentiment opinion triplet extraction,"Aspect-Sentiment-Opinion Triplet Extraction (ASOTE) extracts aspect term, sentiment and opinion term triplets from sentences. In the triplet extracted by ASOTE the sentiment is the sentiment of the aspect term and opinion term pair." +natural-language-processing,cross lingual zero shot dependency parsing,"Cross-lingual zero-shot parsing is the task of inferring the dependency parse of sentences from one language without any labeled training trees for that language. + +Description from [NLP Progress](http://nlpprogress.com/english/dependency_parsing.html)" +natural-language-processing,negation scope resolution, +natural-language-processing,twitter sentiment analysis,Twitter sentiment analysis is the task of performing sentiment analysis on tweets from Twitter. +natural-language-processing,meme captioning,Automatic generation of natural language descriptions of the content of an input meme. +natural-language-processing,short text conversation,"Given a short text, finding an appropriate response (Source: http://staff.ustc.edu.cn/~cheneh/paper_pdf/2013/HaoWang.pdf)" +natural-language-processing,temporal relation classification,"Temporal Relation Classification is the task that is concerned with classifying the temporal relation between a pair of temporal entities (traditional events and temporal expressions). Initial approaches aimed to classify the temporal relation in thirteen relation types that were depicted by James Allen in his seminal work ""Maintaining Knowledge about Temporal Intervals"". However, due to the ambiguity in the annotation, recent corpora have been limiting the type of relations to a subset of those relations. 
+ +Notice that although Temporal Relation Classification can be thought of as a subtask of Temporal Relation Extraction, the two tasks can be morphed if one adds a label that indicates the absence of a temporal relation between the entities (e.g. ""no_relation"" or ""vague"") to Temporal Relation Classification." +natural-language-processing,memex question answering,"Question answering with real-world multi-modal personal collections, e.g., photo albums with visual, text, time and location information." +natural-language-processing,clickbait detection,"Clickbait detection is the task of identifying clickbait, a form of false advertisement, that uses hyperlink text or a thumbnail link that is designed to attract attention and to entice users to follow that link and read, view, or listen to the linked piece of online content, with a defining characteristic of being deceptive, typically sensationalized or misleading (Source: Adapted from Wikipedia)" +natural-language-processing,question similarity,"This is the problem of detecting duplicate questions in forums, which is an important step towards automating the process of answering new questions" +natural-language-processing,punctuation restoration,Punctuation Restoration +natural-language-processing,constituency grammar induction,Inducing a constituency-based phrase structure grammar. +natural-language-processing,embeddings evaluation, +natural-language-processing,extract aspect polarity tuple, +natural-language-processing,short text clustering, +natural-language-processing,siqa, +natural-language-processing,emotional dialogue acts,"Associating Emotions and Dialogue Acts to find unique relationships between them such as Accept/Agree dialogue acts often occur with the Joy emotion, Apology with Sadness, or Thanking with Joy. +First introduced in the paper EDA: Enriching Emotional Dialogue Acts using an Ensemble of Neural Annotators, LREC 2020 (https://aclanthology.org/2020.lrec-1.78/)." 
+natural-language-processing,romanian text diacritization,Addition of diacritics for undiacritized Romanian Wikipedia texts. +natural-language-processing,action parsing,"Action parsing is the task of, given a video or still image, assigning each frame or image a label describing the action in that frame or image." +natural-language-processing,role filler entity extraction,Role-filler entity extraction task on the MUC-4 dataset. +natural-language-processing,persian sentiment anlysis,Persian Sentiment analysis is the task of classifying the polarity of a given text. +natural-language-processing,czech text diacritization,Addition of diacritics for undiacritized Czech Wikipedia texts. +natural-language-processing,phrase vector embedding,"Just like the generation of word (1-gram) vector embedding, this task is for phrase (n-gram) vector embedding." +natural-language-processing,slovak text diacritization,Addition of diacritics for undiacritized Slovak Wikipedia texts. +natural-language-processing,data to text generation,"A classic problem in natural-language generation (NLG) involves taking structured data, such as a table, as input, and producing text that adequately and fluently describes this data as output. Unlike machine translation, which aims for complete transduction of the sentence to be translated, this form of NLG is usually taken to require addressing (at least) two separate challenges: what to say, the selection of an appropriate subset of the input data to discuss, and how to say it, the surface realization of a generation. + +( Image credit: [Data-to-Text Generation with Content Selection and Planning](https://arxiv.org/pdf/1809.00582v2.pdf) )" +natural-language-processing,joint ner and classification,Joint named entity recognition and classification refers to the combined task of identifying named entitites in a given text and text classification. 
+natural-language-processing,entity resolution,"**Entity resolution** (also known as entity matching, record linkage, or duplicate detection) is the task of finding records that refer to the same real-world entity across different data sources (e.g., data files, books, websites, and databases). (Source: Wikipedia) + +Surveys on entity resolution: + +- [Vassilis et al.: End-to-End Entity Resolution for Big Data: A Survey](https://arxiv.org/pdf/1905.06397.pdf), 2020. + +- [Barlaug and Gulla: Neural Networks for Entity Matching: A Survey](https://arxiv.org/pdf/2010.11075.pdf), 2021. + +The task of entity resolution is closely related to the task of [entity alignment](https://paperswithcode.com/task/entity-alignment) which focuses on matching entities between knowledge bases. The task of [entity linking](https://paperswithcode.com/task/entity-linking) differs from entity resolution as entity linking focuses on identifying entity mentions in free text." +natural-language-processing,abstract anaphora resolution,"Abstract Anaphora Resolution aims to resolve nominal expressions (e.g., this result, those two actions) and pronominal expressions (e.g. this, that, it) that refer to abstract-object-antecedents such as facts, events, plans, actions, or situations." +natural-language-processing,document ai, +natural-language-processing,end to end dialogue modelling, +natural-language-processing,natural language inference,"**Natural language inference (NLI)** is the task of determining whether a ""hypothesis"" is +true (entailment), false (contradiction), or undetermined (neutral) given a ""premise"". + +Example: + +| Premise | Label | Hypothesis | +| --- | ---| --- | +| A man inspects the uniform of a figure in some East Asian country. | contradiction | The man is sleeping. | +| An older and younger man smiling. | neutral | Two men are smiling and laughing at the cats playing on the floor. | +| A soccer game with multiple males playing. 
| entailment | Some men are playing a sport. | + +Approaches used for NLI include earlier symbolic and statistical approaches to more recent deep learning approaches. Benchmark datasets used for NLI include [SNLI](/dataset/snli), [MultiNLI](/dataset/multinli), [SciTail](/dataset/scitail), among others. You can get hands-on practice on the SNLI task by following this [d2l.ai chapter](https://d2l.ai/chapter_natural-language-processing-applications/natural-language-inference-and-dataset.html). + +Further readings: + +- [Recent Advances in Natural Language Inference: A Survey of Benchmarks, Resources, and Approaches](https://arxiv.org/abs/1904.01172)" +natural-language-processing,text categorization,"**Text Categorization** is the task of automatically assigning pre-defined categories to documents written in natural languages. Several types of Text Categorization have been studied, each of which deals with different types of documents and categories, such as topic categorization to detect discussed topics (e.g., sports, politics), spam detection, and sentiment classification to determine the sentiment typically in product or movie reviews. + + +Source: [Effective Use of Word Order for Text Categorization with Convolutional Neural Networks ](https://arxiv.org/abs/1412.1058)" +natural-language-processing,cross document coreference resolution, +natural-language-processing,simultaneous speech to text translation,"Simultaneous Speech-to-Text translation aims to translate concurrently with the source speech. It is crucial since it enables real-time interpretation of conversations, lectures and talks." +natural-language-processing,image captioning,"**Image Captioning** is the task of describing the content of an image in words. This task lies at the intersection of computer vision and natural language processing. 
Most image captioning systems use an encoder-decoder framework, where an input image is encoded into an intermediate representation of the information in the image, and then decoded into a descriptive text sequence. The most popular benchmarks are nocaps and COCO, and models are typically evaluated according to a BLEU or CIDER metric. + +( Image credit: [Reflective Decoding Network for Image Captioning, ICCV'19](https://openaccess.thecvf.com/content_ICCV_2019/papers/Ke_Reflective_Decoding_Network_for_Image_Captioning_ICCV_2019_paper.pdf) )" +natural-language-processing,multi label text classification,"According to Wikipedia ""In machine learning, multi-label classification and the strongly related problem of multi-output classification are variants of the classification problem where multiple labels may be assigned to each instance. Multi-label classification is a generalization of multiclass classification, which is the single-label problem of categorizing instances into precisely one of more than two classes; in the multi-label problem there is no constraint on how many of the classes the instance can be assigned to.""" +natural-language-processing,aspect category polarity, +natural-language-processing,lexical simplification,"The goal of **Lexical Simplification** is to replace complex words (typically words that are used less often in language and are therefore less familiar to readers) with their simpler synonyms, without infringing the grammaticality and changing the meaning of the text. + + +Source: [Adversarial Propagation and Zero-Shot Cross-Lingual Transfer of Word Vector Specialization ](https://arxiv.org/abs/1809.04163)" +natural-language-processing,event driven trading,Making stock trading decisions based on events. +natural-language-processing,citation intent classification,Identifying the reason why an author cited another author. 
+natural-language-processing,multilingual nlp, +natural-language-processing,hyper relational extraction,"HyperRED is a dataset for the new task of hyper-relational extraction, which extracts relation triplets together with qualifier information such as time, quantity or location. For example, the relation triplet (Leonard Parker, Educated At, Harvard University) can be factually enriched by including the qualifier (End Time, 1967)." +natural-language-processing,claim evidence pair extraction cepe,"Since evidence clearly supports the corresponding claims in an article, claims and evidence mutually reinforce each other in context. Therefore, we hypothesize that the claim extraction task and the evidence extraction task may benefit each other. By combining these two subtasks, we define the second integrated task as: given a specific topic and relevant articles, extract the claim-evidence pairs (CEPs) from the articles." +natural-language-processing,variable detection,Identifying whether a sentence contains a variable mention. +natural-language-processing,clinical assertion status detection,"Classifying the assertions made on given medical concepts as being present, absent, or possible in the patient, conditionally present in the patient under certain circumstances, hypothetically present in the patient at some future point, and mentioned in the patient report but associated with someone else. (e.g. 
clinical finding pertains to the patient by assigning a label such as present (”patient is diabetic”), absent (”patient denies nausea”), conditional (”dyspnea while climbing stairs”), or associated with someone else (”family history of depression”)) + +( [Source](https://arxiv.org/pdf/2012.04005v1.pdf) )" +natural-language-processing,association, +natural-language-processing,trajectory prediction,"**Trajectory Prediction** is the problem of predicting the short-term (1-3 seconds) and long-term (3-5 seconds) spatial coordinates of various road-agents such as cars, buses, pedestrians, rickshaws, and animals, etc. These road-agents have different dynamic behaviors that may correspond to aggressive or conservative driving styles. + + +Source: [Forecasting Trajectory and Behavior of Road-Agents Using Spectral Clustering in Graph-LSTMs ](https://arxiv.org/abs/1912.01118)" +natural-language-processing,aspect category detection,Aspect category detection (ACD) in sentiment analysis aims to identify the aspect categories mentioned in a sentence. +natural-language-processing,rules of thumb generation,Generate relevant rules-of-thumb given text input. +natural-language-processing,multilingual named entity recognition, +natural-language-processing,anaphora resolution,Resolving what expression a pronoun or a noun phrase refers to. +natural-language-processing,text style transfoer,"Text Style Transfer is the task of controlling certain attributes of generated text. The state-of-the-art methods can be categorized into two main types which are used on parallel and non-parallel data. Methods on parallel data are typically supervised methods that use a neural sequence-to-sequence model with the encoder-decoder architecture. Methods on non-parallel data are usually unsupervised approaches using Disentanglement, Prototype Editing and Pseudo-Parallel Corpus Construction. + +The popular benchmark for this task is the Yelp Review Dataset. 
Models are typically evaluated with the metrics of Sentiment Accuracy, BLEU, and PPL." +natural-language-processing,dialogue act classification,"Dialogue act classification is the task of classifying an utterance with respect to the function it serves in a dialogue, i.e. the act the speaker is performing. Dialogue acts are a type of speech acts (for Speech Act Theory, see [Austin (1975)](http://www.hup.harvard.edu/catalog.php?isbn=9780674411524) and [Searle (1969)](https://www.cambridge.org/core/books/speech-acts/D2D7B03E472C8A390ED60B86E08640E7))." +natural-language-processing,unsupervised extractive summarization, +natural-language-processing,figure of speech detection, +natural-language-processing,dialogue management,"( Image credit: [Bocklisch et al.](https://arxiv.org/pdf/1712.05181v2.pdf) )" +natural-language-processing,record linking,"The task of finding records in a data set that refer to the same entity across different data sources. + +Record linking is also called *entity resolution* or *entity matching*. Further material about this task is collected at [entity resolution](https://paperswithcode.com/task/entity-resolution)." +natural-language-processing,disambiguation q, +natural-language-processing,cross document language modeling,"Involves pretraining language models to support multi-document NLP tasks. + +Source: [Cross-Document Language Modeling](https://arxiv.org/pdf/2101.00406v1.pdf) + +Image Credit: [Cross-Document Language Modeling](https://arxiv.org/pdf/2101.00406v1.pdf)" +natural-language-processing,text based stock prediction,"Make stock predictions based on text (e.g., news articles, twitters, etc.)." +natural-language-processing,sketch to text generation,"Generate a full text based on a sketch (key information consisting of textual spans, phrases, or words), where the sketch may only make up a very small part of the full text." +natural-language-processing,variable disambiguation,Identifying which variable is mentioned in a text. 
+natural-language-processing,semantic parsing,"**Semantic Parsing** is the task of transducing natural language utterances into formal meaning representations. The target meaning representations can be defined according to a wide variety of formalisms. This include linguistically-motivated semantic representations that are designed to capture the meaning of any sentence such as λ-calculus or the abstract meaning representations. Alternatively, for more task-driven approaches to Semantic Parsing, it is common for meaning representations to represent executable programs such as SQL queries, robotic commands, smart phone instructions, and even general-purpose programming languages like Python and Java. + + +Source: [Tranx: A Transition-based Neural Abstract Syntax Parser for Semantic Parsing and Code Generation ](https://arxiv.org/abs/1810.02720)" +natural-language-processing,text simplification,"**Text Simplification** is the task of reducing the complexity of the vocabulary and sentence structure of text while retaining its original meaning, with the goal of improving readability and understanding. Simplification has a variety of important societal applications, for example increasing accessibility for those with cognitive disabilities such as aphasia, dyslexia, and autism, or for non-native speakers and children with reading difficulties. + + +Source: [Multilingual Unsupervised Sentence Simplification](https://arxiv.org/abs/2005.00352)" +natural-language-processing,phrase tagging,A fine-grained task that aims to find all occurrences of phrases in sentences. +natural-language-processing,columns property annotation,"**Column Property Annotation** (CPA) refers to the task of predicting the semantic relation between two columns and is a subtask of [Table Annotation](https://paperswithcode.com/task/table-annotation). The input of a CPA problem is most commonly a pair of columns, but can also be only one column. The labels used in CPA are properties from vocabularies. 
Some examples are *name*, *price*, *datePublished* etc. + +CPA is usually a multi-class classification problem and is also referred to as column relation annotation or relation extraction in different works." +natural-language-processing,turning point identification,"Identification of key events in a narrative (such as movie or TV episode). The task is supported by screenwriting theory, according to which there are 5 different types of key events in a movie. These key events (e.g., change of plans, major setback, climax) are crucial narrative moments: they define the plot structure and determine its progression and thematic units (e.g., setup, complications, aftermath)." +natural-language-processing,reliable intelligence identification, +natural-language-processing,paper generation,"Generating scientific paper texts, such as abstracts." +natural-language-processing,multi grained named entity recognition,"Multi-Grained Named Entity Recognition aims to detect and recognize entities on multiple granularities, without explicitly assuming non-overlapping or totally nested structures." +natural-language-processing,emotional intelligence, +natural-language-processing,knowledge base question answering,"Knowledge Base Q&A is the task of answering questions from a knowledge base. + +( Image credit: [Modeling Semantics with Gated Graph Neural Networks for Knowledge Base Question Answering](https://www.aclweb.org/anthology/C18-1280.pdf) )" +natural-language-processing,cross lingual question answering, +natural-language-processing,passage retrieval,**Passage retrieval** is a specialized type of IR application that retrieves relevant passages (or pieces of text) rather than an entire ranked set of documents. +natural-language-processing,memorization, +natural-language-processing,emotion classification,"Emotion classification, or emotion categorization, is the task of recognising emotions to classify them into the corresponding category. 
Given an input, classify it as 'neutral or no emotion' or as one, or more, of several given emotions that best represent the mental state of the subject's facial expression, words, and so on. Some example benchmarks include ROCStories, Many Faces of Anger (MFA), and GoEmotions. Models can be evaluated using metrics such as the Concordance Correlation Coefficient (CCC) and the Mean Squared Error (MSE)." +natural-language-processing,unsupervised text classification, +natural-language-processing,winogrande, +natural-language-processing,aspect oriented opinion extraction,Extracting the paired opinion terms for every given aspect term in a sentence. +natural-language-processing,hyperbaton, +natural-language-processing,cross domain named entity recognition, +natural-language-processing,text compression, +natural-language-processing,toxic comment classification, +natural-language-processing,grammatical error detection,"Grammatical Error Detection (GED) is the task of detecting different kinds of errors in text such as spelling, punctuation, grammatical, and word choice errors. GED is one of the key components of grammatical error correction (GEC)." +natural-language-processing,knowledge base population,Knowledge base population is the task of filling the incomplete elements of a given knowledge base by automatically processing a large corpus of text. +natural-language-processing,explanation generation, +natural-language-processing,temporal information extraction,"Temporal information extraction is the identification of chunks/tokens corresponding to temporal intervals, and the extraction and determination of the temporal relations between those. The entities extracted may be temporal expressions (timexes), eventualities (events), or auxiliary signals that support the interpretation of an entity or relation. 
Relations may be temporal links (tlinks), describing the order of events and times, or subordinate links (slinks) describing modality and other subordinative activity, or aspectual links (alinks) around the various influences aspectuality has on event structure. + +The markup scheme used for temporal information extraction is well-described in the ISO-TimeML standard, and also on [www.timeml.org](http://www.timeml.org). + +``` + + + + + + + PRI20001020.2000.0127 + NEWS STORY + 10/20/2000 20:02:07.85 + + + The Navy has changed its account of the attack on the USS Cole in Yemen. + Officials now say the ship was hit nearly two hours after it had docked. + Initially the Navy said the explosion occurred while several boats were helping + the ship to tie up. The change raises new questions about how the attackers + were able to get past the Navy security. + + + 10/20/2000 20:02:28.05 + + + + + + +``` + +To avoid leaking knowledge about temporal structure, train, dev and test splits must be made at document level for temporal information extraction." +natural-language-processing,low resource neural machine translation,Low-resource machine translation is the task of machine translation on a low-resource language where large data may not be available. +natural-language-processing,stance detection us election 2020 biden, +natural-language-processing,aspect sentiment triplet extraction,"Aspect Sentiment Triplet Extraction (ASTE) +is the task of extracting the triplets of target +entities, their associated sentiment, and opinion spans explaining the reason for the sentiment." +natural-language-processing,chinese named entity recognition,"Chinese named entity recognition is a subtask of information extraction that seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc. 
from Chinese text (Source: Adapted from Wikipedia)." +natural-language-processing,bilingual lexicon induction,Translate words from one language to another. +natural-language-processing,hate span identification, +natural-language-processing,discourse segmentation, +natural-language-processing,race m, +natural-language-processing,author attribution,Authorship attribution is the task of determining the author of a text. +natural-language-processing,personalized and emotional conversation,"Personalized and Emotional Conversation (**PEC**) is defined as follows: Given the personalized information ($P_{R1}$ and $P_{R2}$) of two speakers, their conversation context $C$, the emotion $E_K$ and DA $D_K$ of the response to be generated, and the personalized information $P_{K}$ of the responder, the goal is to generate an anthropomorphic response $Y$. +\begin{equation} +Y = argmax_{Y'}P(Y'|C, E_K, D_K, P_K) \label{task_definition} +\end{equation} + +Particularly, context $C=\{(U_1,E_1,D_1,P_1),\cdots,(U_{K-1},E_{K-1},D_{K-1},P_{K-1})\}$ contains multi-turn conversation content (i.e., utterance $U_i$), emotion $E_i$ of the associated utterance, DA $D_i$ of the associated utterance, and personalized information $P_i$ of the associated speaker." +natural-language-processing,sentence pair modeling,Comparing two sentences and their relationship based on their internal representation. +natural-language-processing,attribute value extraction, +natural-language-processing,latvian text diacritization,Addition of diacritics for undiacritized Latvian Wikipedia texts. +natural-language-processing,abuse detection,"Abuse detection is the task of identifying abusive behaviors, such as hate speech, offensive language, sexism and racism, in utterances from social media platforms (Source: https://arxiv.org/abs/1802.00385)." 
+natural-language-processing,hierarchical text classification of blurbs,"Shared Task on Hierarchical Classification of Blurbs (GermEval 2019 / KONVENS) + +https://www.inf.uni-hamburg.de/en/inst/ab/lt/resources/data/germeval-2019-hmc.html" +natural-language-processing,job prediction, +natural-language-processing,generative question answering, +natural-language-processing,negation detection,Negation detection is the task of identifying negation cues in text. +natural-language-processing,csc, +natural-language-processing,croatian text diacritization,Addition of diacritics for undiacritized Croatian Wikipedia texts. +natural-language-processing,math information retrieval,Information retrieval over mathematical content. +natural-language-processing,emotion recognition in context, +natural-language-processing,semantic retrieval, +natural-language-processing,key point matching,"Given a debatable topic, a set of key points per stance, and a set of crowd arguments supporting or contesting the topic, report for each argument its match score for each of the key points under the same stance towards the topic." +natural-language-processing,interactive evaluation of dialog,Tasks that involve building or adapting conversation models to work effectively in an interactive setting. +natural-language-processing,semeval 2022 task 4 1 binary pcl detection, +natural-language-processing,entity typing on dh kgs, +natural-language-processing,english proverbs, +natural-language-processing,vietnamese datasets, +natural-language-processing,information extraction,Information extraction is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents and other electronically represented sources (Source: Wikipedia). +natural-language-processing,open domain question answering,Open-domain question answering is the task of question answering on open-domain datasets such as Wikipedia. 
+natural-language-processing,term extraction,"Term Extraction, or Automated Term Extraction (ATE), is the task of extracting domain-specific terms from natural language text. +For example, the sentence “We meta-analyzed mortality using random-effect models” contains the domain-specific single-word terms ""meta-analyzed"" and ""mortality"" and the multi-word term ""random-effect models""." +natural-language-processing,web page tagging,Assigning appropriate tags to a web page. +natural-language-processing,complex word identification,Identifying difficult words or expressions in a text. +natural-language-processing,cross lingual ner, +natural-language-processing,unsupervised machine translation,"Unsupervised machine translation is the task of doing machine translation without any translation resources at training time. + +( Image credit: [Phrase-Based & Neural Unsupervised Machine Translation](https://arxiv.org/pdf/1804.07755v2.pdf) )" +natural-language-processing,taxonomy expansion,Expand a seed taxonomy with new unseen nodes. +natural-language-processing,semi supervised text classification 1, +natural-language-processing,query based extractive summarization,Extracting summarized information that answers a given query based on a reference text. +natural-language-processing,multimodal deep learning, +natural-language-processing,transition based dependency parsing, +natural-language-processing,hope speech detection for english,Detecting hope speech in English text. +natural-language-processing,conversational question answering, +natural-language-processing,stance detection us election 2020 trump, +natural-language-processing,negation and speculation cue detection, +natural-language-processing,multi document summarization,"**Multi-Document Summarization** is a process of representing a set of documents with a short piece of text by capturing the relevant information and filtering out the redundant information. 
Two prominent approaches to Multi-Document Summarization are extractive and abstractive summarization. Extractive summarization systems aim to extract salient snippets, sentences or passages from documents, while abstractive summarization systems aim to concisely paraphrase the content of the documents. + + +Source: [Multi-Document Summarization using Distributed Bag-of-Words Model ](https://arxiv.org/abs/1710.02745)" +natural-language-processing,counterspeech detection,"Counter-speech detection is the task of detecting counter-speech, i.e., a crowd-sourced response that argues, disagrees, or presents an opposing view to extremism or hateful content on social media platforms (Source: Adapted from: https://icsr.info/wp-content/uploads/2018/03/ICSR-Report-Challenging-Hate-Counter-speech-Practices-in-Europe.pdf)" +natural-language-processing,sentence ordering,Sentence ordering task deals with finding the correct order of sentences given a randomly ordered paragraph. +natural-language-processing,extreme summarization,Image credit: [TLDR: Extreme Summarization of Scientific Documents](https://arxiv.org/pdf/2004.15011v3.pdf) +natural-language-processing,cross language text summarization,"Using data and models available for one language for which ample such resources are available (e.g., English) to solve summarization tasks in another, commonly more low-resource, language." +natural-language-processing,event extraction,"Determine the extent of the events in a text. + +Other names: Event Tagging; Event Identification" +natural-language-processing,relational reasoning,"The goal of **Relational Reasoning** is to figure out the relationships among different entities, such as image pixels, words or sentences, human skeletons or interactive moving agents. 
+ + +Source: [Social-WaGDAT: Interaction-aware Trajectory Prediction via Wasserstein Graph Double-Attention Network ](https://arxiv.org/abs/2002.06241)" +natural-language-processing,question quality assessment,This task aims to build subjective question-answering algorithms that check whether a question is of high quality or needs to be edited/flagged. +natural-language-processing,overlapping mention recognition,Overlapping mention recognition is the task of correctly identifying all mentions of an entity in the presence of overlapping entity mentions. +natural-language-processing,relation mention extraction,Extracting phrases representative of a specific relation. +natural-language-processing,rumour detection,"Rumor detection is the task of identifying rumors, i.e. statements whose veracity is not quickly or ever confirmed, in utterances on social media platforms." +natural-language-processing,traditional spam detection, +natural-language-processing,clinical information retreival, +natural-language-processing,dialogue state tracking,"Dialogue state tracking consists of determining at each turn of a dialogue the +full representation of what the user wants at that point in the dialogue, +which contains a goal constraint, a set of requested slots, and the user's dialogue act." +natural-language-processing,story completion,"Given a story prefix and two possible endings, determining which one is the correct (coherent) ending of the story." +natural-language-processing,sentence classification, +natural-language-processing,joint multilingual sentence representations, +natural-language-processing,drs parsing,"Discourse Representation Structures (DRS) are formal meaning representations introduced by Discourse Representation Theory. DRS parsing is a complex task, comprising other NLP tasks, such as semantic role labeling, word sense disambiguation, co-reference resolution and named entity tagging. 
Also, DRSs show explicit scope for certain operators, which allows for a more principled and linguistically motivated treatment of negation, modals and quantification, as has been advocated in formal semantics. Moreover, DRSs can be translated to formal logic, which allows for automatic forms of inference by third parties. + +Description from [NLP Progress](http://nlpprogress.com/english/semantic_parsing.html)" +natural-language-processing,zero shot machine translation,Translate text or speech from one language to another without supervision. +natural-language-processing,occupation prediction, +natural-language-processing,text annotation, +natural-language-processing,extracting covid 19 events from twitter, +natural-language-processing,spelling correction,Spelling correction is the task of detecting and correcting spelling mistakes. +natural-language-processing,event causality identification, +natural-language-processing,response generation,A task where an agent should play the $DE$ role and generate a text to respond to a $P$ message. +natural-language-processing,dialogue generation,"Dialogue generation is the task of ""understanding"" natural language inputs in order to produce a natural language output. The systems are usually intended for conversing with humans, for instance in back-and-forth dialogue with a conversational agent like a chatbot. Some example benchmarks for this task (see others such as Natural Language Understanding) include FusedChat and the Ubuntu Dialogue Corpus (UDC). Models can be evaluated with metrics such as BLEU, ROUGE, and METEOR, although these correlate weakly with human judgement; newer metrics such as UnSupervised and Reference-free (USR) and Metric for automatic Unreferenced dialog evaluation (MaUde) aim to address this." +natural-language-processing,fact based text editing,"Fact-based Text Editing aims to revise a given document to better describe the facts in a knowledge base (e.g., several triples)." 
+natural-language-processing,dependency parsing,"Dependency parsing is the task of extracting a dependency parse of a sentence that represents its grammatical +structure and defines the relationships between ""head"" words and words, which modify those heads. + +Example: + +``` + root + | + | +-------dobj---------+ + | | | +nsubj | | +------det-----+ | +-----nmod------+ ++--+ | | | | | | | +| | | | | +-nmod-+| | | +-case-+ | ++ | + | + + || + | + | | +I prefer the morning flight through Denver +``` + +Relations among the words are illustrated above the sentence with directed, labeled +arcs from heads to dependents (+ indicates the dependent)." +natural-language-processing,part of speech tagging,"Part-of-speech tagging (POS tagging) is the task of tagging a word in a text with its part of speech. +A part of speech is a category of words with similar grammatical properties. Common English +parts of speech are noun, verb, adjective, adverb, pronoun, preposition, conjunction, etc. + +Example: + +| Vinken | , | 61 | years | old | +| --- | ---| --- | --- | --- | +| NNP | , | CD | NNS | JJ |" +natural-language-processing,vietnamese parsing, +natural-language-processing,multi word expression embedding,Learn embeddings for multi-word expressions +natural-language-processing,specificity, +natural-language-processing,sarcasm detection,"The goal of **Sarcasm Detection** is to determine whether a sentence is sarcastic or non-sarcastic. Sarcasm is a type of phenomenon with specific perlocutionary effects on the hearer, such as to break their pattern of expectation. Consequently, correct understanding of sarcasm often requires a deep understanding of multiple sources of information, including the utterance, the conversational context, and, frequently some real world facts. 
+ + +Source: [Attentional Multi-Reading Sarcasm Detection ](https://arxiv.org/abs/1809.03051)" +natural-language-processing,implicit relations, +natural-language-processing,question to declarative sentence,"Question Answer to Declarative Sentence (QA2D) is the task of generating declarative statements from question, answer pairs. + +See: +Demszky, D., Guu, K., & Liang, P. (2018). Transforming Question Answering Datasets Into Natural Language Inference Datasets. arXiv preprint. arXiv:1809.02922" +natural-language-processing,joint entity and relation extraction on, +natural-language-processing,nonsense words grammar, +natural-language-processing,dialog relation extraction,Dialog Relation Extraction is the task of predicting the relation type between entities mentioned in dialogue. It uses multiple tokens to capture possible relations between pairs of entities in the dialogue. The popular benchmark for this task is the DialogRE dataset. The models are typically evaluated with the metric of F1 Score for both standard-setting and conversational settings. +natural-language-processing,semantic role labeling predicted predicates,PropBank semantic role labeling with predicted predicates. +natural-language-processing,conversational response generation,"Given an input conversation, generate a natural-looking text reply to the last conversation element. + +Image credit: [DIALOGPT : Large-Scale Generative Pre-training for Conversational Response Generation](https://www.aclweb.org/anthology/2020.acl-demos.30.pdf)" +natural-language-processing,goal oriented dialogue systems,Achieving a pre-defined goal through a dialog. +natural-language-processing,low resource named entity recognition,"Low resource named entity recognition is the task of using data and models available for one language for which ample such resources are available (e.g., English) to solve named entity recognition tasks in another, commonly more low-resource, language." 
+natural-language-processing,native language identification,Native Language Identification (NLI) is the task of determining an author's native language (L1) based only on their writings in a second language (L2). +natural-language-processing,query wellformedness,"Assessing whether a query is grammatically correct, contains no spelling mistakes, and asks an explicit question. + +Image Source: [Identifying Well-formed Natural Language Questions](https://arxiv.org/pdf/1808.09419.pdf)" +natural-language-processing,morphological disambiguation, +natural-language-processing,information retrieval,"Information retrieval is the task of ranking a list of documents or search results in response to a query. + +( Image credit: [sudhanshumittal](https://github.com/sudhanshumittal/Information-retrieval-system) )" +natural-language-processing,nested mention recognition,Nested mention recognition is the task of correctly modeling the nested structure of mentions. +natural-language-processing,news classification, +natural-language-processing,visual dialogue,"Visual Dialog requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a follow-up question about the image, the task is to answer the question." +natural-language-processing,deep clustering, +natural-language-processing,empathetic response generation,Generate empathetic responses in dialogues. +natural-language-processing,few shot ner,"Few-Shot Named Entity Recognition (NER) is the task of recognising a 'named entity' like a person, organization, time and so on in a piece of text, given only a few labelled examples per entity type, e.g. ""Alan Mathison [person] visited the Turing Institute [organization] in June [time].""" +natural-language-processing,automated essay scoring,"**Automated Essay Scoring** is the task of assigning a score to an essay, usually in the context of assessing the language ability of a language learner. 
The quality of an essay is affected by the following four primary dimensions: topic relevance, organization and coherence, word usage and sentence complexity, and grammar and mechanics. + + +Source: [A Joint Model for Multimodal Document Quality Assessment ](https://arxiv.org/abs/1901.01010)" +natural-language-processing,discourse parsing, +natural-language-processing,arabic sentiment analysis,"Arabic sentiment analysis is the process of computationally identifying and categorizing opinions expressed in a piece of Arabic text, especially in order to determine whether the writer's attitude towards a particular topic, product, etc. is positive, negative, or neutral (Source: Oxford Languages)." +natural-language-processing,suggestion mining,"Suggestion mining can be defined as the extraction of suggestions from unstructured text." +natural-language-processing,nested named entity recognition,"Nested named entity recognition is a subtask of information extraction that seeks to locate and classify nested named entities (i.e., hierarchically structured entities) mentioned in unstructured text (Source: Adapted from Wikipedia)." +natural-language-processing,meeting summarization,Generating a summary from meeting transcriptions. +natural-language-processing,entity extraction, +natural-language-processing,question answer categorization, +natural-language-processing,document level event extraction, +natural-language-processing,aspect extraction,"Aspect extraction is the task of identifying and extracting terms relevant for opinion mining and sentiment analysis, for example terms for product attributes or features." +natural-language-processing,chemical indexing,Predict which chemicals should be indexed. +natural-language-processing,goal oriented dialog,Achieving a pre-defined goal through a dialog.
+natural-language-processing,zero shot text to image generation,Image credit: [GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models](https://paperswithcode.com/paper/glide-towards-photorealistic-image-generation) +natural-language-processing,visual storytelling,"( Image credit: [No Metrics Are Perfect](https://github.com/eric-xw/AREL) )" +natural-language-processing,spoken language understanding, +natural-language-processing,unsupervised dependency parsing,"Unsupervised dependency parsing is the task of inferring the dependency parse of sentences without any labeled training data. + +Description from [NLP Progress](http://nlpprogress.com/english/dependency_parsing.html)" +natural-language-processing,contextualised word representations, +natural-language-processing,thai word tokenization,Thai word segmentation +natural-language-processing,syntax representation, +natural-language-processing,text attribute transfer,"The goal of the **Text Attribute Transfer** task is to change an input text such that the value of a particular linguistic attribute of interest (e.g. language = English, sentiment = Positive) is transferred to a different desired value (e.g. language = French, sentiment = Negative). This task needs approaches that can disentangle the content from other linguistic attributes of the text. + + +Source: [Improved Neural Text Attribute Transfer with Non-parallel Data ](https://arxiv.org/abs/1711.09395)" +natural-language-processing,phrase grounding,"Given an image and a corresponding caption, the **Phrase Grounding** task aims to ground each entity mentioned by a noun phrase in the caption to a region in the image. + + +Source: [Phrase Grounding by Soft-Label Chain Conditional Random Field ](https://arxiv.org/abs/1909.00301)" +natural-language-processing,entity linking,"Assigning a unique identity to entities (such as famous individuals, locations, or companies) mentioned in text (Source: Wikipedia)." 
+natural-language-processing,dialogue,Dialogue is notoriously hard to evaluate. Past approaches have used human evaluation. +natural-language-processing,method name prediction, +natural-language-processing,target oriented opinion words extraction,The objective of TOWE is to extract the corresponding opinion words describing or evaluating the target from the review. +natural-language-processing,deep attention, +natural-language-processing,table to text generation,"**Table-to-Text Generation** is to generate a description from the structured table. + + +Source: [Key Fact as Pivot: A Two-Stage Model for Low Resource Table-to-Text Generation ](https://arxiv.org/abs/1908.03067)" +natural-language-processing,dialog act classification, +natural-language-processing,legal document translation,Legal document translation is the task of translating legal documents between languages. +natural-language-processing,linguistic acceptability,"Linguistic Acceptability is the task of determining whether a sentence is grammatical or ungrammatical. + +Image Source: [Warstadt et al](https://arxiv.org/pdf/1901.03438v4.pdf)" +natural-language-processing,dialog learning, +natural-language-processing,cross lingual bitext mining,Cross-lingual bitext mining is the task of mining sentence pairs that are translations of each other from large text corpora. +natural-language-processing,intent classification,"**Intent Classification** is the task of correctly labeling a natural language utterance from a predetermined set of intents + + +Source: [Multi-Layer Ensembling Techniques for Multilingual Intent Classification ](https://arxiv.org/abs/1806.07914)" +natural-language-processing,sentence embeddings for biomedical texts, +natural-language-processing,text augmentation,"You can read these blog posts to get an overview of the approaches. 
+ +- [**A Visual Survey of Data Augmentation in NLP**](https://amitness.com/2020/05/data-augmentation-for-nlp/)" +natural-language-processing,open information extraction,"In natural language processing, open information extraction is the task of generating a structured, machine-readable representation of the information in text, usually in the form of triples or n-ary propositions (Source: Wikipedia)." +natural-language-processing,domain labelling, +natural-language-processing,mathematical reasoning, +natural-language-processing,ad hoc information retrieval,Ad-hoc information retrieval refers to the task of returning information resources related to a user query formulated in natural language. +natural-language-processing,arqmath2,Answer Retrieval for Questions about Math v2 (2021) +natural-language-processing,natural language landmark navigation,Generate natural language navigation instructions that revolve around visual landmarks instead of turn-by-turn directives. +natural-language-processing,data mining,"Data mining is a process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems (Source: Wikipedia)." +natural-language-processing,dark humor detection, +natural-language-processing,table based fact verification,Verifying facts given semi-structured data. +natural-language-processing,kb to language generation,"Given information from a knowledge base, generate a description of this information in natural language." +natural-language-processing,instruction following, +natural-language-processing,timex normalization,"Temporal expression normalisation is the grounding of a lexicalisation of a time to a calendar date or other formal temporal representation. + +Example: + +10/18/2000 21:01:00.65 +Dozens of Palestinians were wounded in scattered clashes in the West Bank and Gaza Strip, Wednesday, despite the Sharm el-Sheikh truce accord. 
+ +Chuck Rich reports on entertainment every Saturday + +Description from [NLP Progress](http://nlpprogress.com/english/temporal_processing.html)" +natural-language-processing,irish text diacritization,Addition of diacritics for undiacritized Irish Wikipedia texts. +natural-language-processing,relational captioning, +natural-language-processing,relation explanation, +natural-language-processing,cross lingual semantic textual similarity, +natural-language-processing,multimodal gif dialog, +natural-language-processing,drugprot, +natural-language-processing,hungarian text diacritization,Addition of diacritics for undiacritized Hungarian Wikipedia texts. +natural-language-processing,propaganda detection, +natural-language-processing,passage ranking, +natural-language-processing,fact verification,"Fact verification, also called ""fact checking"", is a process of verifying facts in natural text against a database of facts." +natural-language-processing,political salient issue orientation detection, +natural-language-processing,clinical language translation,Translating clinical texts to layperson-understandable language. +natural-language-processing,text summarization,"Shortening a set of data computationally, to create a summary that represents the most important or relevant information within the original content (Source: Wikipedia). + +Image source: [LONG DOCUMENT SUMMARIZATION WITH TOP-DOWN AND BOTTOM-UP INFERENCE](https://arxiv.org/pdf/2203.07586v1.pdf)" +natural-language-processing,medical question pair similarity computation,Predicting whether two questions on medical topics have the same meaning. +natural-language-processing,paraphrase identification,"The goal of **Paraphrase Identification** is to determine whether a pair of sentences have the same meaning.
+ + +Source: [Adversarial Examples with Difficult Common Words for Paraphrase Identification ](https://arxiv.org/abs/1909.02560) + +Image source: [On Paraphrase Identification Corpora ](http://www.lrec-conf.org/proceedings/lrec2014/pdf/1000_Paper.pdf)" +natural-language-processing,conversational search, +natural-language-processing,abstractive text summarization,"**Abstractive Text Summarization** is the task of generating a short and concise summary that captures the salient ideas of the source text. The generated summaries potentially contain new phrases and sentences that may not appear in the source text. + + +Source: [Generative Adversarial Network for Abstractive Text Summarization ](https://arxiv.org/abs/1711.09357) + +Image credit: [Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond](https://arxiv.org/pdf/1602.06023v5.pdf)" +playing-games,suduko, +playing-games,pass classification, +playing-games,game of shogi, +playing-games,deep sea treasure image version,"Image state version of the multi-objective reinforcement learning toy environment originally introduced in ""Empirical evaluation methods for multiobjective reinforcement learning algorithms"" by P. Vamplew et al." +playing-games,multi agent path finding, +playing-games,score,Mean in-game score over 1000 episodes with random seeds not seen during training. See https://arxiv.org/abs/2006.13760 (Section 2.4 Evaluation Protocol) for details. +playing-games,carracing v0,https://gym.openai.com/envs/CarRacing-v0/ +playing-games,continuous control, +playing-games,openai gym,"An open-source toolkit from OpenAI that implements several Reinforcement Learning benchmarks including: classic control, Atari, Robotics and MuJoCo tasks. 
+ +(Description by [Evolutionary learning of interpretable decision trees](https://paperswithcode.com/paper/evolutionary-learning-of-interpretable)) + +(Image Credit: [OpenAI Gym](https://gym.openai.com/))" +playing-games,nethack,Mean in-game score over 1000 episodes with random seeds not seen during training. See https://arxiv.org/abs/2006.13760 (Section 2.4 Evaluation Protocol) for details. +playing-games,smac plus,Multi-agent reinforcement learning for completion of multi-stage tasks +playing-games,text based games,Text-based games for evaluating reinforcement learning agents +playing-games,starcraft,"Starcraft I is an RTS game; the task is to train an agent to play the game. + +( Image credit: [Macro Action Selection with Deep Reinforcement Learning in StarCraft](https://arxiv.org/pdf/1812.00336v3.pdf) )" +playing-games,game of cricket, +playing-games,game of hanabi, +playing-games,video games, +playing-games,montezumas revenge,"Montezuma's Revenge is an Atari 2600 benchmark game that is known to be difficult for reinforcement learning algorithms. Solutions typically employ algorithms that incentivise environment exploration in different ways. + +For the state-of-the-art tables, please consult the parent Atari Games task. + +( Image credit: [Q-map](https://github.com/fabiopardo/qmap) )" +playing-games,snes games,"The task is to train an agent to play SNES games such as Super Mario. + +( Image credit: [Large-Scale Study of Curiosity-Driven Learning](https://github.com/openai/large-scale-curiosity) )" +playing-games,league of legends, +playing-games,solitaire,A family of single-player games using one or more standard decks of playing cards. +playing-games,game of go,"Go is an abstract strategy board game for two players, in which the aim is to surround more territory than the opponent. The task is to train an agent to play the game and be superior to other players."
+playing-games,image outpainting,"Predicting the visual context of an image beyond its boundary. + +Image credit: [NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis](https://paperswithcode.com/paper/nuwa-infinity-autoregressive-over?from=n35)" +playing-games,smac 1,Benchmarks for Efficient Exploration of Completion of Multi-stage Tasks and Usage of Environmental Factors +playing-games,face reconstruction,"3D face reconstruction is the task of reconstructing a face from an image into a 3D form (or mesh). + +( Image credit: [3DDFA_V2](https://github.com/cleardusk/3DDFA_V2) )" +playing-games,control with prametrised actions,"Most reinforcement learning research papers focus on environments where the agent’s actions are either discrete or continuous. However, when training an agent to play a video game, it is common to encounter situations where actions have both discrete and continuous components. For example, a set of high-level discrete actions (e.g., move, jump, fire), each of them being associated with continuous parameters (e.g., target coordinates for the move action, direction for the jump action, aiming angle for the fire action). These kinds of tasks are included in Control with Parameterised Actions." +playing-games,card games,Card games involve playing cards: the task is to train an agent to play the game with specified rules and beat other players. +playing-games,football action valuation, +playing-games,atari games,"The Atari 2600 Games task (and dataset) involves training an agent to achieve high game scores. + +( Image credit: [Playing Atari with Deep Reinforcement Learning](https://arxiv.org/pdf/1312.5602v1.pdf) )" +playing-games,acrobot,"The acrobot system includes two joints and two links, where the joint between the two links is actuated. Initially, the links are hanging downwards, and the goal is to swing the end of the lower link up to a given height."
+playing-games,dota 2,"Dota 2 is a multiplayer online battle arena (MOBA). The task is to train one-or-more agents to play and win the game. + +( Image credit: [OpenAI Five](https://openai.com/five/) )" +playing-games,game of doom,"Doom is an FPS game: the task is typically to train an agent to navigate the game environment and, additionally, acquire points by eliminating enemies. + +( Image credit: [Playing FPS Games with Deep Reinforcement Learning](https://arxiv.org/pdf/1609.05521v2.pdf) )" +playing-games,fps games,"Tasks involving first-person shooter (FPS) games such as Call of Duty. + +( Image credit: [Procedural Urban Environments for FPS Games](https://arxiv.org/pdf/1604.05791v1.pdf) )" +playing-games,game of chess,"Chess is a two-player strategy board game played on a chessboard, a checkered gameboard with 64 squares arranged in an 8×8 grid. The idea of making a machine that could beat a Grandmaster human player was a fascination in the artificial intelligence community for decades. Famously, IBM's Deep Blue beat Kasparov in the 1990s. More recently, more human-like approaches such as AlphaZero have appeared." +playing-games,procgen hard 100m, +playing-games,board games, +playing-games,starcraft ii,"Starcraft II is an RTS game; the task is to train an agent to play the game. + +( Image credit: [The StarCraft Multi-Agent Challenge](https://arxiv.org/pdf/1902.04043v2.pdf) )" +playing-games,2048, +playing-games,game of football, +playing-games,real time strategy games,"Real-Time Strategy (RTS) tasks involve training an agent to play video games with continuous gameplay and high-level macro-strategic goals such as map control, economic superiority and more.
+ +( Image credit: [Multi-platform Version of StarCraft: Brood War in a Docker Container](https://github.com/Games-and-Simulations/sc-docker) )" +playing-games,injury prediction, +playing-games,game of poker, +playing-games,dqn replay dataset, +playing-games,klondike,The most commonly played game in the family of Solitaire card games. +playing-games,smac,"The StarCraft Multi-Agent Challenge (SMAC) is a benchmark that provides elements of partial observability, challenging dynamics, and high-dimensional observation spaces. SMAC is built using the StarCraft II game engine, creating a testbed for research in cooperative MARL where each game unit is an independent RL agent." +playing-games,offline rl, +reasoning,human judgment classification,A task where an algorithm judges which sample is better in accordance with human judgment. +reasoning,winowhy, +reasoning,college mathematics, +reasoning,penguins in a table, +reasoning,logical sequence, +reasoning,high school mathematics, +reasoning,mathematical induction,"Tests the language model's capability to understand induction by asking the model to verify the correctness of an induction argument. + +Source: [BIG-bench](https://github.com/google/BIG-bench/tree/main/bigbench/benchmark_tasks/mathematical_induction)" +reasoning,systematic generalization, +reasoning,elementary mathematics, +reasoning,anachronisms, +reasoning,identify odd metapor, +reasoning,arithmetic reasoning, +reasoning,crash blossom, +reasoning,physical intuition, +reasoning,strategyqa,"StrategyQA aims to measure the ability of models to answer questions that require multi-step implicit reasoning. 
+ +Source: [BIG-bench](https://github.com/google/BIG-bench/tree/main/bigbench/benchmark_tasks/strategyqa)" +reasoning,analytic entailment, +reasoning,crass ai, +reasoning,decision making,"**Decision Making** is a complex task that involves analyzing data (of different level of abstraction) from disparate sources and with different levels of certainty, merging the information by weighing in on some data source more than other, and arriving at a conclusion by exploring all possible alternatives. + + +Source: [Complex Events Recognition under Uncertainty in a Sensor Network ](https://arxiv.org/abs/1411.0085)" +reasoning,checkmate in one, +reasoning,logical fallacy detection, +reasoning,conformal prediction, +reasoning,model based reinforcement learning, +reasoning,metaphor boolean, +reasoning,analogical similarity, +reasoning,abstract argumentation,Identifying argumentative statements from natural language dialogs. +reasoning,natural language visual grounding, +reasoning,causal identification, +reasoning,commonsense rl,Commonsense reasoning for Reinforcement Learning agents +reasoning,common sense reasoning,"Common sense reasoning tasks are intended to require the model to go beyond pattern +recognition. Instead, the model should use ""common sense"" or world knowledge +to make inferences." +reasoning,entailed polarity, +reasoning,presuppositions as nli, +reasoning,pre election ratings estimation, +reasoning,visual reasoning,Ability to understand actions and reasoning associated with any visual images +reasoning,novel concepts,"Measures the ability of models to uncover an underlying concept that unites several ostensibly disparate entities, which hopefully would not co-occur frequently. This provides a limited test of a model's ability to creatively construct the necessary abstraction to make sense of a situation that it cannot have memorized in training. 
+ +Source: [BIG-bench](https://github.com/google/BIG-bench/tree/main/bigbench/benchmark_tasks/novel_concepts)" +reasoning,visual entailment,"Visual Entailment (VE) - is a task consisting of image-sentence pairs +whereby a premise is defined by an image, rather than a +natural language sentence as in traditional Textual Entailment tasks. The goal is to predict +whether the image semantically entails the text." +reasoning,decision making under uncertainty, +reasoning,abstract algebra, +reasoning,visual commonsense reasoning,Image source: [Visual Commonsense Reasoning](https://paperswithcode.com/dataset/vcr) +reasoning,temporal sequences,"This task asks models to answer questions about which times certain events could have occurred. + +Source: [BIG-bench](https://github.com/google/BIG-bench/tree/main/bigbench/benchmark_tasks/temporal_sequences) + +Image source: [BIG-bench](https://github.com/google/BIG-bench/tree/main/bigbench/benchmark_tasks/temporal_sequences)" +reasoning,causal judgment, +reasoning,program repair,Task of teaching ML models to modify an existing program to fix a bug in a given code. +reasoning,math word problem solving, +reasoning,logical args, +reasoning,reasoning about colored objects, +reasoning,professional accounting, +reasoning,date understanding, +reasoning,odd one out,"This task tests to what extent a language model is able to identify the odd word. + +Source: [BIG-bench](https://github.com/google/BIG-bench/tree/main/bigbench/benchmark_tasks/odd_one_out)" +reasoning,evaluating information essentiality, +reasoning,navigate, +reasoning,logical reasoning, +reasoning,physical commonsense reasoning, +reasoning,code line descriptions, +reasoning,formal logic, +reasoning,human judgment correlation,A task where an algorithm should generate the judgment scores correlating with human judgments. 
+reasoning,automated theorem proving,"The goal of **Automated Theorem Proving** is to automatically generate a proof, given a conjecture (the target theorem) and a knowledge base of known facts, all expressed in a formal language. Automated Theorem Proving is useful in a wide range of applications, including the verification and synthesis of software and hardware systems. + + +Source: [Learning to Prove Theorems by Learning to Generate Theorems ](https://arxiv.org/abs/2002.07019)" +robots,industrial robots,"An industrial robot is a robot system used for manufacturing. Industrial robots are automated, programmable and capable of movement on three or more axes." +robots,radar object detection,The radar object detection (ROD) task aims to classify and localize the objects in 3D purely from radar's radio frequency (RF) images. +robots,radar odometry,"Radar odometry estimation is the task of estimating the trajectory of the radar sensor, e.g. as presented in https://arxiv.org/abs/2105.01457. +A well established performance metric was presented by Geiger (2012) - ""Are we ready for autonomous driving? the KITTI vision benchmark suite""" +robots,gesture generation,"Generation of gestures, as a sequence of 3d poses" +robots,semantic segmentation, +robots,robotic grasping,This task is composed of using Deep Learning to identify how best to grasp objects using robotic arms in different scenarios. This is a very complex task as it might involve dynamic environments and objects unknown to the network. +robots,drone controller, +robots,sequential place recognition,State-of-the-art algorithms for route-based place recognition under changing conditions. +robots,omniverse isaac gym,"The Omniverse Isaac Gym extension provides an interface for performing reinforcement learning training and inferencing in Isaac Sim. This framework simplifies the process of connecting reinforcement learning libraries and algorithms with other components in Isaac Sim. 
Similar to existing frameworks and environment wrapper classes that inherit from gym.Env, the Omniverse Isaac Gym extension also provides an interface inheriting from gym.Env and implements a simple set of APIs required by most common RL libraries. This interface can be used as a bridge connecting RL libraries with physics simulation and tasks running in the Isaac Sim framework." +robots,optimal motion planning, +robots,visual navigation,"**Visual Navigation** is the problem of navigating an agent, e.g. a mobile robot, in an environment using camera input only. The agent is given a target image (an image it will see from the target position), and its goal is to move from its current position to the target by applying a sequence of actions, based on the camera observations only. + + +Source: [Vision-based Navigation Using Deep Reinforcement Learning ](https://arxiv.org/abs/1908.03627)" +robots,touch detection, +robots,mental stress detection, +robots,marine robot navigation, +robots,image outpainting,"Predicting the visual context of an image beyond its boundary. + +Image credit: [NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis](https://paperswithcode.com/paper/nuwa-infinity-autoregressive-over?from=n35)" +robots,trajectory planning,"Trajectory planning for industrial robots consists of moving the tool center point from point A to point B while avoiding body collisions over time. +Trajectory planning is sometimes referred to as motion planning and erroneously as path planning. Trajectory planning is distinct from path planning in that it is parametrized by time. Essentially trajectory planning encompasses path planning in addition to planning how to move based on velocity, time, and kinematics." +robots,face reconstruction,"3D face reconstruction is the task of reconstructing a face from an image into a 3D form (or mesh). 
+ +( Image credit: [3DDFA_V2](https://github.com/cleardusk/3DDFA_V2) )" +robots,grasp rectangle generation,Grasp rectangles are a popular way to represent a two-finger grasp. Grasp rectangle generation is a task to (automatically) label datasets for parallel-jaw grasp learning. +robots,carla map leaderboard,https://leaderboard.carla.org/leaderboard/ +robots,developmental learning, +robots,monocular visual odometry, +robots,robot task planning, +robots,skill generalization,Image credit: [A Generalist Agent](https://storage.googleapis.com/deepmind-media/A%20Generalist%20Agent/Generalist%20Agent.pdf) +robots,sequential place learning,State-of-the-art algorithms for route-based place recognition under changing conditions. +robots,robot navigation,"The fundamental objective of mobile **Robot Navigation** is to arrive at a goal position without collision. The mobile robot is supposed to be aware of obstacles and move freely in different working scenarios. + + +Source: [Learning to Navigate from Simulation via Spatial and Semantic Information Synthesis with Noise Model Embedding ](https://arxiv.org/abs/1910.05758)" +robots,humanoid control,"Control of a high-dimensional humanoid. This can include skill learning by tracking motion capture clips, learning goal-directed tasks like going towards a moving target, and generating motion within a physics simulator." +robots,motion planning,"( Image credit: [Motion Planning Among Dynamic, Decision-Making Agents with Deep Reinforcement Learning](https://arxiv.org/pdf/1805.01956v1.pdf) )" +robots,skill mastery, +robots,vision based navigation with language based,"A grounded vision-language task where an agent with visual perception is guided via language to find objects in photorealistic indoor environments.
The task emulates a real-world scenario in that (a) the requester may not know how to navigate to the target objects and thus makes requests by only specifying high-level endgoals, and (b) the agent is capable of sensing when it is lost and querying an advisor, who is more qualified at the task, to obtain language subgoals to make progress." +robots,pointgoal navigation, +robots,robot manipulation, +robots,safe exploration,"**Safe Exploration** is an approach to collect ground truth data by safely interacting with the environment. + + +Source: [Chance-Constrained Trajectory Optimization for Safe Exploration and Learning of Nonlinear Systems ](https://arxiv.org/abs/2005.04374)" +robots,deformable object manipulation, +robots,joint radar communication,Intelligently decide how to simultaneously conduct radar and communication over a shared radio channel. +robots,vision and language navigation, +robots,d4rl, +robots,isaac gym preview,Isaac Gym offers a high performance learning platform to train policies for wide variety of robotics tasks directly on GPU. Both physics simulation and the neural network policy training reside on GPU and communicate by directly passing data from physics buffers to PyTorch tensors without ever going through any CPU bottlenecks. This leads to blazing fast training times for complex robotics tasks on a single GPU with 2-3 orders of magnitude improvements compared to conventional RL training that uses a CPU based simulator and GPU for neural networks. +robots,visual odometry,"**Visual Odometry** is an important area of information fusion in which the central aim is to estimate the pose of a robot using data collected by visual sensors. + + +Source: [Bi-objective Optimization for Robust RGB-D Visual Odometry ](https://arxiv.org/abs/1411.7445)" +speech,audio visual speech recognition,Audio-visual speech recognition is the task of transcribing a paired audio and visual stream into text. 
+speech,noisy speech recognition, +speech,speech synthesis marathi, +speech,speaker separation, +speech,emotional speech synthesis, +speech,expressive speech synthesis, +speech,multimodal emotion recognition,"This is a leaderboard for multimodal emotion recognition on the IEMOCAP dataset. The modality abbreviations are +A: Acoustic +T: Text +V: Visual + +Please include the modality in the bracket after the model name. + +All models must use standard five emotion categories and are evaluated in standard leave-one-session-out (LOSO). See the papers for references." +speech,robust speech recognition, +speech,image to image translation,"Image-to-image translation is the task of taking images from one domain and transforming them so they have the style (or characteristics) of images from another domain. + +( Image credit: [Unpaired Image-to-Image Translation +using Cycle-Consistent Adversarial Networks](https://arxiv.org/pdf/1703.10593v6.pdf) )" +speech,acoustic unit discovery, +speech,multi speaker source separation, +speech,manner of articulation detection, +speech,speech synthesis malayalam, +speech,word level pronunciation scoring,Total score of a word pronunciation. +speech,accented speech recognition, +speech,speaking style synthesis, +speech,speech synthesis assamese, +speech,spoken dialogue systems, +speech,text to speech synthesis,Converting written text in natural language to speech. +speech,distant speech recognition, +speech,voice conversion,"**Voice Conversion** is a technology that modifies the speech of a source speaker and makes their speech sound like that of another target speaker without changing the linguistic information. 
+ + +Source: [Joint training framework for text-to-speech and voice conversion using multi-source Tacotron and WaveNet ](https://arxiv.org/abs/1903.12389)" +speech,speech synthesis rajasthani, +speech,speech synthesis bengali, +speech,acoustic echo cancellation, +speech,text independent speaker recognition, +speech,speech recognition,"Speech recognition is the task of recognising speech within audio and converting it into text. + +( Image credit: [SpecAugment](https://arxiv.org/pdf/1904.08779v2.pdf) )" +speech,speech dereverberation,Removing reverberation from audio signals +speech,speech synthesis hindi, +speech,speech to gesture translation, +speech,speech synthesis gujarati, +speech,text independent speaker verification, +speech,speaker profiling,Estimation of Physical parameters from Speech data +speech,pronunciation assessment, +speech,speech enhancement,"Speech enhancement is the task of taking a noisy speech input and producing an enhanced speech output. + +( Image credit: [A Fully Convolutional Neural Network For Speech Enhancement](https://arxiv.org/pdf/1609.07132v1.pdf) )" +speech,sequence to sequence speech recognition, +speech,speech emotion recognition,"Categorical speech emotion recognition. +Emotion categories: Happy (+ excitement), Sad, Neutral, Angry +Modality: Speech Only + +For multimodal emotion recognition, please upload your result to [Multimodal Emotion Recognition on IEMOCAP](https://paperswithcode.com/sota/multimodal-emotion-recognition-on-iemocap)" +speech,speech synthesis manipuri, +speech,speech editing, +speech,acoustic modelling, +speech,speaker recognition,"**Speaker Recognition** is the process of identifying or confirming the identity of a person given his speech segments.
+ + +Source: [Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition ](https://arxiv.org/abs/1906.07317)" +speech,speaker identification, +speech,talking face generation,"Talking face generation aims to synthesize a sequence of face images that correspond to given speech semantics. + + +( Image credit: [Talking Face Generation by Adversarially Disentangled Audio-Visual Representation](https://github.com/Hangz-nju-cuhk/Talking-Face-Generation-DAVS) )" +speech,speech separation,"The task of extracting all overlapping speech sources in a given mixed speech signal refers to the **Speech Separation**. Speech Separation is a special scenario of source separation problem, where the focus is only on the overlapping speech signal sources and other interferences such as music or noise signals are not the main concern of the study. + + +Source: [A Unified Framework for Speech Separation ](https://arxiv.org/abs/1912.07814) + +Image credit: [Speech Separation of A Target Speaker Based on Deep Neural Networks](http://staff.ustc.edu.cn/~jundu/Publications/publications/ICSP2014_Du.pdf)" +speech,english conversational speech recognition, +speech,speech synthesis kannada, +speech,speech synthesis,"Speech synthesis is the task of generating speech from some other modality like text, lip movements etc. + +Please note that the leaderboards here are not really comparable between studies - as they use mean opinion score as a metric and collect different samples from Amazon Mechanical Turk. + +( Image credit: [WaveNet: A generative model for raw audio](https://deepmind.com/blog/article/wavenet-generative-model-raw-audio) )" +speech,small footprint keyword spotting, +speech,speech synthesis odia, +speech,speaker diarization,"**Speaker Diarization** is the task of segmenting and co-indexing audio recordings by speaker.
The way the task is commonly defined, the goal is not to identify known speakers, but to co-index segments that are attributed to the same speaker; in other words, diarization implies finding speaker boundaries and grouping segments that belong to the same speaker, and, as a by-product, determining the number of distinct speakers. In combination with speech recognition, diarization enables speaker-attributed speech-to-text transcription. + + +Source: [Improving Diarization Robustness using Diversification, Randomization and the DOVER Algorithm ](https://arxiv.org/abs/1910.11691)" +speech,spoken command recognition, +speech,bandwidth extension,Bandwidth extension is the task of expanding the bandwidth of a signal in a way that approximates the original or desired higher bandwidth signal. +speech,speech synthesis telugu, +speech,speech synthesis tamil, +speech,visual speech recognition, +speech,text dependent speaker verification, +speech,phone level pronunciation scoring, +speech,utterance level pronounciation scoring,Total pronunciation score of an utterance. +speech,dialogue generation,"Dialogue generation is the task of ""understanding"" natural language inputs - within natural language processing in order to produce output. The systems are usually intended for conversing with humans, for instance back and forth dialogue with a conversation agent like a chatbot. Some example benchmarks for this task (see others such as Natural Language Understanding) include FusedChat and Ubuntu Dialogue Corpus (UDC). Models can be evaluated via metrics such as BLEU, ROUGE, and METEOR, although these correlate only weakly with human judgement, a limitation that newer metrics like UnSupervised and Reference-free (USR) and Metric for automatic Unreferenced dialog evaluation (MaUde) may address." +speech,speaker verification,"Speaker verification is the task of verifying the identity of a person from characteristics of the voice.
+ +( Image credit: [Contrastive-Predictive-Coding-PyTorch +](https://github.com/jefflai108/Contrastive-Predictive-Coding-PyTorch) )" +speech,acoustic question answering, +speech,speech synthesis bodo, +speech,spoken language understanding, +speech,keyword spotting,"In speech processing, keyword spotting deals with the identification of keywords in utterances. + +( Image credit: [Simon Grest](https://github.com/simongrest/kaggle-freesound-audio-tagging-2019) )" +speech,speech to speech translation,"Speech-to-speech translation (S2ST) consists of translating speech from one language to speech in another language. This can be done with a cascade of automatic speech recognition (ASR), text-to-text machine translation (MT), and text-to-speech (TTS) synthesis +sub-systems, which is text-centric. Recently, work on S2ST that does not rely on an intermediate text representation has been emerging." +speech,speech denoising,Obtain the clean speech of the target speaker by suppressing the background noise. +speech,unsupervised speech recognition, +speech,spoken language identification,Identify the language being spoken from an audio input only. +speech,speech extraction, +speech,voice cloning, +speech,voice query recognition, +speech,automatic speech recognition, +time-series,solar irradiance forecasting, +time-series,episode classification,Episode classification is a branch of classification that aims to classify groups of observations of a time series. (Example: critical episodes/normal episodes) +time-series,predictive process monitoring,A branch of predictive analysis that attempts to predict some future state of a business process. +time-series,classification on time series with missing, +time-series,exponential degradation,Exponential degradation is used to solve problems where systems are exposed to an exponential loss in performance such as reparable industrial systems.
+time-series,sleep spindles detection, +time-series,change point detection,Change point detection is concerned with the accurate detection of abrupt and significant changes in the behavior of a time series. +time-series,tropical cyclone intensity forecasting, +time-series,time series clustering,"**Time Series Clustering** is an unsupervised data mining technique for organizing data points into groups based on their similarity. The objective is to maximize data similarity within clusters and minimize it across clusters. Time-series clustering is often used as a subroutine of other more complex algorithms and is employed as a standard tool in data science for anomaly detection, character recognition, pattern discovery, visualization of time series. + + +Source: [Comprehensive Process Drift Detection with Visual Analytics ](https://arxiv.org/abs/1907.06386)" +time-series,phonocardiogram classification,Classify labels/murmur/clinical outcome based on Phonocardiograms (PCGs) +time-series,traffic prediction,"Traffic prediction is the task of predicting traffic volumes, utilising historical speed and volume data. + +( Image credit: [BaiduTraffic](https://github.com/JingqingZ/BaiduTraffic) )" +time-series,short observation new product sales, +time-series,ecg based sleep staging,Sleep Staging from only ECG signal +time-series,human behavior forecasting, +time-series,time series alignment, +time-series,tropical cyclone track forecasting, +time-series,time series denoising, +time-series,social media popularity prediction,"Social Media Popularity Prediction (SMPP) aims to predict the future popularity (e.g., clicks, views, likes, etc.) of online posts automatically via plenty of social media data from public platforms. It is a crucial problem for social media learning and forecasting and one of the most challenging problems in the field. 
With the ever-changing user interests and public attention on social media platforms, how to predict popularity accurately becomes more challenging than before. This task is valuable to content providers, marketers, or consumers in a range of real-world applications, including multimedia advertising, recommendation system, or trend analysis." +time-series,human motion prediction,"Action prediction is a pre-fact video understanding task, which focuses on future states, in other words, it needs to reason about future states or infer action labels before the end of action execution." +time-series,imputation,Substituting missing data with values according to some criteria. +time-series,semi supervised time series classification, +time-series,time series streams, +time-series,stock prediction, +time-series,human activity recognition,Classify various human activities +time-series,sequential skip prediction, +time-series,edge computing,Deep Learning on EDGE devices +time-series,time series classification,"**Time Series Classification** is a general task that can be useful across many subject-matter domains and applications. The overall goal is to identify a time series as coming from one of possibly many sources or predefined groups, using labeled training data. That is, in this setting we conduct supervised learning, where the different time series sources are considered known. + + +Source: [Nonlinear Time Series Classification Using Bispectrum-based Deep Convolutional Neural Networks ](https://arxiv.org/abs/2003.02353)" +time-series,lip password classification,A classification task that predicts whether the designated user is uttering the designated password. +time-series,eeg based sleep staging,Sleep staging from only EEG signal +time-series,time series prediction,"The goal of **Time Series Prediction** is to infer the future values of a time series from the past. 
+ + +Source: [Orthogonal Echo State Networks and stochastic evaluations of likelihoods ](https://arxiv.org/abs/1601.05911)" +time-series,unsupervised spatial clustering, +time-series,semanticity prediction,"T3: Semanticity Prediction: Estimating the semanticity perceived by the Listener from physiological signals (EEG, GSR, PPG). Label: 0-(semantic), 1-(non-semantic). Binary classification problem." +time-series,irregular time series,Irregular Time Series +time-series,w r n sleep staging,"3-class Sleep Staging into +- Wake +- REM +- NREM" +time-series,algorithmic trading,An algorithmic trading system is software that is used for trading in the stock market. +time-series,energy management,"Energy management is the task of scheduling energy units inside a system, enabling reliable, safe and cost-effective operation" +time-series,stock price prediction, +time-series,predict clinical outcome,"A cost-based metric that considers the costs of algorithmic prescreening, expert screening, treatment, and diagnostic errors that result in late or missed treatments. This metric is further described here: https://moody-challenge.physionet.org/2022/" +time-series,time to event prediction, +time-series,lwr classification,"T4: LWR Classification: Predicting if the subject is Listening, Writing, or Resting from physiological signals (EEG, GSR, PPG). Labels: 0-listening, 1-writing, 2-resting. A classification task." +time-series,time series,"Time series deals with sequential data where the data is indexed (ordered) by a time dimension. + +( Image credit: [Autoregressive CNNs for Asynchronous Time Series](https://arxiv.org/pdf/1703.04122v4.pdf) )" +time-series,univariate time series forecasting, +time-series,time series analysis, +time-series,stock market prediction, +time-series,social cue forecasting, +time-series,multivariate time series forecasting, +time-series,spatio temporal forecasting, +time-series,eeg decoding,**EEG Decoding** - extracting useful information directly from EEG data.
+time-series,new product sales forecasting,"Sales forecasting of new product, which the market hasn’t seen +before." +time-series,attention score prediction,"Auditory Attention Score Prediction: Estimating the attention level of Listener from physiological signals (EEG, GSR, PPG), a regression task. The attention score ranges from 0 to 100." +time-series,solar flare prediction,Solar flare prediction in heliophysics +time-series,trajectory prediction,"**Trajectory Prediction** is the problem of predicting the short-term (1-3 seconds) and long-term (3-5 seconds) spatial coordinates of various road-agents such as cars, buses, pedestrians, rickshaws, and animals, etc. These road-agents have different dynamic behaviors that may correspond to aggressive or conservative driving styles. + + +Source: [Forecasting Trajectory and Behavior of Road-Agents Using Spectral Clustering in Graph-LSTMs ](https://arxiv.org/abs/1912.01118)" +time-series,noise level prediction,"T2: Noise Level Prediction: Estimating the noise level experienced by the Listener from physiological signals (EEG, GSR, PPG). Six different levels of background noise (SNR) +Label: -6, -3, 0, 3, 6, and inf (noise-free) in dB." +time-series,covid 19 tracking, +time-series,fire detection,Detection of fire using multi-variate time series sensor data. +time-series,moving point cloud processing, +time-series,dynamic time warping, +time-series,multivariate time series imputation, +time-series,math word problem solving, +time-series,classify murmurs,Classify murmurs based on Phonocardiograms (PCGs) +time-series,covid 19 modelling, +time-series,trajectory modeling,The equivalent of language modeling but for trajectories. 
+time-series,time series anomaly detection, +time-series,traffic data imputation, +time-series,portfolio optimization, +time-series,data compression, +time-series,time series forecasting,"**Time series forecasting** is the task of fitting a model to historical, time-stamped data in order to predict future values. Traditional approaches include moving average, exponential smoothing, and ARIMA, though models as various as RNNs, Transformers, or XGBoost can also be applied. The most popular benchmark is the ETTh1 dataset. Models are typically evaluated using the Mean Square Error (MSE) or Root Mean Square Error (RMSE). + +( Image credit: [ThaiBinh Nguyen](https://github.com/tn16jv/Stock-Price-Prediction) )" +time-series,image inpainting,"**Image Inpainting** is a task of reconstructing missing regions in an image. It is an important problem in computer vision and an essential functionality in many imaging and graphics applications, e.g. object removal, image restoration, manipulation, re-targeting, compositing, and image-based rendering. + +Source: [High-Resolution Image Inpainting with Iterative Confidence Feedback and Guided Upsampling ](https://arxiv.org/abs/2005.11742) + +Image source: [High-Resolution Image Inpainting with Iterative Confidence Feedback and Guided Upsampling](https://arxiv.org/pdf/2005.11742.pdf)" +time-series,multimodal association,Associate identities across different modalities. +time-series,time series averaging, +time-series,clustering multivariate time series, +time-series,time series regression,Predicting one or more scalars for an entire time series example. +time-series,eeg,"Electroencephalogram (EEG) is a method of recording brain activity using electrophysiological indexes. When the brain is active, a large number of postsynaptic potentials generated synchronously by neurons are formed after summation. 
It records the changes of electric waves during brain activity and is the overall reflection of the electrophysiological activities of brain nerve cells on the surface of cerebral cortex or scalp. Brain waves originate from the postsynaptic potential of the apical dendrites of pyramidal cells. The formation of synchronous rhythm of EEG is also related to the activity of nonspecific projection system of cortex and thalamus. EEG is the basic theoretical research of brain science. EEG monitoring is widely used in its clinical application." +time-series,probabilistic time series forecasting, +time-series,stock trend prediction, +time-series,w r l d sleep staging,"4-class Sleep Staging into +- Wake +- REM +- LIGHT SLEEP +- DEEP SLEEP" +time-series,video quality assessment, +time-series,intelligent communication,"Intelligently decide (i) the content of data +to be shared/communicated and (ii) the direction in which the chosen +data is transmitted." +time-series,sequential bayesian inference,"Also known as Bayesian filtering or [recursive Bayesian estimation](https://en.wikipedia.org/wiki/Recursive_Bayesian_estimation), this task aims to perform inference on latent state-space models." +time-series,non intrusive load monitoring, +time-series,seismic source localization,Locating a seismic source using seismometer recordings +time-series,activity prediction,Predict human activities in videos +time-series,remaining useful lifetime estimation,Estimating the number of machine operation cycles until breakdown from the time series of previous cycles. +time-series,earth surface forecasting,Conditional forecasting of future multi-spectral imagery.