Simple low-code baseline with sentence transformer and catboost
#6 by tomgxt
Just a quick test of how far we can get in Graphext using pretrained embeddings as input to a simple classifier. It reaches about 38% accuracy in roughly 10 minutes.
concatenate(ds.movie_name, ds.synopsis, {"separator": ". "}) -> (ds.text)
embed_text_with_model(ds.text, {
  "collection": "SBERT",
  "name": "all-mpnet-base-v2"
}) -> (ds.embedding)
train_classification(ds[["embedding", "genre"]], {
  "target": "genre",
  "model": "CatboostClassifier",
  "encode_features": false,
  "params": {
    "iterations": 750,
    "rsm": 0.1
  },
  "validate": {
    "n_splits": 1,
    "test_size": 0.2
  }
}) -> (ds.predicted, ds.probs, "genre-model")
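For anyone who wants to try the same idea outside Graphext, here is a rough Python sketch of the three steps above. It is not what Graphext runs internally. It assumes the `sentence-transformers`, `catboost` and `scikit-learn` packages are installed, and the function names (`build_text`, `embed_texts`, `train_and_score`) and the fixed random seed are made up for illustration. Only the model name, `iterations`, `rsm` and the 0.2 test size come from the recipe.

```python
from typing import Sequence


def build_text(movie_name: str, synopsis: str, separator: str = ". ") -> str:
    """Join title and synopsis, mirroring the concatenate step."""
    return f"{movie_name}{separator}{synopsis}"


def embed_texts(texts: Sequence[str], model_name: str = "all-mpnet-base-v2"):
    """Encode texts with a pretrained SBERT model (weights download on first use)."""
    # Heavy dependency, imported locally so the text helper works without it.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return model.encode(list(texts), show_progress_bar=False)


def train_and_score(embeddings, genres: Sequence[str],
                    test_size: float = 0.2, seed: int = 0):
    """Fit CatBoost on the embeddings and return (model, hold-out accuracy)."""
    from catboost import CatBoostClassifier
    from sklearn.model_selection import train_test_split

    # Single 80/20 split, analogous to n_splits=1 and test_size=0.2.
    # The seed is an assumption; the recipe does not specify one.
    X_train, X_test, y_train, y_test = train_test_split(
        embeddings, list(genres), test_size=test_size, random_state=seed
    )
    model = CatBoostClassifier(iterations=750, rsm=0.1, verbose=False)
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)
```

With a pandas DataFrame `df` holding `movie_name`, `synopsis` and `genre` columns, you would build the texts with `build_text` row by row, pass them to `embed_texts`, and then call `train_and_score(embeddings, df["genre"])`. The accuracy will not necessarily match the figure quoted above, since the split and preprocessing may differ from Graphext's.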