---
license: apache-2.0
inference:
  parameters:
    do_sample: False
    max_new_tokens: 250
widget:
- text: "summarize: Me [19 F] with my friend [19 M], not sure if I may have messed things up already.\nHello hello everybody. I hope this isn't too trivial of a question to ask on here, but I've been feeling a bit out of my depth when it comes to this situation (I've had only one relationship before, and for many reasons, it was out of the ordinary).\n\nOkay! So, a couple of weeks ago, I started talking to this guy on Facebook, through a student group that we were both part of. I thought he was sort of cute, so I sent him a PM just to talk, etc, etc. We're both transfer students at the same school, so I knew that we could eventually meet in person once we both moved on-campus. So, we did, and we hung out maybe twice, just as friends.\n\nOkay. So, everything is going pretty well. We talk over Facebook and Snapchat, whatever. So, Saturday night, I was just hanging out with people and kind of being bored, when I got a Snapchat from him asking what I was doing. I asked if he wanted to hang out, so we did. \n\nWe ended up smoking pot (the first time for me, ever), and sort of just wandering around. Eventually we ended up back at his dorm room, where high me decided to just go for it, and I came on to him pretty strongly. It worked out for me (luckily, otherwise things would have been really super awkward), and we ended up messing around but not having sex.\n\nYesterday, however, I ended up going to hang out with him again, and this time we did sleep together. Afterward, we kind of discussed what we were going to do, and he just said that he wanted to \"play it by ear\" and not slap any labels on anything. I'm wondering if this means that he wants a fwb-type situation, or if he might actually be interested in me. The way I've been acting is extremely out of character for me, and I am not interested in having a fuck buddy. I like him, and I would be very interested in maybe seeing where things go, but I'm worried that I may have ruined my chances of a relationship by sleeping with him already."
  example_title: "Example 1"
- text: "summarize: My 11 year old sons friend died suddenly, his funeral is today and my son suddenly doesn't want to attend.\n**repost from relationships**\n\nA couple of weeks ago my sons friend died in a freak accident, it was completely shocking and horrific. He isn't aware of the details, but we broke the news to him as soon as we found out and have spoke about it many times with him.\n\nHe has cried about it, asked questions and spoken about it with his older siblings (who have also recently lost a friend) and seemed to be okay with it (considering the circumstances).\n\nLeading up to the funeral, we have talked about it and explained what he is to expect, etc. This is his first funeral, so we have made sure that he is aware of everything.\n\nBut today is the day, and he has broken down in the morning and says he doesn't want to go. I have no idea what to do. Do I push him to go? I am worried that he will regret it later, as he is a kid who doesn't like to do new things. But at the same time, I don't want to pressure him to do something he doesn't want to do.\nAdvice, please?"
  example_title: "Example 2"
- text: "summarize: The girl [26 F] I [22 M] have been seeing for a month didn't respond to me at all yesterday while hanging out with a friend [~30? M].\nShe gets terrible service while at her house, but I texted her 3 times yesterday, 4-5 hours apart. She didn't call me until early this morning and left a voicemail that she was busy all day with a friend who showed up out of the blue.\n\nI saw that she posted a picture of the two of them out of her dead zone house on facebook before I texted her the last time.\n\nI don't mind that she hangs out with friends, and I know it's pretty early in the relationship, but am I wrong to be a little annoyed that she didn't respond until 24 hours after my first text?"
  example_title: "Example 3"
- text: "summarize: TIFU by accidently kicking an old woman\nSo this didn't happen today but actually about a year or two ago.\n\nI was at my granddads funeral so of course it was all very sad and full of lots of crying old people. After the ceremony everyone walks outside the building and onto the other side of the small road the hearses drive down. Now the road is important because obviously if there's a road, there's a curb onto the sidewalk, so most of us are on the other side of the road, besides a few older people walking a lot slower. \n\nAs one of the old woman goes to walk up the curb she trips (obviously didn't notice there was one due to crying and whatnot) and I'm the only one who not only sees it coming but is in any position to do anything. So of course as someone who is an avid football (soccer if you're American) player my first instinct is to stick my foot out and kind of control her head like you would with a football.\n\nOf course you can imagine this looked horrendously bad on my part (quite literally kicking an old woman while she's down) and as she got up everyone noticed that her nose was completely grey/black as if she'd just been punched in the face. She assures us she's fine and we go to the dinner afterwards where someone finally informs her of her bruising. She goes to the toilet and comes out with a completely normal looking nose so of course everyone's wondering how and it turns out that the \"bruising\" was actually shoe polish from my shoe, confirming to everyone that i kicked this poor old lady square in the face as she fell."
  example_title: "Example 4"
---

# GPT-2 Large RLHF Model for OpenAI TLDR Summarization

This model is based on the SFT model ellipseai/gpt2-large-tldr-sum and was further trained with RLHF for better human alignment. The validation reward during RL training is shown below.

![Validation reward during RL training](resource/validation_reward.png)

We evaluated the summarization quality of the SFT model and the RL model on 386 test-set examples, using Claude-v2 as the judge to pick the winner of each comparison. The RL model wins far more often than the SFT model (151 wins against 24, with 211 ties), which indicates that RL training worked well.

| model | win | loss | tie | win rate | loss rate | win rate adjusted |
|---|---|---|---|---|---|---|
| ellipseai/gpt2-large-tldr-sum | 24 | 151 | 211 | 0.0622 | 0.3911 | 33.55% |
| ellipseai/gpt2-large-tldr-sum-rlhf | 151 | 24 | 211 | 0.3911 | 0.0622 | 66.45% |
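
The card does not define the "win rate adjusted" column. The reported figures are consistent with counting each tie as half a win, so the sketch below recomputes the table's rates under that assumption. The helper name `rates` is hypothetical and only for illustration.

```python
def rates(win: int, loss: int, tie: int) -> dict:
    """Recompute the table's rate columns from raw comparison counts.

    Assumption (not stated in the card): the adjusted win rate
    counts each tie as half a win.
    """
    total = win + loss + tie
    return {
        "total": total,
        "win_rate": win / total,
        "loss_rate": loss / total,
        "win_rate_adjusted": (win + 0.5 * tie) / total,
    }


# Counts taken from the evaluation table above.
sft = rates(win=24, loss=151, tie=211)
rl = rates(win=151, loss=24, tie=211)

print(f"SFT adjusted win rate: {sft['win_rate_adjusted']:.2%}")
print(f"RL adjusted win rate:  {rl['win_rate_adjusted']:.2%}")
```

Under this reading the two adjusted rates sum to 100%, since every comparison is either a win for one model, a win for the other, or a tie split evenly between them.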