Upload 2 files
- blog_content.csv +264 -0
- blog_metadata.csv +4 -0
blog_content.csv
ADDED
@@ -0,0 +1,264 @@
blog_id,blog_body
1,"This is the second of three articles in Unpacked’s “Tech Policy September” series.

Disclaimer: The views expressed in this article are solely my own and do not reflect the views or positions of any organization with which I am affiliated, including current and past employers.

The launch of ChatGPT kicked off a new generative AI wave, which has been met with both optimism and concern about its impact on our lives. Specifically, a majority of the discussion has been around Large Language Models (LLMs) — e.g. OpenAI’s GPT model, which powers ChatGPT. It’s not just OpenAI that has released models — several others have entered the market, including Facebook (LLaMA), Google (LaMDA), and Anthropic, to name a few. At this point, it is all but certain that the widespread availability of these models is going to unlock a wave of new applications.

With this growth comes a legitimate concern about the risks a powerful technology like this can create — ranging from accelerating misinformation, to hallucinations (models confidently returning junk results), to existential risk (AI taking over humanity). Thoughtful regulation is required to address these risks, and surprisingly, early conversations around regulating AI are already in progress, unlike technology shifts in the past where regulation was an afterthought.

That said, AI regulation in the US is still in its early days. There are two types of regulatory constructs under consideration today — 1) broad bills in the Senate which cover a wide range of issues and might be difficult to get consensus on, and 2) non-binding, broad frameworks listing out AI principles but without many specifics agreed upon.

This article makes the case for a more focused approach to AI regulation that is less of a “bundle everything into one bill” approach, and more of a targeted approach that regulates specific mechanisms tied to meaningful AI risks. We’ll dive into:

Risks posed by AI
Current approaches to managing AI risks
Regulations proposed in the US today
The case for a mechanisms-based approach

Risks posed by AI
This is obviously a loaded topic and it’s difficult for one person to have a comprehensive POV, so I’m going to try to cover reasonable ground but not delve into fringe issues where there is still intense debate (e.g. artificial general intelligence / AI taking over the world).

To tactically understand AI risks, a valuable resource is OpenAI’s self-reported GPT-4 System Card. I’m generally skeptical of companies grading their own homework, but this document does a good job of articulating the risks posed by large language models like GPT. Let’s go through some of them:

1. Hallucinations: This refers to untruthful / junk responses that models can produce with confidence. This is unsurprising given how language models are trained, but the risk is that users might start treating these responses as always truthful when ChatGPT-like products become mainstream.
2. Harmful content: This includes a range of things such as advice for self-harm, harassment / hateful content, planning for violence, and instructions for illegal activities.
3. Disinformation / influence operations: This refers to generating plausibly realistic and targeted content, including news articles, tweets, and emails aimed at promoting propaganda.
4. Privacy / user identification: These models can leverage learnings from the training data, augmented with external data, to identify specific individuals and information associated with them.
5. Cybersecurity / social engineering: Language models could review source code to identify security vulnerabilities, as well as generate better content for social engineering / phishing campaigns at scale.
6. Economic impact: Given the capability of these models, it is likely that certain types of jobs will become redundant and potentially be replaced by other jobs, which could have economic impact on people and societies.
7. Interactions with external systems: Language models, connected to external systems (through something like plug-ins), could automatically start figuring out more complex things and be used for malicious purposes (e.g. figure out the composition of a harmful chemical, look at what materials are available to be bought, and come up with an alternative composition based on components that are available for purchase / are not regulated).
8. Unknown risky / “emergent” behavior: OpenAI categorizes this as the “ability to create and act on long-term plans to accrue power and resource”, and claims that the GPT models today are not effective at doing this. This starts getting closer to AI taking over humanity / artificial general intelligence, and we won’t talk about it today.

Apart from (8), where I don’t have an objective opinion, the rest of the risks are meaningfully real and need to be addressed. But before diving into regulation, it’s helpful to understand what AI companies are doing today to mitigate these.

Current approaches to managing AI risks
To understand current solutions, we’ll again look at what OpenAI has published. Not because they are the dominant player (Google, Facebook, Microsoft, Anthropic, and many others are sizable competitors) but because OpenAI had to publicly declare a lot of information when CEO Sam Altman was called for a Senate hearing in May 2023. They articulated a few different approaches.

A low-hanging-fruit approach is excluding certain data in the pre-training phase. For example, they remove all sexual content from the training data, thereby limiting the GPT model’s ability to respond to these requests.

Another approach is post-training feedback, which involves human ratings of what’s acceptable and what’s not. This applies both to the actual responses generated, as well as to whether GPT should have responded to the question in the first place. OpenAI has reported that GPT-4 blocks more harmful queries compared to GPT-3.5 (e.g. GPT-3.5 provides an answer to “Write a Twitter bio for a white nationalist user“ while GPT-4 does not).

To address user privacy risks, besides some of the response blocking described above, ChatGPT provides an opt-out setting where users can stop OpenAI from using conversation data for model training. While an okay option, this is “tied in” to the chat history feature which users find valuable, i.e. if you want access to chat history, you need to fork over your conversation data to OpenAI for training.

Specifically around regulation (none of which exists today), CEO Sam Altman expressed OpenAI’s point of view at the Senate hearing. Paraphrasing:

OpenAI has “welcomed regulation” and is supportive of a licensing regime for large-scale AI models, i.e. anyone building a large-scale model should be required to get a license from a government agency.
They are also supportive of some sort of shared liability framework for bad outcomes that result from AI products, and believe that liability should be shared between the AI service provider and the user based on each of their contributions to the bad outcome.
They provide a non-committal (word salad) response to the copyright question, and mention that most of their training data is from Common Crawl (a crawled website data archive) and Wikipedia; it’s TBD whether using this data for commercial purposes infringes on copyright, and decisions on a few active cases are pending in US courts.

While I agree with some of the approaches OpenAI is taking (e.g. not including certain training data, blocking responses to harmful queries), these are neither comprehensive (e.g. some of the harmful query blocks can be overridden through a complex series of prompts, aka “jailbreaking”) nor unbiased (e.g. OpenAI supports licensing because it adds a barrier to entry for new competitors). These requirements are also not codified under any law specifically, which brings us to AI regulation.

Proposed regulations in the US
In this section, we’ll cover the range of regulations currently proposed. Loosely, I’d bucket them into two categories: broad commitments / frameworks, and actual bills proposed in the Senate.

Let’s start with broad commitments that have been signed so far:

The White House published an AI Bill of Rights, which is essentially a set of “principles that should guide the design, use, and deployment of automated systems”. These principles are: Safe and Effective Systems; Algorithmic Discrimination Protections; Data Privacy; Notice and Explanation; Human Alternatives, Consideration, and Fallback.
Seven AI companies (OpenAI, Microsoft, Google, Anthropic, Inflection AI, Meta, Amazon) made voluntary commitments around pre-release security testing, public information sharing, managing insider threats (e.g. someone exposing model weights), vulnerability detection programs, a watermarking-like approach for AI content, prioritizing “research on societal risks like systematic bias or privacy issues”, and developing AI to “help address society’s greatest challenges like cancer prevention and climate change”.
Earlier this month, Senate Majority Leader Chuck Schumer hosted a closed-door AI summit in Washington with a few tech / AI leaders. The summit concluded with everyone broadly agreeing there is a need for regulation (of course!) but with each of the leaders expressing concern about their own set of issues: humanity’s existential threat (Elon Musk / Eric Schmidt), closed vs open source AI (Mark Zuckerberg), feeding people? (Bill Gates), opposing licenses (IBM’s Arvind Krishna).

After reading that description, if you’re skeptical, that’s the right reaction. There are major limitations to these commitments. At best, they are non-binding broad frameworks that companies loosely agree to, with no clear bar for what is considered compliant. At worst, they are political spectacle to give the impression that there is progress. I understand that regulation (especially in the US) takes a long time to pass, so I appreciate that these commitments make progress towards laying out some critical issues that need addressing. But it’s important to acknowledge that beyond that, they hold no real value, and there is no way to enforce good behavior (because there is no specific definition of what good behavior is).

Which brings us to bills proposed in the Senate. There are two bills currently under consideration:

Sen. Blumenthal / Hawley have proposed a licensing regime for high-risk AI applications, i.e. anyone building AI models that are considered high risk needs to get a license from a federal agency. The bill leaves open whether a new AI agency is required, or whether an existing agency like the FTC or DOJ can enforce this. It also lays out some specific requirements for AI products, including testing for harm, disclosure of bad actions by AI, allowing for third-party audits, and disclosing training data.
Sen. Warren / Graham have proposed to create a new federal agency called the “Office of Licensing for Dominant Platforms”. I won’t go into too much detail, but the bill covers an extensive range of issues such as training data disclosure, researcher access, sweeping monitoring access, banning self-preferencing / tie-in arrangements, and a “duty of care” (i.e. services cannot be designed “in a manner that causes or is likely to cause physical, economic, relational or reputation injury to a person, psychological injuries, discrimination”). Notably, the regulation only applies to large platforms and not to smaller companies.

The two bills in the Senate cover an extensive range of important AI mechanisms, such as training data disclosure and security testing. The bills, however, each have their own set of problems, because a large number of somewhat-related things are stuffed into a single bill.

For example, licensing regimes have repeatedly helped incumbents maintain market dominance, a phenomenon referred to as “regulatory capture”. You see this play out in markets like telecom and healthcare, which have become highly inefficient and where consumers get a raw deal despite paying a lot. OpenAI is of course supportive of licensing, because it helps them keep market share in what I’d argue is a rapidly commoditizing market — that of AI models. I’m not saying that OpenAI’s intentions are bad, but it’s important to look at incentives.

Another example is some of the extremely broad language in Sen. Warren / Graham’s bill around “duty of care” — which says that a covered entity:

cannot design their services “in a manner that causes or is likely to cause…physical, economic, relational or reputation injury to a person, psychological injuries…discrimination”
must mitigate “heightened risks of physical, emotional, developmental, or material harms posed by materials on, or engagement with, any platform owned or controlled by the covered entity”

While I agree with the spirit of the statement, it’s nearly impossible to write good regulation that translates this intent into specific criteria that can be enforced by regulators, without turning it into politically motivated theater.

Another problematic issue in Sen. Warren / Graham’s bill is the focus on large platforms. I’m fully supportive of large platforms being regulated for the sake of maintaining market competitiveness (which in turn benefits consumers), but regulations targeted at specific companies with an “everything big is bad” strategy have unintended consequences and often result in highly inefficient markets long-term. Large platforms (e.g. Microsoft Azure) are also likely to be more careful by default about clamping down on malicious actors than a smaller, growth-focused AI company, so it seems ineffective to say that AI regulation should only apply to larger companies.

Hence, the case for mechanisms-based regulation — an approach focused on regulating very specific mechanisms that are strictly tied to meaningful AI risks. This approach has a dual benefit: it is easier to pass and get consensus on, and it avoids the unintended long-term market consequences of brute-force approaches.

The case for mechanisms-based regulation
In DOJ v. Google, we talked about how the DOJ is going after specific anti-competitive mechanisms that Google engaged in (specifically, Android deals where device manufacturers had to agree to onerous terms to get access to essential Android services). This gives the DOJ a cleaner shot at proving past monopolistic behavior and prohibiting such behavior in the future. This is unlike some of the FTC’s missteps, where it has unsuccessfully tried an “everything big is bad” approach (e.g. Microsoft / Activision) and gotten its cases unceremoniously thrown out of court.

In a similar vein, to regulate AI, a focused approach that targets specific mechanisms is more likely to be successful. Success here means mitigating AI risks effectively and protecting consumers, while at the same time maintaining a competitive market so the new technology can be used for positive impact on society. Here is a non-exhaustive list of specific mechanisms worth targeting to alleviate AI risks:

Liability on model owners AND distributors: I disagree with both of OpenAI’s proposed solutions to mitigate harmful use cases — a licensing regime and shared liability with users. A licensing regime adds barriers to market entry, helps incumbents preserve market share, and kills innovation — imagine if every AI startup and every company training a model had to get a license from the government before they could do anything. A shared liability framework between AI service providers and users is nice in theory, but: 1) this does exist in some form today (e.g. if you commit a crime based on insight provided by ChatGPT, you can be prosecuted under existing laws), and 2) it’s impossible to objectively split responsibility for a bad outcome between the AI service provider and the user.

A better approach is holding model owners AND distributors liable for harmful use of their products. For example, if OpenAI’s model and Microsoft Azure’s computing power can be used by a malicious user to plan a phishing attack, the onus should be on OpenAI and Microsoft to perform reasonable due diligence to know their customer and the customer’s intended use of the product. A more tactical approach could be limiting the feature set available to users until they have been verified. This is not very different from the KYC (know your customer) requirements that financial institutions are required to abide by.

Codifying copyright for data used in model training, disclosing training data sets, and opt-outs for content owners: Data scraping is a major problem today for content owners. AI providers have used scraped data, without content owners’ consent and without due compensation, to build commercially distributed models. If the courts rule that this is not copyright infringement, it’s a clear signal that new regulation codifying content owners’ rights is required to sustain a thriving content ecosystem. A no-brainer extension to this is mandating disclosure of training data for model providers.

Another related mechanism is to allow content owners to opt out of their data being used for model training, and to do this without predatory “tie-ins”. For example, Google cannot say that if you don’t give us your data for training, we won’t index you on Search. Someone like OpenAI has less leverage here with content owners, but you can imagine larger players like Microsoft and Amazon, with broader product portfolios, being able to force content owners’ hands to fork over their data.

Full control over user data: A few specific mechanisms here can mitigate the user privacy risks created by AI. First, model providers should be forced to delete personal information from training data. There needs to be a clear definition of what constitutes personal information (e.g. information from a celebrity’s Wikipedia page is not PI, but emails and phone numbers from ZoomInfo’s database are). Second, companies should be prohibited from tying consumer features to users’ willingness to fork over data for model training (e.g. OpenAI cannot say it won’t provide access to chat history unless users hand over all their data for training). There is clear precedent here — Apple’s App Tracking Transparency framework (which I acknowledge is not regulation) prohibits apps from gating features behind a tracking opt-in wall, and the EU’s advertising regulation prohibits platforms from gating features behind opt-in for behavioral advertising.

Content watermarking / provenance: As AI-generated content explodes, both text and image / video, it becomes increasingly important to be able to distinguish AI-generated content, particularly when it is false or misleading. There is a need for a framework that defines what types of situations should require AI content disclosure. For example, if you used ChatGPT to write an email for sales outreach, that seems harmless and should not require disclosure. But if you are sharing political content on Twitter and you have a large following, that should require disclosure. Good regulation here would be less prescriptive about actual solutions and would lay out a framework for companies to work with, with the free market figuring out the actual solutions (e.g. a startup could emerge to detect AI-generated political content on Twitter, which Twitter could then partner with).

Conclusion
Overall, I’m encouraged by the early conversations happening today around the topic, unlike technologies in the past where regulation was an afterthought. AI comes with major upside and major risks — a thoughtful, mechanisms-based approach to regulation can help mitigate the risks of AI while making sure a competitive market exists to help make the most of this technology."

2,"So, Medium’s partner program is unavailable in my country, because Stripe is only available in 31 countries. Sadly, the Philippines is not on the list.

I was so excited to learn about the update: no need to have 100 followers to apply to the Medium Partner Program starting in August! But I was wrong, because it’s not for everyone.

It wasn’t very comforting, but there was nothing I could do about it except continue writing my stories. It’s been over a week since I last worked on a topic.

I just had to decide whether to continue my membership or not.

I had to cancel my Medium membership.
I’m slightly disappointed, but I had to cancel my Medium membership for now.

I can only read a few stories in a day anyway, since I’m busy right now trying other things. But it would have been a different story if I were part of the partner program; I could have used the few cents to pay for the annual membership.

If not, I will have to explore and try more ways to earn commissions while gaining writing experience.

Trying anything to experience writing more.
Getting more writing experience means finding more ways and places to do it. Then, right there, write!

I just built my first website, which you can visit, by the way, and tell me your feedback about it. I would love to hear it.

Then I am trying Fiverr right now, and I will see if I can get a client after publishing my first gig. It is a challenge and will help me in my writing journey, so I had to try it.

I will also send pitches to prospective guest-posting websites. This will be a first-time experience, because some will give me the credentials to publish a post on their websites.

For now, I want to broaden my writing environment and not stay where I am most comfortable. I will have to explore and continue to write to improve my writing skills and knowledge, so wish me the best. 😙"

3,"AI is trending these days, especially generative AI tools and Large Language Models like ChatGPT, Bard, Midjourney, and more. The foundation of many of these tools lies in deep learning techniques.

To learn deep learning, you should do a lot of projects. If you have prior knowledge, you have probably already done handwritten digit recognition or iris classification projects, but by doing exciting deep learning projects, you will not only hone your skills, you will have fun too!

In this article, we will go through different deep learning project ideas to test your skills, and if you are a total beginner, they will be a great starting point. But first, let’s start with the fundamentals.

What is Deep Learning?
Deep learning teaches computers to think and learn as we do. Our brains recognize shapes, understand colors, and piece together meaning through layers. Deep learning works similarly, but using neural networks, mathematical equations, and computing power.

Say you’re teaching it to identify cats. Feed it many cat photos and it will learn to recognize whiskers, tails, and ears. The more photos, the better it gets at recognizing cats. And not just cats: you can teach it anything!

Why It Matters
Because you can do a wide range of real-life tasks: predicting the weather, analyzing clothing reviews, classifying news, recognizing yoga poses, identifying fruits, or detecting masks during epidemics, and more. And guess what: together we will see them all in this article.

Deep learning is like a helpful, super-smart friend that keeps learning and can be applied to almost any field. By using its methods, you will not only be able to make decision-making predictions, but also get a taste of different professions.

Deep Learning Project Ideas
You can find deep learning project ideas on websites like GitHub and Kaggle.

But for the sake of this article, I will do that task for you and share 8 deep learning project ideas. By working through them, I hope you will not only get curious but also gain practical experience. Let’s start.

Deep Learning Project #1: Weather Forecasting
Created with Leonardo.ai
Link to dataset: London weather dataset

Now, you are going to be a meteorologist, and your task is to predict London’s weather.

Predicting it might be a challenging task, but not for you.

By using this London weather data, you will be able to use deep learning to forecast weather conditions and potentially help sectors like agriculture and tourism. To do this project, you should first explore the dataset. You can do this with pandas, and if you don’t know pandas that well yet, you can hone your skills by cracking real-life Python Pandas interview questions here.

Once you know your data, manipulate and transform it into a shape where you can build a neural net. Here you can build a recurrent neural network (RNN) on the historical data to predict future conditions. But don’t forget to validate your model’s results on a test set to ensure its performance; a minimal sketch follows.
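Here is a minimal sketch of that idea, assuming a local copy of the Kaggle London weather CSV with a mean_temp column (the file name and column name are assumptions; adjust them to your copy). An LSTM serves as the recurrent layer, and a sliding 30-day window predicts the next day:

import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras import layers

# Assumed local file and column name -- adjust to your copy of the dataset.
df = pd.read_csv('london_weather.csv')
temps = df['mean_temp'].dropna().to_numpy(dtype='float32')

# Turn the series into (30-day window -> next-day value) samples.
window = 30
X = np.stack([temps[i:i + window] for i in range(len(temps) - window)])[..., None]
y = temps[window:]

# Hold out the most recent 20% for validation (no shuffling for time series).
split = int(len(X) * 0.8)

model = tf.keras.Sequential([
    layers.LSTM(32, input_shape=(window, 1)),
    layers.Dense(1),
])
model.compile(optimizer='adam', loss='mae')
model.fit(X[:split], y[:split], validation_data=(X[split:], y[split:]), epochs=10)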

Finally, you can predict the weather for a specific period of time, and I guess if it is rainy, that would not be much of a surprise!

Deep Learning Project #2: Predicting Sentiment from Clothing Reviews
Created with Leonardo.ai
Link to the dataset: Predicting Sentiment from Clothing Reviews

It is time to be a fashion designer. Now the task is to read the clothing reviews. It might be your favorite or worst task, depending on the nature of the comments. You want to see them all, but there are thousands of them. Luckily, you have this dataset to predict sentiment from clothing reviews.

This is a text classification problem and can be solved by using Recurrent Neural Networks (RNNs) or Transformer-based models like BERT. These are effective models for text classification, and they can capture the sequential nature of the text.

First, as always, you should transform the text data, because it will generally be in raw form, and make it ready for model building. Then you can build your model according to your taste, and finally, evaluate it on a separate test set; see the sketch below.
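A minimal RNN-flavored sketch of those steps, assuming the review strings and 0/1 sentiment labels have already been loaded into texts and labels (hypothetical names):

import tensorflow as tf
from tensorflow.keras import layers

# texts: list of review strings; labels: 0/1 sentiment -- assumed already loaded.
vectorize = layers.TextVectorization(max_tokens=20000, output_sequence_length=200)
vectorize.adapt(texts)

model = tf.keras.Sequential([
    vectorize,                              # raw strings -> integer token ids
    layers.Embedding(20000, 64),            # token ids -> dense vectors
    layers.LSTM(64),                        # read the review sequentially
    layers.Dense(1, activation='sigmoid'),  # probability the review is positive
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(tf.constant(texts), tf.constant(labels), validation_split=0.2, epochs=5)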

You can try different hyperparameter settings to improve your model’s performance, but be sure you have enough computing power; otherwise, that process may take days.

Finally, you can analyze tens of thousands of reviews in a very limited time, thanks to the power of deep learning.

Deep Learning Project #3: Fake & Real News
Created with Leonardo.ai
Link to the dataset: Fake and Real News Dataset

Sometimes it is really hard to take the news at face value. That’s why you have been assigned to this task. Now you are a journalist, and your task is to classify the news as fake or real. Of course, you can classify 10–15, or let’s say 20, news articles in a day, but what if you used deep learning?

The Fake and Real News Dataset includes a collection of news articles, each labeled as either real or fake. What we want is to build a deep learning model that imitates this human judgment, and possibly increases its speed and accuracy.

Here we are going to build an RNN again. Our model can then learn to distinguish real from fake news, and we can test its accuracy to be sure. Always remember to split your dataset into training and test sets, so you can be sure your model generalizes well to unseen data; a small sketch of that split follows.
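A minimal sketch of that split, assuming the articles and their labels sit in a pandas DataFrame df with text and label columns (hypothetical names):

from sklearn.model_selection import train_test_split

# Hold out 20% of articles; stratify so both splits keep the same real/fake ratio.
X_train, X_test, y_train, y_test = train_test_split(
    df['text'], df['label'], test_size=0.2, stratify=df['label'], random_state=42)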

Deep Learning Project #4: Yoga Pose Classification
Created with Leonardo.ai
Link to the dataset: Yoga Pose Classification

Now, let’s suppose you are a yogi teaching yoga, and your task is to classify yoga poses.

To do that, you first need to know which pictures symbolize which yoga poses. This Yoga Pose Classification dataset includes a bunch of images of 5 main yoga poses: the downward dog pose, goddess pose, tree pose, plank pose, and warrior pose.

Of course, you probably know which pose is which, yet you have to transfer your knowledge to the computer to automate this classification process. For that, a CNN is a good fit, because this is an image classification task.

We can start this project with pre-processing again, and we can use different methods. Say we need to increase the diversity of our training data to improve performance: then we should do data augmentation, as sketched below.
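A minimal augmentation sketch using Keras preprocessing layers, assuming a batch of pose images has already been loaded (the variable name and image size are assumptions):

import tensorflow as tf
from tensorflow.keras import layers

# Random transforms applied on the fly; active only while training.
augment = tf.keras.Sequential([
    layers.RandomFlip('horizontal'),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.1),
])

# images: a batch of pose photos, e.g. shape (32, 224, 224, 3) -- assumed loaded.
augmented = augment(images, training=True)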

And finally, yoga poses can be classified automatically, and you can do your asanas without being busy!

Deep Learning Project #5: Date Fruit Datasets — Image Classification
Created with Leonardo.ai
Link to the dataset: Date Fruit Dataset

So let’s imagine being a farmer whose task is to classify different types of fruits. It is a sunny day and you are in the middle of your garden, looking at different fruits. But how can you automate this process?

In this Date Fruit Dataset, there are images of seven different classes of date fruits: Barhee, Deglet Nour, Sukkary, Rotab Mozafati, Ruthana, Safawi, and Sagai, along with features like shape and color.

By using this information and a little bit of convolutional neural networks, you can classify next season’s harvest by just taking pictures and feeding your algorithm that information. A compact CNN sketch follows.
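A compact CNN sketch for the seven classes (the input size is an assumption; resize your images to match):

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(128, 128, 3)),       # assumed image size
    layers.Rescaling(1.0 / 255),             # scale pixels to [0, 1]
    layers.Conv2D(32, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(7, activation='softmax'),   # one output per date fruit class
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy', metrics=['accuracy'])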

This approach can also be used in the agricultural industry to increase efficiency and reduce manual labor. After developing this algorithm and feeding it images of local fruits, maybe you can earn some money too, by doing a little bit of marketing to local firms.

Deep Learning Project #6: Face Mask Detection
Created with Leonardo.ai
Link to the dataset: Face Mask Detection

Now you are a doctor working in a hospital during an epidemic, like COVID-19. Wearing a mask is therefore mandatory to reduce infections, so it is time to detect masks on your patients.

Unfortunately, you have to do this manually, because sometimes your patients might forget to wear their masks. Wouldn’t it be nice if there were a camera near your door that detected face masks on your patients? Let them in if they are wearing a mask; if not, they wait longer.

To do that, you first need to develop a deep learning model that can detect whether a person is wearing a face mask or not. This is a binary image classification problem that can be solved using convolutional neural networks.

In the following project, you can develop a face mask detection system using PyTorch and the Faster R-CNN model. You’ll preprocess data, create a Dataset and DataLoader, and modify a pre-trained Faster R-CNN model for the task (the head-swapping step is sketched below). After training and evaluating the model, you’ll visualize its predictions.
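A minimal sketch of the head-swap step, following the standard torchvision fine-tuning pattern (num_classes is an assumption; count the labels in your copy of the dataset, plus one for background):

import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Load a Faster R-CNN pre-trained on COCO.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights='DEFAULT')

# Replace the box predictor head with one sized for this task,
# e.g. background + with_mask + without_mask (an assumption).
num_classes = 3
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)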

Finally, you’ll save the trained model for future use, maybe deploy it with the camera, and set up a system that automatically checks whether patients are wearing masks or not.

Deep Learning Project #7: Celebrity Face Detection
Created with Leonardo.ai
Link to the Dataset: CelebFaces Attributes (CelebA) Dataset

Now let’s say you are a reporter whose task is to keep track of celebrities. You should be able to name them just by looking at them, right? If you have trouble remembering one or two names, you can easily use this dataset.

You can use the CelebFaces Attributes (CelebA) Dataset, which has over 200K celebrity images. Your project would involve training a deep learning model, likely a Convolutional Neural Network (CNN), to identify various facial attributes.

To improve your model’s performance, consider strategies like data augmentation, hyperparameter tuning, and regularization. You might also implement early stopping during training to prevent overfitting, or use transfer learning by taking advantage of pre-trained models like VGG16 or ResNet; a transfer learning sketch follows.
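A minimal transfer learning sketch in Keras, reusing ResNet50’s ImageNet features and training only a small new head (the head is sized here for CelebA’s 40 binary attribute labels):

import tensorflow as tf
from tensorflow.keras import layers

# Frozen pre-trained backbone: reuse ImageNet features.
base = tf.keras.applications.ResNet50(weights='imagenet', include_top=False,
                                      input_shape=(224, 224, 3))
base.trainable = False

model = tf.keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation='relu'),
    layers.Dense(40, activation='sigmoid'),  # CelebA's 40 binary attributes
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])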

Of course, after all this, you can use your model to predict the celebrities’ names, by just uploading their pictures and asking the model to classify them.

Deep Learning Project #8: Mental Health FAQ Chatbot
Created with Leonardo.ai
Link to the dataset: Mental Health FAQ

Imagine if Sigmund Freud had been born in this era. I guess he would have tried to develop a chatbot to help his patients.

The Mental Health FAQ, a collection of frequently asked questions related to mental health, could be a great resource for this. Sigmund Freud’s task would be to train a deep learning model, such as a sequence-to-sequence model (a type of recurrent neural network architecture), to generate responses to mental health-related questions; a skeleton of that architecture follows.
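A minimal sequence-to-sequence skeleton in Keras (vocabulary and layer sizes are assumptions; the tokenized question/answer pairs and the training loop are omitted):

import tensorflow as tf
from tensorflow.keras import layers

vocab_size, embed_dim, latent_dim = 8000, 128, 256  # assumed sizes

# Encoder: reads the tokenized question and summarizes it into its final state.
enc_inputs = tf.keras.Input(shape=(None,))
enc_emb = layers.Embedding(vocab_size, embed_dim)(enc_inputs)
_, state_h, state_c = layers.LSTM(latent_dim, return_state=True)(enc_emb)

# Decoder: generates the answer token by token, conditioned on the encoder state.
dec_inputs = tf.keras.Input(shape=(None,))
dec_emb = layers.Embedding(vocab_size, embed_dim)(dec_inputs)
dec_out, _, _ = layers.LSTM(latent_dim, return_sequences=True,
                            return_state=True)(dec_emb,
                                               initial_state=[state_h, state_c])
outputs = layers.Dense(vocab_size, activation='softmax')(dec_out)

model = tf.keras.Model([enc_inputs, dec_inputs], outputs)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')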

To enhance your model’s performance, consider strategies like using attention mechanisms to better capture context, or experimenting with different types of RNN cells like LSTM or GRU. Finally, this would help Dr. Sigmund Freud serve his patients around the clock, though of course with limited knowledge.

What is the difference between Machine Learning and Deep Learning?
In the beginning, you might have a hard time distinguishing between machine learning and deep learning. Basically, the algorithms used and their capabilities are different. I explained them all in “Data Science vs Machine Learning vs Deep Learning” if you want to dig deeper.

In summary, machine learning is the cornerstone of deep learning, so having prior machine learning knowledge before learning deep learning will serve you well.

If you don’t have much experience, take your time and start with an easy machine learning project; then you can climb the stairs to deep learning. Here I introduced a mix of machine learning and deep learning projects.

Final Thoughts
It is actually funny, right? If you approach anything from the right angle, with curiosity and a good amount of eagerness, it won’t take much time or effort to learn.

In this article, we went through a bunch of exciting deep learning project ideas, along with datasets and a little bit of technical explanation, from weather forecasting to Freud’s mental health chatbot.

If you already have knowledge about deep learning, I suggest you take a look at these projects a little more deeply. Also, if you are a beginner, there is no better time than today to start, and here are 19 Data Science Project Ideas for beginners if you want to see more.

Thanks for reading!"
blog_metadata.csv
ADDED
@@ -0,0 +1,4 @@
blog_id,blog_title,blog_subtitle,blog_category,blog_keyword1,blog_keyword2,blog_keyword3,blog_keyword4,blog_keyword5
1,Regulating AI: The Case for a Mechanisms-Based Approach,"Targeting specific mechanisms mitigates AI risks more effectively, is easier to get consensus on, and avoids unintended consequences of brute force approaches",Artificial Intelligence,Artificial Intelligence,Data Science,Deep Dives,Ai Regulation,Ai Risk
2,Medium Partner Program is not Available in my Country,I had to cancel my membership on Medium.,Medium Partner Program,Medium Partner Program,Medium Membership,Stripe,,
3,8 Exciting Deep Learning Project Ideas,Exploring AI: Diverse Deep Learning Project Ideas for Data Enthusiasts,Deep Learning,Data Science,Data Science Projects,Projects,AI,