Natural Language Processing - 2025s2

Sklearn's code with a toy example

This toy example provides runnable code as supplementary material for the theory on logistic regression.

Our dataset

In [9]:
# Four short training phrases
x = ["a happy happy phrase",
     "another super happy phrase",
     "a sad phrase",
     "an unhappy phrase"]

# Labels: 1 for the happy phrases, 0 for the sad/unhappy ones
y = [1, 1, 0, 0]

The Vectorizer

In [10]:
from sklearn.feature_extraction.text import CountVectorizer
import numpy as np
In [19]:
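# binary=True marks each word as present (1) or absent (0) instead of counting it
countvectorizer = CountVectorizer(binary=True)
# Learn the vocabulary from the training phrases
countvectorizer.fit(x)
# Encode each phrase as one row of a sparse document-term matrix
x_vect = countvectorizer.transform(x)
print(x_vect.toarray())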
[[0 0 1 1 0 0 0]
 [0 1 1 1 0 1 0]
 [0 0 0 1 1 0 0]
 [1 0 0 1 0 0 1]]
In [18]:
countvectorizer.vocabulary_
Out[18]:
{'happy': 2,
 'phrase': 3,
 'another': 1,
 'super': 5,
 'sad': 4,
 'an': 0,
 'unhappy': 6}

Logistic Regression

In [20]:
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression()

clf.fit(x_vect, y)
Out[20]:
LogisticRegression()
Parameters:
    penalty            'l2'
    dual               False
    tol                0.0001
    C                  1.0
    fit_intercept      True
    intercept_scaling  1
    class_weight       None
    random_state       None
    solver             'lbfgs'
    max_iter           100
    multi_class        'deprecated'
    verbose            0
    warm_start         False
    n_jobs             None
    l1_ratio           None
In [23]:
clf.coef_
Out[23]:
array([[-3.03694267e-01,  2.71750141e-01,  6.63081739e-01,
         1.46448395e-04, -3.59241023e-01,  2.71750141e-01,
        -3.03694267e-01]])
In [26]:
y_pred = clf.predict(x_vect)
print(y_pred, y)
[1 1 0 0] [1, 1, 0, 0]
In [ ]:
x_test = [
    "today I feel so happy",
    "joy dwells in my heart",
    "today I can only feel the darkness",
    "I feel a bit unhappy about menial things of life"
]

x_vect_test = countvectorizer.transform(x_test)
y_pred_test = clf.predict(x_vect_test)
print(y_pred_test)
[1 0 0 0]
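
Words that were not seen during fit are silently ignored by transform. The sketch below, which refits the toy model so that it runs on its own, shows which vocabulary words survive in each test sentence. The second and third sentences contain no known words, so they become all-zero vectors and their score is just the model intercept. The names vectorizer and known_words are illustrative.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

x = ["a happy happy phrase",
     "another super happy phrase",
     "a sad phrase",
     "an unhappy phrase"]
y = [1, 1, 0, 0]

vectorizer = CountVectorizer(binary=True)
clf = LogisticRegression().fit(vectorizer.fit_transform(x), y)

x_test = [
    "today I feel so happy",
    "joy dwells in my heart",
    "today I can only feel the darkness",
    "I feel a bit unhappy about menial things of life"
]
x_vect_test = vectorizer.transform(x_test)

# inverse_transform recovers the vocabulary words present in each row.
known_words = vectorizer.inverse_transform(x_vect_test)
for sentence, words in zip(x_test, known_words):
    print(f"{sentence!r}: {list(words)}")

# With an all-zero input, the decision score reduces to the intercept.
z = clf.decision_function(x_vect_test)
print(np.isclose(z[1], clf.intercept_[0]), np.isclose(z[2], clf.intercept_[0]))
```

This is why "joy dwells in my heart" is classified as 0: the model has never seen any of its words, and the intercept alone is negative here.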
In [29]:
clf.predict_proba(x_vect_test)
Out[29]:
array([[0.39122204, 0.60877796],
       [0.55500236, 0.44499764],
       [0.55500236, 0.44499764],
       [0.62822222, 0.37177778]])
In [32]:
z = clf.decision_function(x_vect_test)
print(z)
[ 0.44217834 -0.2209034  -0.2209034  -0.52459766]
In [33]:
1/(1+np.exp(-z))
Out[33]:
array([0.60877796, 0.44499764, 0.44499764, 0.37177778])
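
The sigmoid of the decision function reproduces the second column of predict_proba. The same link can be traced one step further back: for a binary problem the decision score is the linear combination z = Xw + b, with w taken from coef_ and b from intercept_. The sketch below refits the toy model so that it runs on its own and checks both steps by hand. The names X, w, b, z_manual and p_manual are illustrative.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

x = ["a happy happy phrase",
     "another super happy phrase",
     "a sad phrase",
     "an unhappy phrase"]
y = [1, 1, 0, 0]

vectorizer = CountVectorizer(binary=True)
clf = LogisticRegression().fit(vectorizer.fit_transform(x), y)

x_test = [
    "today I feel so happy",
    "joy dwells in my heart",
    "today I can only feel the darkness",
    "I feel a bit unhappy about menial things of life"
]
X = vectorizer.transform(x_test).toarray()

w = clf.coef_[0]       # one weight per vocabulary word
b = clf.intercept_[0]  # bias term

z_manual = X @ w + b                        # linear score
p_manual = 1.0 / (1.0 + np.exp(-z_manual))  # sigmoid gives P(y = 1 | x)

print(np.allclose(z_manual, clf.decision_function(X)))
print(np.allclose(p_manual, clf.predict_proba(X)[:, 1]))
```

Thresholding p_manual at 0.5 (equivalently, z_manual at 0) gives the same labels as clf.predict.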
