
Gazette

Explore 3 million newspapers by title. Type in a word such as “jeune”, “révolution”, “république”, “matin”, “soir”, “humanité”, “nouvelle”, “moderne”, “femme”, “paysan”, “ouvrier”, “social”, or “résistance” to see how its share of titles changes over time.

const search = view(
  Inputs.text({ type: "search", value: "gazette", submit: true })
);

${
  Plot.plot({
    x: { nice: true },
    y: {
      label: `Share of titles matching ${search}`,
      tickFormat: "%"
    },
    marks: [
      Plot.ruleY([0, 0.01], { stroke: ["currentColor"] }),
      Plot.areaY(base, {
        x: "year",
        y: ({ year, total }) => gazette.get(year) / total,
        fillOpacity: 0.2,
        curve: "step"
      }),
      Plot.lineY(base, {
        x: "year",
        y: ({ year, total }) => gazette.get(year) / total,
        curve: "step"
      })
    ]
  })
}

I called this page “Gazette” because I was surprised to find that, in the earlier years, most titles in the corpus contained this word. The query counts matching titles per year with a case-insensitive REGEXP_MATCHES; for example, you can query “socialis[tm]e” to match both “socialiste” and “socialisme”.
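To get a feel for what a pattern like “socialis[tm]e” accepts, here is a minimal sketch in plain JavaScript. The sample titles are made up for illustration, and JavaScript's `RegExp` is only an approximation of the database's regex engine: the two agree on simple character classes like this one, but their syntax can differ for more advanced patterns.

```js
// Hypothetical sample titles, for illustration only (not taken from the corpus).
const sampleTitles = [
  "Le Socialiste",
  "Socialisme et liberté",
  "La Gazette de France",
  "Le Matin"
];

// The "i" flag makes the match case-insensitive;
// [tm] accepts either "t" or "m" at that position.
const socialPattern = new RegExp("socialis[tm]e", "i");

// Like the query above, this matches the pattern anywhere in the title.
const matchingTitles = sampleTitles.filter((title) => socialPattern.test(title));

console.log(matchingTitles); // ["Le Socialiste", "Socialisme et liberté"]
```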

const results = db.query(
  `SELECT year, COUNT() c
     FROM presse
    WHERE REGEXP_MATCHES(title, ?, 'i')
 GROUP BY year
`,
  [search]
);
import { DuckDBClient } from "npm:@observablehq/duckdb";
const db = DuckDBClient.of({ presse: FileAttachment("data/presse.parquet") });
// A Map for fast retrieval; more precisely an InternMap, keyed by Date
const gazette = new d3.InternMap(results.map(({ year, c }) => [year, c]));
// The base denominator (count by year)
const base = db.query(
  `SELECT year
       , COUNT(*)::int total
    FROM presse
   WHERE year > '1000'
GROUP BY year
ORDER BY year
`
);
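Why an InternMap rather than a plain Map? Two distinct `Date` objects for the same instant are different keys in a plain `Map`, whereas an InternMap looks keys up by their primitive value. The sketch below imitates that idea in plain JavaScript by keying on `valueOf()`, then computes the per-year share that the chart plots. The counts are invented for illustration, the variable names are hypothetical, and unlike the chart's code this sketch treats years with no match as zero.

```js
// Hypothetical counts, for illustration only (not taken from the corpus).
const sampleMatches = [{ year: new Date("1800-01-01"), c: 3 }];
const sampleTotals = [
  { year: new Date("1800-01-01"), total: 12 },
  { year: new Date("1801-01-01"), total: 10 }
];

// Key by timestamp so that equal dates find the same entry,
// even though they are separate Date objects.
const countByTime = new Map(
  sampleMatches.map(({ year, c }) => [year.valueOf(), c])
);

// Share of titles matching, per year (assumption: missing years count as 0).
const sampleShares = sampleTotals.map(({ year, total }) => ({
  year,
  share: (countByTime.get(year.valueOf()) ?? 0) / total
}));

console.log(sampleShares.map((d) => d.share)); // [0.25, 0]
```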