A spider (also called a web crawler or bot) is an automated program that browses the internet to index web pages. These programs are often used by search engines like Google, Bing, or Yahoo to discover and update content in their search index.
Starting Point: The spider begins with a list of URLs to crawl.
Analysis: It fetches the HTML code of a webpage and analyzes its content, links, and metadata.
Following Links: It follows the links found on the page to discover new pages.
Storage: The collected data is sent to the search engine’s database for indexing.
Repetition: The process is repeated regularly to keep the index up to date.
Search engine optimization (SEO)
Price comparison websites
Web archiving (e.g., Wayback Machine)
Automated content analysis for AI models
Some websites use a robots.txt file to specify which areas can or cannot be crawled by a spider.
A crawler (also known as a web crawler, spider, or bot) is an automated program that browses the internet and analyzes web pages. It follows links from page to page and collects information.
Search Engines (e.g., Google's Googlebot) – Index web pages so they appear in search engine results.
Price Comparison Websites – Scan online stores for the latest prices and products.
SEO Tools – Analyze websites for technical errors or optimization potential.
Data Analysis & Monitoring – Track website content for market research or competitor analysis.
Archiving – Save web pages for future reference (e.g., Internet Archive).
Starts with a list of URLs.
Fetches web pages and stores content (text, metadata, links).
Follows links on the page and repeats the process.
Saves or processes the collected data depending on its purpose.
Many websites use a robots.txt file to control which content crawlers can visit or ignore.
The Fetch API is a modern JavaScript interface for retrieving resources over the network, such as making HTTP requests to an API or loading data from a server. It largely replaces the older XMLHttpRequest
method and provides a simpler, more flexible, and more powerful way to handle network requests.
fetch('https://jsonplaceholder.typicode.com/posts/1')
.then(response => response.json()) // Convert response to JSON
.then(data => console.log(data)) // Log the data
.catch(error => console.error('Error:', error)); // Handle errors
Making a POST Request
fetch('https://jsonplaceholder.typicode.com/posts', {
method: 'POST',
headers: {
'Content-Type': 'application/json'
},
body: JSON.stringify({ title: 'New Post', body: 'Post content', userId: 1 })
})
.then(response => response.json())
.then(data => console.log(data))
.catch(error => console.error('Error:', error));
✅ Simpler syntax compared to XMLHttpRequest
✅ Supports async/await
for better readability
✅ Flexible request and response handling
✅ Better error management using Promises
The Fetch API is now supported in all modern browsers and is an essential technique for web development.
A Single Page Application (SPA) is a web application that runs entirely within a single HTML page. Instead of reloading the entire page for each interaction, it dynamically updates the content using JavaScript, providing a smooth, app-like user experience.
✅ Faster interactions after the initial load
✅ Improved user experience (no full page reloads)
✅ Offline functionality possible via Service Workers
❌ Initial load time can be slow (large JavaScript bundle)
❌ SEO challenges (since content is often loaded dynamically)
❌ More complex implementation, especially for security and routing
Popular frameworks for SPAs include React, Angular, and Vue.js.
Puppet is an open-source configuration management tool used to automate IT infrastructure. It helps provision, configure, and manage servers and software automatically. Puppet is widely used in DevOps and cloud environments.
✅ Declarative Language: Infrastructure is described using a domain-specific language (DSL).
✅ Agent-Master Architecture: A central Puppet server distributes configurations to clients (agents).
✅ Idempotency: Changes are only applied if necessary.
✅ Cross-Platform Support: Works on Linux, Windows, macOS, and cloud environments.
✅ Modularity: Large community with many prebuilt modules.
A Puppet manifest (.pp
file) might look like this:
package { 'nginx':
ensure => installed,
}
service { 'nginx':
ensure => running,
enable => true,
require => Package['nginx'],
}
file { '/var/www/html/index.html':
ensure => file,
content => '<h1>Hello, Puppet!</h1>',
require => Service['nginx'],
}
🔹 This Puppet script ensures that Nginx is installed, running, enabled on startup, and serves a simple HTML page.
1️⃣ Write a manifest (.pp
files) defining the desired configurations.
2️⃣ Puppet Master sends configurations to Puppet Agents (servers/clients).
3️⃣ Puppet Agent checks system state and applies only necessary changes.
Puppet is widely used in large IT infrastructures to maintain consistency and efficiency.
CSS Media Queries are a technique in CSS that allows a webpage layout to adapt to different screen sizes, resolutions, and device types. They are a core feature of Responsive Web Design.
@media (condition) {
/* CSS rules that apply only under this condition */
}
1. Adjusting for different screen widths:
/* For screens with a maximum width of 600px (e.g., smartphones) */
@media (max-width: 600px) {
body {
background-color: lightblue;
}
}
2. Detecting landscape vs. portrait orientation:
@media (orientation: landscape) {
body {
background-color: lightgreen;
}
}
3. Styling for print output:
@media print {
body {
font-size: 12pt;
color: black;
background: none;
}
}
✅ Mobile-first design: Optimizing websites for small screens first and then expanding for larger screens.
✅ Dark mode: Adjusting styles based on user preference (prefers-color-scheme
).
✅ Retina displays: Using high-resolution images or specific styles for high pixel density screens (min-resolution: 2dppx
).
Responsive Design is a web design approach that allows a website to automatically adjust to different screen sizes and devices. This ensures a seamless user experience across desktops, tablets, and smartphones without needing separate versions of the site.
Responsive Design is achieved using the following techniques:
1. Flexible Layouts
2. Media Queries (CSS)
@media (max-width: 768px) {
body {
background-color: lightgray;
}
}
→ This changes the background color for screens smaller than 768px.
3. Flexible Images and Media
img {
max-width: 100%;
height: auto;
}
4. Mobile-First Approach
✅ Better user experience across all devices
✅ SEO advantages, as Google prioritizes mobile-friendly sites
✅ No need for separate mobile and desktop versions, reducing maintenance
✅ Higher conversion rates, since users can navigate the site easily
Responsive Design is now the standard in modern web development, ensuring optimal display and usability on all devices.
Directory Traversal (also known as Path Traversal) is a security vulnerability in web applications that allows an attacker to access files or directories outside the intended directory. The attacker manipulates file paths to navigate through the server’s filesystem.
A vulnerable web application often processes file paths directly from user input, such as an URL:
https://example.com/getFile?file=report.pdf
If the server does not properly validate the input, an attacker could modify it like this:
https://example.com/getFile?file=../../../../etc/passwd
Here, the attacker uses ../
(parent directory notation) to move up the directory structure and access system files like /etc/passwd
(on Linux).
OAuth (Open Authorization) is an open standard protocol for authorization that allows applications to access a user's resources without knowing their credentials (e.g., password). It is commonly used for Single Sign-On (SSO) and API access.
OAuth operates using tokens, which allow an application to access a user's data on their behalf. The typical flow is as follows:
GoJS is a JavaScript library for creating interactive diagrams and graphs in web applications. It is commonly used for flowcharts, network topologies, UML diagrams, BPMN models, and other visual representations of data.
GoJS is widely used in business applications to visualize complex processes or relationships. It is a paid library but offers a free evaluation version.
The official website is: https://gojs.net