Complete tutorials and reference notes for data processing, machine learning algorithms, OpenRefine, Flask development, Regular Expressions, and advanced ML concepts.
Master data cleaning, transformation, and quality improvement with OpenRefine
Download OpenRefine 3.6.2 from: https://github.com/OpenRefine/OpenRefine/releases/tag/3.6.2
Choose your OS version (Windows, Mac, Linux). Version 3.6.2 includes embedded Java.
1. Extract the downloaded file
2. Run the executable (openrefine.exe on Windows)
3. Open browser and go to: http://127.0.0.1:3333
1. Click "Create Project"
2. Choose "This Computer" → Browse for your CSV file
3. Click "Next" to preview data
4. Verify column headers and data types
5. Click "Create Project" (top right)
• Always preview your data before creating the project
• Check if headers are properly detected
• Verify column separation (comma, tab, semicolon)
• Note any encoding issues with special characters
1. Click on column dropdown → Facet → Text facet
2. In the facet panel (left side), you'll see all unique values
3. Click on (blank) to select only blank rows
4. Click All → Edit rows → Remove all matching rows
5. Close the facet when done
1. Click column dropdown → Text filter
2. Enter your search term (e.g., "Airport")
3. Note: Matching is case-insensitive unless you tick the "case sensitive" box
4. To keep only matching rows: invert the filter, then All → Edit rows → Remove all matching rows
1. Click source column dropdown → Edit column → Add column based on this column
2. Enter new column name
3. Write GREL expression in the expression box
4. Preview shows first few results
5. Click OK to create column
Function | Purpose | Example |
---|---|---|
contains(value, "text") | Check if value contains text | contains(value, "Airport") |
value.match(/pattern/) | Extract text matching regex | value.match(/([A-Za-z\s]+Airport)/) |
split(value, delimiter) | Split text into array | split(value, ",") |
trim(value) | Remove leading/trailing spaces | trim(value) |
if(condition, true_value, false_value) | Conditional logic | if(contains(value, "Airport"), "Yes", "No") |
Extract the airport name (empty string when absent):
if(contains(value, "Airport"), value.match(/([A-Za-z\s]+Airport)/)[0].trim(), "")

Normalize whitespace and convert to title case:
value.trim().replace(/\s+/, " ").toTitlecase()

Keep only the text after " at ":
if(contains(value, " at "), split(value, " at ")[1].trim(), value)
1. Select the column with similar values
2. Go to Edit cells → Cluster and edit
3. Try different clustering methods:
• Key collision → fingerprint: Basic similarity
• Key collision → ngram-fingerprint: Handles typos
• Nearest neighbor → levenshtein: Character differences
4. Review suggested clusters and choose merge values
5. Check Merge? for clusters you want to merge
6. Click Merge Selected & Re-Cluster
1. Click Export (top right) → Comma-separated value
2. Save the cleaned CSV file
1. Go to Undo/Redo tab → Extract → Export
2. Downloads history.json with all operations
3. The history can be replayed on similar datasets via Undo/Redo → Apply
• Always backup original data before major operations
• Test GREL expressions on small datasets first
• Check for case sensitivity in filters and matching
• Verify row counts after filtering operations
Build data-driven web applications with Python Flask
# Install Flask
pip install Flask

# Optional: Create virtual environment first
python -m venv flask_env
# Windows: flask_env\Scripts\activate
# Mac/Linux: source flask_env/bin/activate
pip install Flask
project_folder/
├── run.py                  # Main application runner
├── flaskapp/
│   ├── __init__.py         # Flask app initialization
│   ├── routes.py           # URL routes and view functions
│   └── templates/
│       └── index.html      # HTML templates
└── data/
    └── dataset.csv         # Data files
""" run.py - Run the Flask app """ from flaskapp import app if __name__ == '__main__': # Start the Flask development server app.run(host='127.0.0.1', port=3001, debug=True)
from flask import Flask

# Create Flask application instance
app = Flask(__name__)

# Import routes (must be after app creation)
from flaskapp import routes
import csv


def username():
    return 'your_username'


def data_wrangling(filter_class=None):
    """
    Process CSV data and return formatted results

    Args:
        filter_class (str): Optional filter for data category

    Returns:
        tuple: (header, table_data, dropdown_options)
    """
    with open('data/dataset.csv', 'r', encoding='utf-8') as f:
        reader = csv.reader(f)
        table = []
        all_classes = set()

        # Read header row
        header = next(reader)

        # Read and process data rows
        for row in reader:
            # Convert numeric columns
            processed_row = [
                row[0],                       # species (string)
                row[1],                       # class (string)
                int(row[2]) if row[2] else 0  # count (integer)
            ]
            table.append(processed_row)
            all_classes.add(row[1])           # Collect unique classes

    # Create sorted dropdown options
    dropdown_options = sorted(list(all_classes))

    # Apply filtering if specified
    if filter_class:
        table = [row for row in table if row[1] == filter_class]

    # Sort by count (descending) and limit to top 10
    table.sort(key=lambda x: x[2], reverse=True)
    table = table[:10]

    return header, table, dropdown_options
from flask import render_template, request
from flaskapp import app
from data_processing import data_wrangling


@app.route('/')
@app.route('/index')
def index():
    """Main page route"""
    # Get filter parameter from URL
    filter_class = request.args.get('filter_class')

    # Process data with optional filter
    header, table, dropdown_options = data_wrangling(filter_class)

    # Render template with data
    return render_template('index.html',
                           header=header,
                           table=table,
                           dropdown_options=dropdown_options,
                           current_filter=filter_class)


@app.route('/api/data')
def api_data():
    """API endpoint for JSON data"""
    filter_class = request.args.get('filter_class')
    header, table, dropdown_options = data_wrangling(filter_class)

    return {
        'header': header,
        'data': table,
        'filters': dropdown_options,
        'count': len(table)
    }
# Navigate to your project folder
cd your_project_folder

# Run the application
python run.py

# Output should show:
# * Running on http://127.0.0.1:3001
Open your browser and go to: http://127.0.0.1:3001
• Use debug=True for development (auto-reload on changes)
• Organize code into modules (routes, models, utilities)
• Use templates for all HTML (avoid HTML in Python code)
• Handle errors gracefully with try/except blocks and error handlers
• Use environment variables for configuration (see the sketch below)
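The last two bullets are easiest to see in code. A minimal sketch of an extended flaskapp/__init__.py, assuming the project layout above; the SECRET_KEY and DATA_PATH settings are illustrative, not part of the original project:

import os
from flask import Flask

app = Flask(__name__)

# Configuration from environment variables, with development defaults (assumed names)
app.config['SECRET_KEY'] = os.environ.get('SECRET_KEY', 'dev-only-secret')
app.config['DATA_PATH'] = os.environ.get('DATA_PATH', 'data/dataset.csv')

# Graceful error handling: return a friendly message instead of a raw traceback
@app.errorhandler(404)
def not_found(error):
    return "Page not found.", 404

@app.errorhandler(500)
def server_error(error):
    return "Something went wrong while processing the request.", 500

# Import routes (must be after app creation, as above)
from flaskapp import routes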
Master pattern matching for text processing and data extraction
Pattern | Description | Example | Matches |
---|---|---|---|
. | Any single character | a.c | abc, axc, a1c |
* | Zero or more of preceding | ab*c | ac, abc, abbc |
+ | One or more of preceding | ab+c | abc, abbc (not ac) |
? | Zero or one of preceding | ab?c | ac, abc |
^ | Start of string | ^Hello | Hello world |
$ | End of string | world$ | Hello world |
# Python
import re

# Basic airport pattern
pattern = r'\b([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)\s+((?:International\s+)?Airport)\b'
text = "77kg of sandalwood at the Chhatrapati Shivaji Maharaj International Airport"

match = re.search(pattern, text)
if match:
    airport_name = f"{match.group(1)} {match.group(2)}"
    print(airport_name)  # "Chhatrapati Shivaji Maharaj International Airport"
# Basic email pattern (note: [A-Za-z], not [A-Z|a-z] -- a "|" inside a character class is literal)
pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
text = "Contact us at support@example.com or admin@site.org"

emails = re.findall(pattern, text)
print(emails)  # ['support@example.com', 'admin@site.org']
Function | Purpose | Returns |
---|---|---|
re.search(pattern, text) | Find first match | Match object or None |
re.findall(pattern, text) | Find all matches | List of strings |
re.sub(pattern, replacement, text) | Replace matches | Modified string |
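re.sub is listed above but not demonstrated elsewhere in these notes; a small example (the sample text is just an illustration):

import re

text = "77kg  of   sandalwood seized at the  airport"

# Collapse runs of whitespace into a single space
clean = re.sub(r'\s+', ' ', text)
print(clean)   # "77kg of sandalwood seized at the airport"

# A function can be used as the replacement to transform each match
masked = re.sub(r'\d+kg', lambda m: m.group().upper(), clean)
print(masked)  # "77KG of sandalwood seized at the airport"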
Essential Python techniques for data manipulation and analysis
import pandas as pd
import numpy as np

# Load CSV data
df = pd.read_csv('data.csv')

# Basic data exploration
print(f"Shape: {df.shape}")
print(f"Columns: {df.columns.tolist()}")
print(f"Data types:\n{df.dtypes}")
print(f"Missing values:\n{df.isnull().sum()}")

# Display first few rows
print(df.head())

# Basic statistics
print(df.describe())
# Remove duplicate rows
df_clean = df.drop_duplicates()

# Remove rows with missing values in specific columns
df_clean = df.dropna(subset=['important_column'])

# Fill missing values
df['column'] = df['column'].fillna('default_value')
df['numeric_column'] = df['numeric_column'].fillna(df['numeric_column'].mean())

# Remove rows where column is empty string
df_clean = df[df['column'].str.strip() != '']
# Basic filtering
airport_data = df[df['Subject'].str.contains('Airport', case=True, na=False)]

# Multiple conditions
filtered = df[
    (df['Count'] > 10) &
    (df['Country'].isin(['China', 'India'])) &
    (df['Date'].str.contains('2024'))
]

# Using query method (more readable for complex conditions)
result = df.query("Count > 10 and Country in ['China', 'India']")

# String operations
df['clean_name'] = df['name'].str.strip().str.title()
has_keyword = df[df['description'].str.contains('wildlife', case=False, na=False)]
Understanding parallel vs sequential ensemble learning approaches
# Required packages
import numpy as np
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score
# Random Forest (Bagging)
rf = RandomForestClassifier(
    n_estimators=100,
    max_features='sqrt',   # Random feature selection
    bootstrap=True,        # Bootstrap sampling
    oob_score=True         # Out-of-bag error estimation
)

# AdaBoost (Boosting)
ada = AdaBoostClassifier(
    base_estimator=DecisionTreeClassifier(max_depth=1),  # renamed to estimator= in newer scikit-learn
    n_estimators=50,
    learning_rate=1.0
)
• Use √p features for classification, p/3 for regression
• More trees rarely hurt (diminishing returns after ~100)
• Use OOB error for model selection
• Feature importance provides good interpretability
• Start with weak learners (shallow trees)
• Lower learning rate with more estimators
• Monitor training/validation error to avoid overfitting (see the sketch below)
• More sensitive to noise than bagging
• Memory issues: Reduce n_estimators or use n_jobs=1
• Slow training: Use n_jobs=-1 for parallel processing
• Overfitting in boosting: Reduce learning rate or max_depth
• Poor Random Forest performance: Check max_features and min_samples_leaf
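For the "monitor training/validation error" tip above, scikit-learn's AdaBoostClassifier exposes staged_predict, which yields predictions after each boosting round. A minimal sketch, assuming X_train, y_train, X_test, y_test already exist:

import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Assumes X_train, y_train, X_test, y_test are already defined
ada = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # base_estimator= on older scikit-learn
    n_estimators=200,
    learning_rate=0.5
)
ada.fit(X_train, y_train)

# Error after each boosting round: pick n_estimators where the test error flattens
train_errors = [1 - accuracy_score(y_train, p) for p in ada.staged_predict(X_train)]
test_errors = [1 - accuracy_score(y_test, p) for p in ada.staged_predict(X_test)]
best_round = int(np.argmin(test_errors)) + 1
print(f"Lowest test error after {best_round} rounds: {min(test_errors):.3f}")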
Classification and Regression Trees for interpretable machine learning
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor, plot_tree
import matplotlib.pyplot as plt
# Classification Tree
clf = DecisionTreeClassifier(
    criterion='gini',        # or 'entropy'
    max_depth=10,            # Limit tree depth
    min_samples_split=20,    # Min samples to split
    min_samples_leaf=10,     # Min samples in leaf
    random_state=42
)
clf.fit(X_train, y_train)    # Fit before plotting

# Visualization
plt.figure(figsize=(20, 10))
plot_tree(clf,
          feature_names=feature_names,
          class_names=class_names,
          filled=True,
          rounded=True,
          max_depth=3)       # Show only first 3 levels
• max_depth: Start with 5-10, tune based on validation (see the sketch below)
• min_samples_split: 2-20, higher for noisy data
• min_samples_leaf: 1-10, higher prevents overfitting
• max_features: sqrt(n) for classification, n/3 for regression
• Overfitting: Reduce max_depth, increase min_samples_leaf
• Underfitting: Increase max_depth, reduce stopping criteria
• Unbalanced trees: Check data distribution, consider class weights
• Large trees hard to visualize: Use max_depth=3 in plot_tree
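A short sketch of tuning max_depth with cross-validation, as suggested in the parameter guidance above; X and y are assumed to be an existing feature matrix and label vector:

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# Assumes X, y are already loaded
depths = range(2, 16)
cv_scores = []
for depth in depths:
    clf = DecisionTreeClassifier(max_depth=depth, min_samples_leaf=10, random_state=42)
    scores = cross_val_score(clf, X, y, cv=5, scoring='accuracy')
    cv_scores.append(scores.mean())

best_depth = list(depths)[int(np.argmax(cv_scores))]
print(f"Best max_depth by 5-fold CV: {best_depth} (accuracy {max(cv_scores):.3f})")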
Sparse signal recovery from underdetermined systems
import numpy as np
from sklearn.linear_model import Lasso, Ridge, LassoCV, RidgeCV
import scipy.io
import matplotlib.pyplot as plt
# Generate sparse signal and measurements
n_measurements, n_pixels = 1300, 2500
A = np.random.normal(0, 1, (n_measurements, n_pixels))
noise = np.random.normal(0, 5, n_measurements)
y = A @ x_true + noise   # x_true: sparse ground-truth signal (see sketch below)

# Lasso with cross-validation
lasso_cv = LassoCV(alphas=np.logspace(-4, 1, 50), cv=10)
lasso_cv.fit(A, y)
x_lasso = lasso_cv.coef_

# Ridge with cross-validation
ridge_cv = RidgeCV(alphas=np.logspace(-2, 3, 50), cv=10)
ridge_cv.fit(A, y)
x_ridge = ridge_cv.coef_
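The snippet above uses x_true without defining it. A minimal sketch for constructing a sparse ground-truth signal, to run before the measurement code; the sparsity level s = 100 and the amplitude scale are arbitrary assumptions:

import numpy as np

# Hypothetical sparse ground truth: s nonzero entries out of n_pixels
n_pixels, s = 2500, 100
rng = np.random.default_rng(0)
x_true = np.zeros(n_pixels)
support = rng.choice(n_pixels, size=s, replace=False)  # random support
x_true[support] = rng.normal(0, 10, s)                  # nonzero amplitudes

# Recovery quality can then be checked with a relative error, e.g.
# np.linalg.norm(x_lasso - x_true) / np.linalg.norm(x_true)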
• Measurement matrix: Use random Gaussian or Fourier measurements
• Lambda selection: Use cross-validation or information criteria
• Sparsity level: Ensure sufficient measurements (m ≥ 2s log p)
• Noise handling: Account for noise level in regularization
• Poor recovery: Check sparsity level and measurement count
• Over-regularization: Lambda too large, reduces to zero
• Under-regularization: Lambda too small, doesn't promote sparsity
• Numerical issues: Scale features, add small regularization
Adaptive boosting for sequential weak learner combination
import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier
# Manual AdaBoost implementation (key parts)
def find_best_stump(X, y, weights):
    best_error = float('inf')
    best_stump = None
    for feature in [0, 1]:                        # For each feature
        thresholds = np.unique(X[:, feature])     # Candidate split points
        for threshold in thresholds:
            for polarity in [1, -1]:
                predictions = np.where(X[:, feature] <= threshold, polarity, -polarity)
                error = np.sum(weights[predictions != y])
                if error < best_error:
                    best_error = error
                    best_stump = {'feature': feature,
                                  'threshold': threshold,
                                  'polarity': polarity,
                                  'predictions': predictions}
    return best_stump, best_error

# Weight updates
alpha = 0.5 * np.log((1 - error) / error)
Z = np.sum(D * np.exp(-alpha * y * predictions))
D_new = D * np.exp(-alpha * y * predictions) / Z
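The stump finder and the weight update above are normally wrapped in a boosting loop. A sketch under the assumption that y uses ±1 labels and X is a two-feature array:

import numpy as np

def adaboost_train(X, y, n_rounds=20):
    """Minimal AdaBoost loop using find_best_stump; assumes y in {-1, +1}."""
    n = len(y)
    D = np.full(n, 1.0 / n)           # uniform initial weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump, error = find_best_stump(X, y, D)
        error = max(error, 1e-10)     # guard against division by zero
        alpha = 0.5 * np.log((1 - error) / error)
        predictions = stump['predictions']
        D = D * np.exp(-alpha * y * predictions)
        D = D / D.sum()               # normalize (same role as Z above)
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    """Sign of the weighted vote of all stumps."""
    agg = np.zeros(len(X))
    for stump, alpha in zip(stumps, alphas):
        pred = np.where(X[:, stump['feature']] <= stump['threshold'],
                        stump['polarity'], -stump['polarity'])
        agg += alpha * pred
    return np.sign(agg)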
• Weak learners: Use shallow trees (stumps or max_depth=1-3)
• Number of iterations: Monitor training error, stop when it plateaus
• Data quality: AdaBoost is sensitive to noise and outliers
• Class balance: Works best with balanced datasets
• Overfitting: Reduce number of estimators or use early stopping
• Weak learners too strong: Reduce tree depth
• Numerical instability: Check for very small errors (ε ≈ 0)
• Poor performance: Ensure weak learners are actually weak but better than random
Ensemble of decision trees with bootstrap aggregating
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import cross_val_score
import numpy as np
# Random Forest with parameter tuning
rf = RandomForestClassifier(
    n_estimators=100,
    max_features='sqrt',     # √p for classification
    max_depth=10,
    min_samples_split=10,
    min_samples_leaf=5,
    oob_score=True,          # Out-of-bag error
    random_state=42,
    n_jobs=-1                # Parallel processing
)

# Parameter sensitivity analysis
max_features_range = [1, 2, 5, 10, int(np.sqrt(n_features)), n_features // 3]  # n_features = X_train.shape[1]
oob_errors = []
test_errors = []

for max_feat in max_features_range:
    rf_temp = RandomForestClassifier(max_features=max_feat, oob_score=True)
    rf_temp.fit(X_train, y_train)
    oob_errors.append(1 - rf_temp.oob_score_)
    test_errors.append(1 - rf_temp.score(X_test, y_test))
• max_features: √p for classification, p/3 for regression
• n_estimators: Start with 100, increase until OOB error stabilizes
• Tree depth: Don't limit unless overfitting (Random Forest handles this well)
• Feature importance: Use for feature selection and interpretation (see the sketch below)
• High bias (underfitting): Allow deeper trees (raise max_depth), reduce min_samples_leaf, add informative features
• High variance: Increase n_estimators and min_samples_split, or limit max_depth
• Slow training: Use n_jobs=-1, reduce n_estimators
• Memory issues: Reduce max_depth or n_estimators
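Feature importance, mentioned in the best practices above, is exposed directly on a fitted forest. A short sketch, assuming rf from the block above and a feature_names list matching the columns of X_train:

import numpy as np

# Assumes rf, X_train, y_train and feature_names are already defined
rf.fit(X_train, y_train)
importances = rf.feature_importances_
ranking = np.argsort(importances)[::-1]

for idx in ranking[:10]:   # ten most important features
    print(f"{feature_names[idx]:<25s} {importances[idx]:.3f}")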
Novelty detection and anomaly identification
from sklearn.svm import OneClassSVM
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report
import numpy as np

# One-Class SVM setup
# Extract only normal examples for training
X_normal = X_train[y_train == 0]   # Only non-spam emails

# Feature scaling (important for SVM)
scaler = StandardScaler()
X_normal_scaled = scaler.fit_transform(X_normal)
X_test_scaled = scaler.transform(X_test)

# Hyperparameter tuning
nu_values = [0.01, 0.05, 0.1, 0.2, 0.3, 0.5]
gamma_values = ['scale', 'auto', 0.001, 0.01, 0.1, 1.0]

best_params = {}
best_accuracy = 0

for nu in nu_values:
    for gamma in gamma_values:
        ocsvm = OneClassSVM(nu=nu, gamma=gamma, kernel='rbf')
        ocsvm.fit(X_normal_scaled)
        predictions = ocsvm.predict(X_test_scaled)

        # Convert: +1 → 0 (normal), -1 → 1 (anomaly)
        y_pred = np.where(predictions == 1, 0, 1)
        accuracy = accuracy_score(y_test, y_pred)

        if accuracy > best_accuracy:
            best_accuracy = accuracy
            best_params = {'nu': nu, 'gamma': gamma}
• Feature scaling: Always scale features for SVM
• Nu parameter: Start with expected outlier fraction
• Gamma tuning: Use grid search with cross-validation
• Evaluation: Focus on recall for the anomaly class (see the sketch below)
• Poor anomaly detection: Adjust nu parameter, try different gamma
• Too many false positives: Decrease nu value
• Too many false negatives: Increase nu value
• Training instability: Ensure sufficient normal examples
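To focus on recall for the anomaly class, classification_report (already imported above) breaks the scores out per class. A sketch reusing X_normal_scaled, X_test_scaled, y_test and best_params from the tuning loop above:

from sklearn.metrics import classification_report, recall_score

# Refit with the best parameters found above and evaluate per class
ocsvm = OneClassSVM(kernel='rbf', **best_params)
ocsvm.fit(X_normal_scaled)
y_pred = np.where(ocsvm.predict(X_test_scaled) == 1, 0, 1)   # +1 → normal (0), -1 → anomaly (1)

print(classification_report(y_test, y_pred, target_names=['normal', 'anomaly']))
print(f"Anomaly recall: {recall_score(y_test, y_pred, pos_label=1):.3f}")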
Non-parametric regression with local model fitting
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import KFold
def gaussian_kernel(z, h):
    """Gaussian kernel function"""
    return np.exp(-np.sum(z**2, axis=-1) / (2 * h**2))


def locally_weighted_regression(X_train, y_train, x_pred, h):
    """LWLR at single prediction point"""
    n, p = X_train.shape

    # Compute weights
    weights = gaussian_kernel(X_train - x_pred, h)

    # Design matrix: [1, (X_train - x_pred)]
    X_design = np.column_stack([np.ones(n), X_train - x_pred])

    # Weighted least squares
    W = np.diag(weights)
    XTW = X_design.T @ W
    beta = np.linalg.solve(XTW @ X_design, XTW @ y_train)

    # Prediction is β₀
    return beta[0]


# Cross-validation for bandwidth selection
def cross_validate_lwlr(X, y, h_values, cv_folds=5):
    kf = KFold(n_splits=cv_folds, shuffle=True, random_state=42)
    cv_errors = []

    for h in h_values:
        fold_errors = []
        for train_idx, val_idx in kf.split(X):
            X_train, X_val = X[train_idx], X[val_idx]
            y_train, y_val = y[train_idx], y[val_idx]

            y_pred = [locally_weighted_regression(X_train, y_train, x, h)
                      for x in X_val.flatten()]
            mse = np.mean((y_val - np.array(y_pred))**2)
            fold_errors.append(mse)
        cv_errors.append(np.mean(fold_errors))

    return np.array(cv_errors)
• Bandwidth selection: Use cross-validation for optimal h
• Numerical stability: Add small regularization to XᵀWX (see the sketch below)
• Computational efficiency: Pre-compute distances when possible
• Kernel choice: Gaussian is most common, but others work
• Singular matrix: Add regularization or increase bandwidth
• Poor predictions: Check bandwidth selection and data quality
• Slow computation: Use approximate methods for large datasets
• Boundary effects: Consider different kernels or padding
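The regularization tip above amounts to a small ridge term in the weighted least-squares solve. A self-contained sketch of the modified function; the 1e-8 strength is an arbitrary assumption:

import numpy as np

def lwlr_regularized(X_train, y_train, x_pred, h, lam=1e-8):
    """Locally weighted regression with a small ridge term for numerical stability."""
    n = X_train.shape[0]
    weights = np.exp(-np.sum((X_train - x_pred)**2, axis=-1) / (2 * h**2))
    X_design = np.column_stack([np.ones(n), X_train - x_pred])
    W = np.diag(weights)
    XTW = X_design.T @ W
    # Ridge term keeps XᵀWX invertible when almost all weights are near zero
    beta = np.linalg.solve(XTW @ X_design + lam * np.eye(X_design.shape[1]),
                           XTW @ y_train)
    return beta[0]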
Understanding the fundamental tradeoff in machine learning
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import learning_curve, validation_curve
# Validation curve to visualize bias-variance tradeoff
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

# Example: Decision tree depth
param_range = np.arange(1, 11)
train_scores, test_scores = validation_curve(
    DecisionTreeClassifier(random_state=42),
    X, y,
    param_name='max_depth',
    param_range=param_range,
    cv=5, scoring='accuracy'
)

# Plot bias-variance tradeoff
plt.figure(figsize=(10, 6))
plt.plot(param_range, np.mean(train_scores, axis=1), 'o-', label='Training Score')
plt.plot(param_range, np.mean(test_scores, axis=1), 'o-', label='Validation Score')
plt.xlabel('Max Depth (Model Complexity)')
plt.ylabel('Accuracy')
plt.title('Bias-Variance Tradeoff')
plt.legend()
plt.grid(True)

# Bias-variance decomposition (conceptual)
def bias_variance_decomposition(true_function, model_predictions):
    """
    Conceptual bias-variance decomposition
    """
    mean_prediction = np.mean(model_predictions, axis=0)
    bias_squared = np.mean((mean_prediction - true_function)**2)
    variance = np.mean(np.var(model_predictions, axis=0))
    return bias_squared, variance
• Cross-validation: Use for model selection and hyperparameter tuning
• Regularization: Add penalty terms to control model complexity
• Ensemble methods: Combine multiple models to reduce variance
• Data size: More data generally reduces variance
• High Bias: Increase model complexity, add features, reduce regularization
• High Variance: Decrease complexity, add regularization, get more data
• Both High: Review model choice and data quality
• Validation: Use learning curves to diagnose the problem (see the sketch below)
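A sketch of the learning-curve diagnosis mentioned above, using the learning_curve import from the first block; X, y and the decision-tree settings are assumptions:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

# Assumes X, y are already defined
sizes, train_scores, val_scores = learning_curve(
    DecisionTreeClassifier(max_depth=5, random_state=42),
    X, y, cv=5, scoring='accuracy',
    train_sizes=np.linspace(0.1, 1.0, 8)
)

# Large persistent gap → high variance; both curves low → high bias
plt.plot(sizes, train_scores.mean(axis=1), 'o-', label='Training Score')
plt.plot(sizes, val_scores.mean(axis=1), 'o-', label='Validation Score')
plt.xlabel('Training set size')
plt.ylabel('Accuracy')
plt.title('Learning Curve')
plt.legend()
plt.grid(True)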
Robust model evaluation and selection methods
from sklearn.model_selection import (
    cross_val_score, KFold, StratifiedKFold,
    LeaveOneOut, cross_validate, GridSearchCV
)
# Basic cross-validation
from sklearn.model_selection import cross_val_score

# 5-fold cross-validation
scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')
print(f"CV Score: {scores.mean():.3f} (+/- {scores.std() * 2:.3f})")

# Stratified K-Fold for classification
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=skf, scoring='accuracy')

# Manual cross-validation loop
import numpy as np
from sklearn.metrics import accuracy_score

kf = KFold(n_splits=5, shuffle=True, random_state=42)
cv_errors = []

for train_idx, val_idx in kf.split(X):
    X_train, X_val = X[train_idx], X[val_idx]
    y_train, y_val = y[train_idx], y[val_idx]

    model.fit(X_train, y_train)
    y_pred = model.predict(X_val)
    error = 1 - accuracy_score(y_val, y_pred)
    cv_errors.append(error)

cv_error = np.mean(cv_errors)
cv_std = np.std(cv_errors)

# Grid search with cross-validation
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

param_grid = {
    'max_depth': [3, 5, 7, 10],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}

grid_search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid, cv=5, scoring='accuracy', n_jobs=-1
)
grid_search.fit(X, y)

print(f"Best parameters: {grid_search.best_params_}")
print(f"Best CV score: {grid_search.best_score_:.3f}")
• Fold selection: 5-10 folds typically optimal (bias-variance tradeoff)
• Stratification: Use for classification to maintain class balance
• Shuffle data: Unless temporal dependencies exist
• Nested CV: Use for unbiased model comparison with hyperparameter tuning (see the sketch below)
• Data leakage: Ensure no information flows from test to train
• Small datasets: Use LOO or stratified sampling
• Imbalanced data: Use stratified CV and appropriate metrics
• Time series: Use time-based splits, not random
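The nested-CV recommendation can be expressed by wrapping GridSearchCV in an outer cross_val_score. A minimal sketch reusing param_grid, X and y from the block above:

from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Inner loop tunes hyperparameters, outer loop estimates generalization error
inner = GridSearchCV(DecisionTreeClassifier(random_state=42),
                     param_grid, cv=3, scoring='accuracy')
nested_scores = cross_val_score(inner, X, y, cv=5, scoring='accuracy')
print(f"Nested CV accuracy: {nested_scores.mean():.3f} (+/- {nested_scores.std() * 2:.3f})")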
Solutions to frequent problems in data processing and machine learning
Problem: GREL syntax errors or unexpected results
Solutions:
• Test expressions on small data samples first
• Use parentheses to ensure proper order of operations
• Check for typos in function names (case-sensitive)
• Use value.toString() if working with numbers
Problem: Lost rows during data processing
Solutions:
• Check for case sensitivity in filters
• Verify GREL expressions don't return empty strings
• Use facets to inspect data before filtering
• Export history to track which operation caused loss
Problem: Referencing non-existent column
# Check available columns
print(df.columns.tolist())

# Safe column access
if 'column_name' in df.columns:
    result = df['column_name']
else:
    print("Column not found")

# Use .get() method for safe access
value = df.get('column_name', default_value)
Problem: Pandas warning about chained assignments
# Avoid chained assignment
# Bad: df[df['A'] > 5]['B'] = 'new_value'

# Good: Use .loc
df.loc[df['A'] > 5, 'B'] = 'new_value'

# Or create explicit copy
subset = df[df['A'] > 5].copy()
subset['B'] = 'new_value'
Problem: High training accuracy, poor test performance
Solutions:
• Reduce model complexity (less depth, fewer features)
• Add regularization (L1/L2 penalties)
• Use cross-validation for hyperparameter tuning
• Collect more training data
• Use ensemble methods to reduce variance
Problem: High variance in CV scores
Solutions:
• Check data distribution and class balance
• Use stratified CV for classification
• Increase number of CV folds
• Check for data leakage
• Ensure proper preprocessing
Problem: Flask can't locate HTML templates
Solutions:
• Ensure templates are in the templates/ folder
• Check file name spelling and case
• Verify Flask app initialization includes template folder
app = Flask(__name__, template_folder='templates')
• Use online tools: regex101.com, regexr.com
• Test patterns incrementally (build complexity gradually)
• Print intermediate results to debug extraction
import re

def debug_regex(pattern, text):
    """Debug regex pattern with detailed output"""
    print(f"Pattern: {pattern}")
    print(f"Text: {text}")

    match = re.search(pattern, text)
    if match:
        print(f"Full match: '{match.group()}'")
        for i, group in enumerate(match.groups(), 1):
            print(f"Group {i}: '{group}'")
    else:
        print("No match found")

    # Test with findall for multiple matches
    all_matches = re.findall(pattern, text)
    print(f"All matches: {all_matches}")

# Usage
debug_regex(r'([A-Z][a-z]+)\s+(Airport)',
            "Beijing Capital Airport and Shanghai Pudong Airport")
Structured areas for expanding knowledge and skills
Add notes on: neural networks, TensorFlow/PyTorch, CNNs, RNNs, transformers, deep learning architectures
Add tutorials for: SVM kernels, clustering algorithms, dimensionality reduction, reinforcement learning
Add tutorials for: matplotlib, seaborn, plotly, dashboard creation, interactive charts
Add content on: feature selection, feature creation, text processing, time series features
Add content on: database integration, user authentication, API development, deployment strategies
Add learning notes for: JavaScript, React, HTML/CSS advanced techniques, responsive design
Add tutorials on: SQL queries, database normalization, performance optimization, NoSQL databases
Add guides for: Git version control, Docker containerization, CI/CD pipelines, testing frameworks
Add learning for: AWS, Azure, Google Cloud, serverless computing, cloud databases
1. Create a new section with appropriate ID
2. Include: Overview, Installation/Setup, Core Concepts, Code Examples, Best Practices
3. Add practical examples and real-world use cases
4. Include troubleshooting section for common issues
5. Update table of contents with new section