Model Context Protocol (MCP) MSPaint App Automation

This task has three parts:

1. **Math problem solving:** A way to represent and solve math problems. This can be a simple expression evaluator or something more sophisticated, depending on the problems to be handled.
2. **Server/client communication:** A server solves the problem and a client displays the solution. The example below does not implement the actual Model Context Protocol (MCP); it uses a small ad hoc, string-based message scheme as a stand-in.
3. **MSPaint integration:** A way to control MSPaint from the client so the solution appears in Paint. This typically involves Windows API calls or libraries that drive the Windows GUI.

What follows is a conceptual outline with code snippets. It is a simplified example and needs significant expansion to handle more complex problems and a full MCP implementation. The server and client are written in Python; the MSPaint integration is written in C#, which is well suited to Windows GUI interaction.

**Conceptual Outline:**

* **Messages:**
  * `PROBLEM`: Sent from the client to the server, containing the math problem as a string.
  * `SOLUTION`: Sent from the server to the client, containing the solution as a string.
  * `ERROR`: Sent from the server to the client, indicating an error.
* **Server (Python):**
  1. Listens for connections from the client.
  2. Receives the `PROBLEM` message.
  3. Parses and solves the math problem.
  4. Sends the `SOLUTION` message back to the client (or `ERROR` if something went wrong).
* **Client (Python):**
  1. Connects to the server.
  2. Sends the `PROBLEM` message.
  3. Receives the `SOLUTION` message.
  4. Passes the solution to the MSPaint integration (C#).
* **MSPaint Integration (C#):**
  1. Receives the solution string from the Python client.
  2. Launches MSPaint.
  3. Writes the solution into MSPaint (e.g., by sending keystrokes or using the Windows API).

**Python Server (server.py):**

```python
import re
import socket
import threading

HOST = "127.0.0.1"  # Loopback interface (localhost)
PORT = 65432        # Port to listen on (non-privileged ports are > 1023)

# One binary operation on non-negative integers, e.g. "12*3"
PROBLEM_RE = re.compile(r"(\d+)([+\-*/])(\d+)")


def solve_problem(problem):
    """Solve a simple '<int> <op> <int>' problem and return the result as a string.

    eval() is deliberately avoided: it would execute arbitrary code sent by a client.
    Raises ValueError for anything the pattern does not cover.
    """
    match = PROBLEM_RE.fullmatch(problem.replace(" ", ""))
    if not match:
        raise ValueError("Invalid problem format")
    left, operator, right = match.groups()
    num1, num2 = int(left), int(right)
    if operator == "+":
        result = num1 + num2
    elif operator == "-":
        result = num1 - num2
    elif operator == "*":
        result = num1 * num2
    else:  # operator == "/"
        if num2 == 0:
            raise ValueError("Division by zero")
        result = num1 / num2
    return str(result)


def handle_client(conn, addr):
    print(f"Connected by {addr}")
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:
                break
            message = data.decode().strip()
            print(f"Received: {message}")
            if not message.startswith("PROBLEM:"):
                conn.sendall(b"ERROR:Invalid request")
                continue
            try:
                solution = solve_problem(message[len("PROBLEM:"):])
                conn.sendall(f"SOLUTION:{solution}".encode())
            except ValueError as e:
                conn.sendall(f"ERROR:{e}".encode())


def start_server():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((HOST, PORT))
        s.listen()
        print(f"Listening on {HOST}:{PORT}")
        while True:
            conn, addr = s.accept()
            # One thread per client; daemon threads do not block interpreter exit
            threading.Thread(target=handle_client, args=(conn, addr), daemon=True).start()


if __name__ == "__main__":
    start_server()
```

**Python Client (client.py):**

```python
import socket
import subprocess

HOST = "127.0.0.1"  # The server's hostname or IP address
PORT = 65432        # The port used by the server

# Replace with the actual path to the compiled C# executable.
MSPAINT_APP = "path/to/your/MSPaintIntegration.exe"


def send_to_mspaint(solution):
    """Pass the solution to the C# MSPaint application as a command-line argument."""
    try:
        subprocess.run([MSPAINT_APP, solution], check=True)
        print("Solution sent to MSPaint.")
    except FileNotFoundError:
        print(f"Error: MSPaint application not found at {MSPAINT_APP}")
    except subprocess.CalledProcessError as e:
        print(f"Error sending to MSPaint: {e}")


def main():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.connect((HOST, PORT))
            problem = input("Enter math problem (e.g., 2 + 2): ")
            s.sendall(f"PROBLEM:{problem}".encode())
            response = s.recv(1024).decode()
            print(f"Received: {response}")
            if response.startswith("SOLUTION:"):
                send_to_mspaint(response[len("SOLUTION:"):])
            elif response.startswith("ERROR:"):
                print(f"Error: {response[len('ERROR:'):]}")
            else:
                print("Invalid response from server.")
        except ConnectionRefusedError:
            print("Error: Could not connect to the server. Make sure the server is running.")


if __name__ == "__main__":
    main()
```

**C# MSPaint Integration (MSPaintIntegration.cs):**

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;
using System.Threading;
using System.Windows.Forms;

namespace MSPaintIntegration
{
    class Program
    {
        [DllImport("user32.dll")]
        static extern IntPtr FindWindow(string lpClassName, string lpWindowName);

        [DllImport("user32.dll")]
        static extern bool SetForegroundWindow(IntPtr hWnd);

        [DllImport("user32.dll")]
        static extern bool ShowWindow(IntPtr hWnd, int nCmdShow);

        const int SW_SHOWNORMAL = 1;

        static void Main(string[] args)
        {
            if (args.Length == 0)
            {
                Console.WriteLine("Usage: MSPaintIntegration.exe <solution>");
                return;
            }
            string solution = args[0];

            // Launch MSPaint and wait until it is ready for input
            Process process = Process.Start("mspaint.exe");
            process.WaitForInputIdle();

            // Find the MSPaint window; retry because it may not exist immediately
            IntPtr hWnd = IntPtr.Zero;
            for (int i = 0; i < 10; i++)
            {
                // Default title of a new window; it differs on localized Windows
                hWnd = FindWindow(null, "Untitled - Paint");
                if (hWnd != IntPtr.Zero) break;
                Thread.Sleep(500);
            }
            if (hWnd == IntPtr.Zero)
            {
                Console.WriteLine("Error: Could not find MSPaint window.");
                return;
            }

            // Bring MSPaint to the foreground and give it time to activate
            ShowWindow(hWnd, SW_SHOWNORMAL);
            SetForegroundWindow(hWnd);
            Thread.Sleep(1000);

            // Simulate typing the solution (crude). Paint generally ignores typed
            // text unless the Text tool is active, so selecting the tool and
            // clicking the canvas would have to happen first. Characters such as
            // + ^ % ~ ( ) { } have special meaning to SendKeys and need escaping.
            SendKeys.SendWait(solution);

            // Optional: save via Ctrl+S, type a file name, confirm with Enter
            SendKeys.SendWait("^s");
            SendKeys.SendWait("solution.png");
            SendKeys.SendWait("{ENTER}");

            Console.WriteLine("Solution sent to MSPaint.");
        }
    }
}
```

**How to Run:**

1. **Save the code:** Save the Python code as `server.py` and `client.py`. Save the C# code as `MSPaintIntegration.cs`.
2. **Compile the C# code:** Use the C# compiler (csc.exe) or Visual Studio to compile `MSPaintIntegration.cs` into an executable (e.g., `MSPaintIntegration.exe`). Add a reference to `System.Windows.Forms.dll` in the C# project.
3. **Update client.py:** Replace `"path/to/your/MSPaintIntegration.exe"` with the actual path to the compiled `MSPaintIntegration.exe` file.
4. **Run the server:** Open a terminal or command prompt and run `python server.py`.
5. **Run the client:** Open another terminal or command prompt and run `python client.py`.
6. **Enter the problem:** The client prompts for a math problem. Type something like `2 + 2` and press Enter.
7. **Observe MSPaint:** MSPaint should launch and receive the keystrokes for the solution (e.g., "4").

**Important Considerations and Improvements:**

* **Error Handling:** The code includes only basic error handling; add more robust error checking and reporting.
* **Security:** Never pass untrusted input to `eval()`; it executes arbitrary code. Note that `ast.literal_eval` only parses literals and does not evaluate arithmetic, so it is not a drop-in replacement. Safer options are walking the `ast` tree with a whitelist of operators or using a dedicated math parser. The regular-expression approach in `solve_problem` is safe but limited to a single operation.
* **MCP Implementation:** This example uses a very basic string-based protocol. A proper MCP implementation would involve defined message types, serialization/deserialization, and more robust error handling. Consider a library such as `protobuf` or `json` for message serialization.
* **MSPaint Automation:** The C# code uses `SendKeys` to simulate typing, which is fragile. Drawing through the Windows API is more complex but avoids depending on keyboard focus. Another option is to use the `System.Drawing` namespace to create an image containing the solution and then open that image in MSPaint.
* **GUI:** Consider adding a graphical user interface to the client. Libraries such as Tkinter (Python) or WPF (C#) can be used for this.
* **Problem Complexity:** The `solve_problem` function is very limited. It needs to be expanded to handle chained operations, functions, and variable assignments.
* **Threading:** The server uses one thread per client. Make sure the code is thread-safe if it touches shared resources.
* **Cross-Platform:** The MSPaint integration is Windows-specific. A cross-platform solution needs a cross-platform drawing application or library.

This is a starting point. Building a complete solution requires significant further work, with security and robustness as priorities.
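The security note above warns against `eval()` and recommends a safe parser. One way to do that with only the standard library is to walk the parsed syntax tree and allow a fixed set of operators. The following is a minimal sketch under that assumption; `safe_eval` and `_BINARY_OPS` are illustrative names, not part of the original code.

```python
import ast
import operator

# Whitelist of binary operators the evaluator is allowed to apply
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def safe_eval(expression):
    """Evaluate +, -, *, / over numeric literals without calling eval()."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # Accept only plain int/float literals (bool is excluded on purpose)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        # Names, calls, attribute access, etc. are all rejected
        raise ValueError("Unsupported expression")

    return _eval(ast.parse(expression, mode="eval"))
```

Because Python's own parser builds the tree, operator precedence and parentheses are handled correctly. Malformed input raises `SyntaxError`, and division by zero raises `ZeroDivisionError`, so a caller would need to catch both alongside `ValueError`.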
**C++ Expression Evaluator:**

```cpp
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Evaluate a simple arithmetic expression strictly left to right.
// Operator precedence is NOT applied: "2 + 3 * 4" yields 20, not 14.
double evaluateExpression(const std::string& expression) {
    std::stringstream ss(expression);
    double result = 0.0, value = 0.0;
    char op = 0;

    if (!(ss >> result)) {  // Read the first number
        throw std::runtime_error("Expression must start with a number");
    }
    while (ss >> op >> value) {
        if (op == '+') {
            result += value;
        } else if (op == '-') {
            result -= value;
        } else if (op == '*') {
            result *= value;
        } else if (op == '/') {
            if (value == 0) {
                throw std::runtime_error("Division by zero");
            }
            result /= value;
        } else {
            throw std::runtime_error("Invalid operator");
        }
    }
    return result;
}

int main() {
    std::string expression;
    std::cout << "Enter an arithmetic expression (e.g., 2 + 3): ";
    std::getline(std::cin, expression);

    try {
        double result = evaluateExpression(expression);
        std::cout << "Result: " << result << std::endl;

        // TODO: Implement server/client communication to send the result
        // and display it in MSPaint. This would involve:
        // 1. Setting up a socket connection (server and client).
        // 2. Sending the result as a string over the socket.
        // 3. On the client side, receiving the result and using the Windows API
        //    or other methods to display it in MSPaint.
        //
        // Displaying in MSPaint directly from C++ requires Windows API knowledge.
        // A very basic alternative (not recommended for production) is to write
        // the result to a file and launch MSPaint with system(). Note that Paint
        // opens image files, so a plain-text result file would first have to be
        // rendered to an image.
    } catch (const std::runtime_error& error) {
        std::cerr << "Error: " << error.what() << std::endl;
    }
    return 0;
}
```

Notes on this version:

* **`evaluateExpression` function:** Parses and evaluates a simple arithmetic expression with a `std::stringstream`. It handles the `+`, `-`, `*`, and `/` operators, applies them left to right without precedence, and reports division by zero and invalid operators.
* **Error handling:** A `try-catch` block handles errors raised during expression evaluation.
* **No `using namespace std;`:** Standard library names are explicitly qualified (e.g., `std::cout`, `std::string`), which is generally considered good practice in headers and larger projects.
* **TODO comments:** These mark what remains to be implemented: the server/client communication and the MSPaint integration.
* **Input handling:** `std::getline` reads the whole line, so expressions containing spaces are read correctly, unlike `std::cin >> expression`.
* **No Windows-specific code:** This version avoids Windows-specific headers (such as `windows.h`) to stay portable. The MSPaint integration needs Windows API calls and is left as a `TODO`.

**Next Steps (Implementing the TODOs):**

1. **Server/Client:**
   * Use a socket library (e.g., Boost.Asio, or the BSD sockets API on Linux/macOS) to create a server and client.
   * Define a simple protocol for sending the expression and receiving the result, either text-based or a more structured format such as JSON.
   * The server listens for connections, receives the expression, evaluates it, and sends the result back to the client.
   * The client connects to the server, sends the expression, receives the result, and then proceeds to the MSPaint integration.
2. **MSPaint Integration (Windows-specific):**
   * **Option 1 (simplest, least reliable):**
     * Save the result to a file.
     * Use `system()` to open MSPaint with that file. This is a very basic approach and not recommended for production.
   * **Option 2 (more complex):**
     * Use the Windows API to find the MSPaint window. Include `<windows.h>` and use functions such as `FindWindow`.
     * Send keystrokes to MSPaint to type the result, using functions such as `SendMessage` with `WM_CHAR` or `WM_KEYDOWN` and `WM_KEYUP`. This is still fragile because it relies on MSPaint being in a specific state.
   * **Option 3 (most complex):**
     * Use the Windows API to get a device context (HDC) for the MSPaint drawing surface.
     * Use GDI drawing functions (e.g., `TextOut`) to draw the result on the canvas. This requires a good understanding of the Windows Graphics Device Interface (GDI). Text drawn onto another process's window this way is not part of Paint's document and can disappear when the window repaints.

The MSPaint integration is the most challenging part of this project and requires familiarity with the Windows API. Without that familiarity, start with the simplest approach to get basic functionality working, then move to the more complex approaches.
**C++ Server with Winsock and MSPaint Integration:**

```cpp
#include <winsock2.h>  // Windows sockets; must come before <windows.h>
#include <ws2tcpip.h>  // Modern socket functions
#include <windows.h>   // Windows API functions

#include <iostream>
#include <sstream>
#include <stdexcept>
#include <string>

#pragma comment(lib, "ws2_32.lib")  // Link with the Winsock library (MSVC)

// Evaluate a simple arithmetic expression strictly left to right (no precedence)
double evaluateExpression(const std::string& expression) {
    std::stringstream ss(expression);
    double result = 0.0, value = 0.0;
    char op = 0;
    if (!(ss >> result)) {
        throw std::runtime_error("Expression must start with a number");
    }
    while (ss >> op >> value) {
        if (op == '+') {
            result += value;
        } else if (op == '-') {
            result -= value;
        } else if (op == '*') {
            result *= value;
        } else if (op == '/') {
            if (value == 0) {
                throw std::runtime_error("Division by zero");
            }
            result /= value;
        } else {
            throw std::runtime_error("Invalid operator");
        }
    }
    return result;
}

// Type text into the foreground window by injecting Unicode key events
void SendKeys(const std::wstring& keys) {
    for (wchar_t ch : keys) {
        INPUT inputs[2] = {};
        inputs[0].type = INPUT_KEYBOARD;
        inputs[0].ki.wScan = ch;
        inputs[0].ki.dwFlags = KEYEVENTF_UNICODE;                    // key press
        inputs[1] = inputs[0];
        inputs[1].ki.dwFlags = KEYEVENTF_UNICODE | KEYEVENTF_KEYUP;  // key release
        SendInput(2, inputs, sizeof(INPUT));
        Sleep(50);  // Small delay between keystrokes
    }
}

// Launch MSPaint and type the result into its window
void displayInMSPaint(const std::string& result) {
    STARTUPINFOA si = {};
    si.cb = sizeof(si);
    PROCESS_INFORMATION pi = {};
    char commandLine[] = "mspaint.exe";  // Must be writable for CreateProcess

    if (!CreateProcessA(NULL, commandLine, NULL, NULL, FALSE, 0, NULL, NULL, &si, &pi)) {
        throw std::runtime_error("Could not launch MSPaint");
    }
    WaitForInputIdle(pi.hProcess, 5000);  // Wait up to 5 seconds for initialization
    CloseHandle(pi.hProcess);             // Handles are not needed after this point
    CloseHandle(pi.hThread);

    // Find the MSPaint window; retry because it may not exist immediately
    HWND hWnd = NULL;
    for (int i = 0; i < 10 && hWnd == NULL; ++i) {
        hWnd = FindWindowA(NULL, "Untitled - Paint");
        if (hWnd == NULL) Sleep(500);
    }
    if (hWnd == NULL) {
        throw std::runtime_error("Could not find MSPaint window");
    }

    // Bring MSPaint to the foreground and give it time to activate
    ShowWindow(hWnd, SW_SHOWNORMAL);
    SetForegroundWindow(hWnd);
    Sleep(1000);

    // Widening char by char is only correct for ASCII, which covers numeric results
    SendKeys(std::wstring(result.begin(), result.end()));
}

int main() {
    WSADATA wsaData;
    int iResult = WSAStartup(MAKEWORD(2, 2), &wsaData);
    if (iResult != 0) {
        std::cerr << "WSAStartup failed: " << iResult << std::endl;
        return 1;
    }

    SOCKET listenSocket = INVALID_SOCKET;
    SOCKET clientSocket = INVALID_SOCKET;

    try {
        listenSocket = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
        if (listenSocket == INVALID_SOCKET) {
            throw std::runtime_error("Error at socket(): " + std::to_string(WSAGetLastError()));
        }

        // Bind to the loopback interface only, port 12345
        sockaddr_in serverAddress = {};
        serverAddress.sin_family = AF_INET;
        serverAddress.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        serverAddress.sin_port = htons(12345);
        if (bind(listenSocket, (SOCKADDR*)&serverAddress, sizeof(serverAddress)) == SOCKET_ERROR) {
            throw std::runtime_error("bind failed with error: " + std::to_string(WSAGetLastError()));
        }
        if (listen(listenSocket, SOMAXCONN) == SOCKET_ERROR) {
            throw std::runtime_error("listen failed with error: " + std::to_string(WSAGetLastError()));
        }
        std::cout << "Server listening on port 12345..." << std::endl;

        clientSocket = accept(listenSocket, NULL, NULL);
        if (clientSocket == INVALID_SOCKET) {
            throw std::runtime_error("accept failed with error: " + std::to_string(WSAGetLastError()));
        }
        std::cout << "Client connected." << std::endl;

        // Receive the expression; leave room for the terminating null byte
        char recvbuf[512];
        iResult = recv(clientSocket, recvbuf, sizeof(recvbuf) - 1, 0);
        if (iResult > 0) {
            recvbuf[iResult] = '\0';
            std::string expression(recvbuf);
            std::cout << "Received expression: " << expression << std::endl;

            // std::to_string formats with six decimals, e.g. "4.000000"
            std::string resultString = std::to_string(evaluateExpression(expression));
            std::cout << "Result: " << resultString << std::endl;

            // Send the result back to the client (optional)
            iResult = send(clientSocket, resultString.c_str(),
                           static_cast<int>(resultString.length()), 0);
            if (iResult == SOCKET_ERROR) {
                std::cerr << "send failed with error: " << WSAGetLastError() << std::endl;
            }

            displayInMSPaint(resultString);
        } else if (iResult == 0) {
            std::cout << "Connection closing..." << std::endl;
        } else {
            throw std::runtime_error("recv failed with error: " + std::to_string(WSAGetLastError()));
        }
    } catch (const std::exception& e) {
        std::cerr << "Error: " << e.what() << std::endl;
    }

    // Shut down the connection and clean up, even after an error
    if (clientSocket != INVALID_SOCKET) {
        if (shutdown(clientSocket, SD_SEND) == SOCKET_ERROR) {
            std::cerr << "shutdown failed with error: " << WSAGetLastError() << std::endl;
        }
        closesocket(clientSocket);
    }
    if (listenSocket != INVALID_SOCKET) {
        closesocket(listenSocket);
    }
    WSACleanup();
    return 0;
}
```

Notes on this version:

* **Windows Sockets (Winsock):** The code includes the necessary headers (`winsock2.h`, `ws2tcpip.h`) and links with the Winsock library (`ws2_32.lib`). It initializes Winsock with `WSAStartup` and cleans up with `WSACleanup`.
* **Socket creation, binding, listening, and accepting:** The code creates a socket, binds it to port 12345 on the loopback interface, listens for incoming connections, and accepts one client connection. Each step checks for errors.
* **Receiving data:** The arithmetic expression is received with `recv`, null-terminated, and converted to a `std::string`. The buffer reserves one byte for the terminator.
* **Sending data (optional):** The result is sent back to the client with `send`.
* **`displayInMSPaint` function:** Launches MSPaint and uses `SendKeys` to type the result into the MSPaint window. It reports failures to launch MSPaint or to find its window.
* **`SendKeys` function:** Sends keystrokes to the foreground window by injecting each character as a Unicode key press and release through `SendInput` with `KEYEVENTF_UNICODE`.
* **Error handling:** Winsock return values are checked, failures are raised as exceptions, and messages are printed to `std::cerr`.
* **Resource cleanup:** Sockets are closed and Winsock is cleaned up even if errors occur.
* **Process creation:** `CreateProcess` is used instead of `system` for launching MSPaint, which gives more control.
* **Waiting for MSPaint:** `WaitForInputIdle` waits until MSPaint is ready for input.

**To compile and run this code:**

1. **Install a C++ compiler:** The code is Windows-only, so use Visual Studio (MSVC) or another Windows toolchain.
2. **Create a project (Visual Studio):** Create a new "Console App" project.
3. **Copy the code:** Copy the code into `main.cpp`.
4. **Configure the project (Visual Studio):**
   * Go to Project -> Properties.
   * Under "Configuration Properties" -> "Linker" -> "Input", add `ws2_32.lib` to "Additional Dependencies" (with MSVC, the `#pragma comment` line already requests this).
   * Under "Configuration Properties" -> "C/C++" -> "Preprocessor", add `WIN32` and `_WINDOWS` to "Preprocessor Definitions".
5. **Compile and run:** Build and run the project.

**To run the client (a separate client program is required):**

1. **Create a client program:** The client connects to the server on port 12345 and sends the arithmetic expression.
2. **Run the server:** Run the compiled C++ program. It listens for connections on port 12345.
3. **Run the client:** The client connects to the server, sends the expression, and (optionally) receives the result.
4. **Observe MSPaint:** MSPaint should launch and receive the keystrokes for the result.

**Example Client (Python):**

```python
import socket

HOST = "127.0.0.1"  # The server's hostname or IP address
PORT = 12345        # The port used by the server

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.connect((HOST, PORT))
    expression = input("Enter an arithmetic expression: ")
    s.sendall(expression.encode())

    # Optional: receive the result from the server
    # data = s.recv(1024)
    # print("Received", repr(data))
```

**Important Notes:**

* **Reliability:** The `SendKeys` approach is fragile: it depends on MSPaint having keyboard focus and being in a state that accepts text. Drawing through the Windows API avoids the dependence on focus.
* **Error handling:** The error handling in `displayInMSPaint` could be improved, for example by checking whether MSPaint is already running before launching it.
* **Unicode:** `SendKeys` injects UTF-16 code units, but the conversion from `std::string` in `displayInMSPaint` is only correct for ASCII. Non-ASCII text, and characters outside the Basic Multilingual Plane (BMP) in particular, need a proper conversion.
* **Permissions:** The program needs permission to launch MSPaint and send keystrokes. Running as administrator may be necessary in some setups.
* **Client implementation:** A client program must send the arithmetic expression to the server. The Python client above is a simple example; the client can also be written in C++.
* **MCP protocol:** This example uses a very basic protocol in which the client just sends the expression. A real MCP implementation would involve structured messages and error handling.

This is a more complete example, but it still has limitations. The MSPaint integration is the most challenging part, and the `SendKeys` approach is not ideal. It is intended as a starting point for building the server and client application.
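The notes above say a real implementation would need more structured messages than raw strings. As a minimal sketch of one common approach, the code below frames each JSON message with a 4-byte length prefix so a receiver can split a TCP byte stream back into whole messages. The names `encode_message` and `decode_messages` and the `type`/`payload` fields are assumptions for illustration; this is not the MCP wire format.

```python
import json
import struct

HEADER = struct.Struct("!I")  # 4-byte big-endian payload length


def encode_message(msg_type, payload):
    """Serialize one message as <length><JSON body>."""
    body = json.dumps({"type": msg_type, "payload": payload}).encode("utf-8")
    return HEADER.pack(len(body)) + body


def decode_messages(buffer):
    """Return (messages, remaining_bytes) parsed from a receive buffer."""
    messages = []
    while len(buffer) >= HEADER.size:
        (length,) = HEADER.unpack_from(buffer)
        end = HEADER.size + length
        if len(buffer) < end:
            break  # Body not fully received yet; keep the bytes for later
        messages.append(json.loads(buffer[HEADER.size:end].decode("utf-8")))
        buffer = buffer[end:]
    return messages, buffer
```

A receiver would append each `recv()` result to its buffer and call `decode_messages` in a loop, since a single `recv()` may return part of a message or several messages at once.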

shettysaish20


README

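Model Context Protocol (MCP) MSPaint App Automation

This project demonstrates how to use the Model Context Protocol (MCP) to automate interaction with a legacy Windows application (MSPaint). It uses pywinauto to control the Paint application and fastmcp to define tools that an AI agent can call. The AI agent is powered by Google's Gemini model and uses these tools to perform tasks such as drawing a rectangle and adding text to the Paint canvas.

Introduction

This project shows how an AI agent can automate MSPaint. The agent can open Paint, draw a rectangle, and add text, all driven by natural-language instructions. This is achieved through the Model Context Protocol (MCP), which lets the AI agent call specific functions (tools) defined in Python code.

Model Context Protocol (MCP)

The Model Context Protocol (MCP) is an open protocol that enables AI models to interact with external tools and resources. It gives models a standardized way to call functions, retrieve data, and perform actions in the real world. In this project, MCP is used to expose the Paint automation functions as tools the AI agent can use.

Project Structure

├── MSPaint-MCP-Server/
│ ├── mcp_server.py # Defines the MCP server, which contains the Paint automation tools
│ ├── mcp_client.py # Defines the MCP client, which interacts with the server and the AI model
│ ├── requirements.txt # Lists the project dependencies
│ └── .env # Stores the Gemini API key
├── README.md # This file

Requirements

  • Python 3.11+
  • Conda (recommended for environment management)
  • Google Gemini API key
  • pywin32
  • pywinauto
  • fastmcp
  • python-dotenv
  • google-genai

Setup

  1. Create a Conda environment:

    conda create -n eagenv python=3.11
    conda activate eagenv
    
  2. Install the dependencies:

    pip install -r requirements.txt
    
  3. Set the Gemini API key:

    • Create a .env file in the project directory.

    • Add your Gemini API key to the .env file:

      GEMINI_API_KEY=YOUR_API_KEY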
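
The project lists python-dotenv, which would normally load this file. To show roughly what that loading amounts to, here is a standard-library-only sketch that reads simple KEY=VALUE lines and checks that the key has been filled in. The helper names `read_env_file` and `require_gemini_key` are hypothetical, and this simplified parser does not cover everything python-dotenv supports.

```python
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_gemini_key(values):
    """Return the API key, or fail early if it is missing or a placeholder."""
    key = values.get("GEMINI_API_KEY", "")
    if not key or key == "YOUR_API_KEY":
        raise RuntimeError("GEMINI_API_KEY is missing or still a placeholder")
    return key
```

Failing early on a missing or placeholder key gives a clearer message than an authentication error from the API later in the run.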
      

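Usage

  1. Run the MCP client:

    python mcp_paint_app/mcp_client.py
    

    This starts the MCP client, which connects to the MCP server, initializes the AI agent, and begins the automation process.

How It Works

  1. MCP server (mcp_server.py):

    • Defines the tools for interacting with MSPaint (e.g., open_paint, draw_rectangle, add_text_in_paint).
    • Uses pywinauto to control the MSPaint application.
    • Exposes these tools through the fastmcp library.
  2. MCP client (mcp_client.py):

    • Connects to the MCP server.
    • Uses the Google Gemini model to generate instructions.
    • Parses the model's output to determine which tool to call.
    • Calls the corresponding tool on the MCP server.
    • Processes the tool's response and feeds it back to the model.
  3. AI agent (Google Gemini):

    • Receives a query (e.g., "Return the sum of the first 20 Fibonacci numbers.").
    • Uses the available tools (defined in the system prompt) to solve the problem.
    • Generates function calls (e.g., FUNCTION_CALL: fibonacci_numbers|20) to use these tools.
    • Provides a final answer (e.g., FINAL_ANSWER: 6765) and displays the result in Paint.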
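
The client steps above (parse the model output, pick a tool, call it) can be sketched as follows. This is a minimal illustration assuming the `FUNCTION_CALL: name|arg` and `FINAL_ANSWER: value` line formats shown in the examples. The local `TOOLS` dictionary and the `fibonacci_numbers` body are stand-ins: in the project itself the call would go to the MCP server, and the sequence's starting convention (0, 1) is an assumption.

```python
def parse_model_line(line):
    """Split one model output line into (kind, tool_name, args)."""
    line = line.strip()
    if line.startswith("FUNCTION_CALL:"):
        parts = [p.strip() for p in line[len("FUNCTION_CALL:"):].split("|")]
        return "call", parts[0], parts[1:]
    if line.startswith("FINAL_ANSWER:"):
        return "final", None, [line[len("FINAL_ANSWER:"):].strip()]
    return "unknown", None, [line]


def fibonacci_numbers(n):
    """Stand-in tool: the first n Fibonacci numbers, starting 0, 1."""
    numbers, a, b = [], 0, 1
    for _ in range(n):
        numbers.append(a)
        a, b = b, a + b
    return numbers


# Hypothetical local registry; arguments arrive as strings and are converted here
TOOLS = {"fibonacci_numbers": lambda n: fibonacci_numbers(int(n))}


def dispatch(line):
    """Run the tool named in a FUNCTION_CALL line and return its result."""
    kind, name, args = parse_model_line(line)
    if kind != "call":
        raise ValueError(f"Not a function call: {line!r}")
    if name not in TOOLS:
        raise KeyError(f"Unknown tool: {name}")
    return TOOLS[name](*args)
```

Keeping parsing separate from dispatch makes it straightforward to reject malformed model output before any tool, and therefore any GUI automation, is triggered.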

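Key Components

  • mcp_server.py: Contains the core logic for automating MSPaint. The open_paint, draw_rectangle, and add_text_in_paint functions are the key tools used by the AI agent.
  • mcp_client.py: Manages the interaction between the AI agent and the MCP server. It sets the system prompt, calls tools, and processes responses.
  • requirements.txt: Lists all Python packages the project requires.
  • .env: Stores the Google Gemini API key.

Troubleshooting

  • Permission issues: If you run into permission problems, try running the script as administrator.
  • Coordinate issues: The coordinates used for clicking in MSPaint may need adjusting for your screen resolution and window size. Use the debug print statements in the code to determine the correct coordinates.
  • Tool selection issues: If the AI agent does not select the correct tool, review the system prompt and make sure the tool descriptions are accurate.
  • API key issues: Make sure your Gemini API key is set correctly in the .env file.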
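
For the coordinate issue above, one way to reduce dependence on screen resolution is to express click targets as fractions of the Paint window rectangle rather than as absolute pixels. The sketch below assumes the window rectangle is already known as (left, top, right, bottom); the function name `point_in_window` is hypothetical and not taken from the project.

```python
def point_in_window(rect, fx, fy):
    """Map fractional offsets (0.0 to 1.0) inside a window rect to screen pixels.

    rect is (left, top, right, bottom) in screen coordinates.
    """
    left, top, right, bottom = rect
    if not (0.0 <= fx <= 1.0 and 0.0 <= fy <= 1.0):
        raise ValueError("fx and fy must be between 0.0 and 1.0")
    x = left + round((right - left) * fx)
    y = top + round((bottom - top) * fy)
    return x, y
```

For example, `point_in_window((100, 50, 900, 650), 0.5, 0.5)` gives the center of that window. Toolbars do not necessarily scale in proportion to the window, so fractions chosen for toolbar buttons may still need adjusting per machine.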

Contributing

Contributions are welcome! Please submit a pull request with your changes.

License

MIT License
