Day 50: Check-in for March 18, 2026
一、Coding Check-in

1.1 Spiral Matrix Traversal (回形取数)

1.1.1 Problem

Spiral traversal reads the numbers of a matrix along its edges: whenever there is no number ahead in the current direction (out of bounds or already taken), turn left 90 degrees. You start at the top-left corner of the matrix, facing down.

Input: the first line contains two positive integers m and n, each at most 200, giving the number of rows and columns of the matrix. Each of the next m lines contains n integers, the matrix itself.

Output: a single line of m*n numbers, the result of the spiral traversal. Separate the numbers with single spaces and leave no trailing space at the end of the line.

Sample input:
4 3
1 2 3
4 5 6
7 8 9
10 11 12

Sample output:
1 4 7 10 11 12 9 6 3 2 5 8

1.1.2 Summary

Keep four variables for the current top, bottom, left, and right boundaries of the region not yet printed, and a variable count for how many numbers have been printed so far. Visit the matrix in the order down, right, up, left:

- Down: column fixed at left, rows from top to bottom; afterwards increment left.
- Right: row fixed at bottom, columns from left to right; afterwards decrement bottom (note: decrement).
- Up: column fixed at right, rows from bottom to top; afterwards decrement right.
- Left: row fixed at top, columns from right to left; afterwards increment top (note: increment).

Pitfalls I hit:
1. Check count before taking each step. The condition for continuing is count < total, not count <= total: the check happens before the next step is probed, so with <= the program would probe one step too many and end up printing total+1 numbers.
2. After the rightward pass it is bottom--, not bottom++; after the leftward pass it is top++, not top--.

1.1.3 Code

```cpp
#include <iostream>
using namespace std;

int metrix[200][200] = { 0 };

int main() {
    int m, n;
    cin >> m >> n;
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            cin >> metrix[i][j];
        }
    }
    int top = 0, bottom = m - 1;
    int left = 0, right = n - 1;
    int count = 0;
    int total = m * n;
    // Here and in the loops below the condition must be count < total;
    // with <= one extra number would be printed.
    while (count < total) {
        for (int i = top; i <= bottom && count < total; i++) {   // down
            if (count > 0) cout << " ";
            cout << metrix[i][left];
            count++;
        }
        left++;
        for (int j = left; j <= right && count < total; j++) {   // right
            if (count > 0) cout << " ";
            cout << metrix[bottom][j];
            count++;
        }
        bottom--;
        for (int i = bottom; i >= top && count < total; i--) {   // up
            if (count > 0) cout << " ";
            cout << metrix[i][right];
            count++;
        }
        right--;
        for (int j = right; j >= left && count < total; j--) {   // left
            if (count > 0) cout << " ";
            cout << metrix[top][j];
            count++;
        }
        top++;
    }
    return 0;
}
```

二、Translation Check-in

Original:

Reinforcement learning is a machine learning approach that learns optimal strategies through interaction with the environment. In the reinforcement learning framework, an agent observes the state of the environment and takes corresponding actions in order to receive rewards or penalties. The goal of the agent is to find a policy that maximizes long-term cumulative rewards through continuous exploration and learning.
Unlike supervised learning, reinforcement learning usually does not rely on large amounts of labeled data but improves decision-making ability through trial and error. Reinforcement learning has achieved success in many complex tasks such as robotic control, autonomous driving, and game artificial intelligence. In the famous Go program AlphaGo, reinforcement learning was combined with deep neural networks, enabling computers to reach or even surpass the level of top human players. However, in practical applications, reinforcement learning still faces challenges such as low sample efficiency and high training costs.

译文 (Chinese translation):

强化学习是一种通过与环境交互来学习最优策略的机器学习方法。在强化学习框架中,智能体观察环境的状态并做出相应的动作,以获得奖励或惩罚。智能体的目标是通过持续的探索和学习,找到一个能使长期累积奖励最大化的策略。与监督学习不同,强化学习通常不依赖大量带标签的数据,而是通过试错来改进决策能力。强化学习已经在许多复杂任务上取得了成功,例如机器人控制、自动驾驶以及游戏人工智能。在著名的围棋程序AlphaGo中,强化学习与深度神经网络相结合,使计算机能够达到甚至超过顶尖人类棋手的水平。然而,在实际应用中,强化学习仍然面临低样本效率、高训练成本等挑战。