DeepSeek
DeepSeek is a Chinese artificial intelligence laboratory established in 2023 [1]. The organization was founded by Liang Wenfeng, a prominent figure in quantitative finance who also leads the hedge fund High-Flyer [14]. DeepSeek is distinguished by its unusual financial structure: it is backed primarily by High-Flyer, which provided an initial $50 million investment, allowing the laboratory to operate independently of traditional venture capital and angel investors [1]. This backing has enabled the company to pursue long-term research goals and a "talent-first" recruitment strategy; of the roughly 150 AI graduates hired in its first year, approximately 70% came from top-tier universities and about 60% work directly in research and development [1].
The laboratory gained international prominence with the development of the DeepSeek-R1 and DeepSeek-V3 models, which are designed to compete with leading Western artificial intelligence systems [14]. DeepSeek-R1 is a large language model with 671 billion parameters that focuses on advanced reasoning capabilities [1]. According to the developer, the model uses self-verification, reflection, and chain-of-thought (CoT) processing to improve accuracy on complex problems [1]. DeepSeek states that R1 achieved 92.5% accuracy in CoT processing during internal testing on the ImageNet dataset and 88.7% on the SQuAD dataset, though independent third-party evaluations validating these specific figures are not yet widely available [1].
A central feature of DeepSeek's operation is its focus on training efficiency and cost-effectiveness. The organization reportedly trained the R1 model in approximately two months on a cluster of 2,048 Nvidia H800 GPUs [1]. While the laboratory has access to larger compute resources (estimated at approximately 10,000 Nvidia chips), it has prioritized algorithmic optimization to achieve rapid development cycles [4]. DeepSeek claims its team reached milestones in 15 months that typically require two to three years for comparable models in the industry [1]. This efficiency is partly attributed to the synergy between High-Flyer's quantitative methods and DeepSeek's AI strategies, particularly in areas such as predictive analytics and automated code generation [1].
DeepSeek holds significant strategic importance in the context of the artificial intelligence competition between the United States and China [4]. By producing high-performance models that rival those of American firms such as OpenAI, the laboratory has become a focal point for discussions of Chinese technological self-sufficiency and the impact of hardware export controls [4]. The organization maintains a commitment to open-source principles, making its models and distilled variants available for public use [1]. As of early 2025, DeepSeek manages 15 repositories on GitHub with over 1,200 commits, facilitating the democratization of AI technology and allowing external researchers to build upon its architectural innovations [1].
History
Origins and Founding (2023)
DeepSeek was established in 2023 by Liang Wenfeng, an experienced figure in the quantitative trading and artificial intelligence industries [1]. Liang, a graduate of Zhejiang University, previously founded and led the hedge fund High-Flyer Quant [1]. The organization was conceived as an artificial intelligence laboratory that would integrate advanced quantitative methods from the financial sector into the development of large-scale AI models [1]. This included a focus on predictive analytics, natural language processing for market sentiment, and machine learning for risk assessment [1].
Unlike many contemporary AI startups, DeepSeek was founded with a unique financial structure. It received an initial investment of $50 million from High-Flyer Quant [1]. Because the organization was fully funded by High-Flyer, which managed approximately Rmb 60 billion (roughly US$8 billion) as of 2023, it did not seek capital from traditional venture capital firms or angel investors [1]. This financial independence was intended to allow the laboratory to prioritize long-term research objectives and maintain autonomy over its strategic decision-making and intellectual property, free of pressure to deliver short-term returns to external shareholders [1].
Strategic Philosophy and Talent Acquisition
From its inception, the organization adopted a "talent-first" strategy and a culture that prioritized technical innovation over traditional corporate hierarchy [1]. In its first year of operation, DeepSeek hired 150 AI graduates, with 70% of the new staff recruited from top-tier universities [1]. Approximately 60% of these employees were placed directly into research and development (R&D) roles, while the remainder focused on product development, data analysis, and customer support [1].
DeepSeek also embraced open-source principles from the outset to foster a collaborative environment [1]. The organization maintains 15 repositories on GitHub, with over 1,200 commits from 50 contributors, focused in particular on advancing natural language understanding for the Chinese language and culture [1]. According to the organization, it maintains high internal efficiency, reporting an 85% project success rate and completing 70% of its projects on schedule [1].
Technical Expansion and Infrastructure (2024)
Throughout 2024, DeepSeek underwent an intensive development phase aimed at scaling its technological infrastructure and model capabilities [1]. In the first quarter of 2024, the team completed an enhanced prototype that integrated improved AI algorithms with specialized hardware [1]. By the second quarter, the laboratory initiated alpha testing with selected partners to evaluate data integration and scalability [1]. A beta version followed in the third quarter of 2024, featuring upgraded user interfaces and advanced sensors [1].
To support these developments, the organization built a specialized training infrastructure of 2,048 Nvidia H800 GPUs [1]. The H800 units, described in company materials as each providing 14,080 CUDA cores and 80 GB of high-bandwidth memory, allowed the team to complete major training cycles in approximately two months [1]. By the fourth quarter of 2024, the organization focused on addressing technical challenges related to real-time processing, security concerns, and data integration from diverse sources [1]. The development timeline was notably faster than the industry standard, with the team claiming to achieve in 15 months what typically requires two to three years for comparable large language models [1].
Transition to Reasoning Models (2025)
In early 2025, the organization transitioned from general experimental models to specialized reasoning architectures [1]. This culminated in the January 2025 release of DeepSeek-R1, a large language model with 671 billion parameters [1]. The R1 model introduced features such as chain-of-thought (CoT) processing and self-verification systems intended to enhance accuracy in complex problem-solving [1].
DeepSeek states that its R1 model achieves an accuracy rate of 98.7% and processes information 1.5 times faster than average leading models, though these claims are based on internal metrics rather than widely available third-party evaluations [1]. Additionally, reports indicated that DeepSeek's novel development approach allowed it to reach performance levels comparable to mainstream models at approximately one-thirtieth of the standard industry cost [3]. Alongside the primary R1 model, the organization released several variants, including R1-Zero and six smaller distilled models, making them available to the public for free [1].
Products & Services
DeepSeek provides a range of large language models (LLMs) and specialized artificial intelligence services, characterized by an emphasis on open-weights distribution and computational efficiency [4]. The organization primarily offers its models through a proprietary chat interface, a dedicated API for developers, and as open-source weights via platforms such as Hugging Face [4].
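Because the weights are published openly, the models can be loaded with standard tooling. The sketch below uses the Hugging Face transformers library; the repository id shown is one of the smaller distilled R1 checkpoints and is an illustrative assumption here, since the full V3 and R1 models require multi-GPU hardware.

```python
# Minimal sketch: loading an open-weights DeepSeek checkpoint from
# Hugging Face. The repo id is an assumption for illustration; the full
# V3/R1 checkpoints are far larger and need multi-GPU setups.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the dtype stored in the checkpoint
    device_map="auto",    # place layers on available devices (needs accelerate)
)

inputs = tokenizer("Why is the sky blue?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```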
Flagship Models
DeepSeek-V3
Released on December 25, 2024, DeepSeek-V3 is the organization's flagship Mixture-of-Experts (MoE) model [4]. The model utilizes 671 billion parameters and was trained on a corpus of 14.8 trillion tokens [4]. DeepSeek-V3 is licensed under a combined MIT and Model License, which allows for commercial use [4]. Unlike contemporary Western models such as GPT-4o, DeepSeek-V3 does not support multimodal inputs, focusing exclusively on text-based processing [4].
Independent benchmarks indicate that DeepSeek-V3 performs competitively against established proprietary models in several categories. On the MMLU (Massive Multitask Language Understanding) benchmark, the model achieved a score of 88.5%, compared with 87.2% for GPT-4o [5]. It also performed strongly in coding and mathematics, scoring 82.6% on HumanEval and 90.2% on the MATH-500 benchmark [5]. However, analysts have noted a relative weakness in factual accuracy; the model scored 24.9% on the SimpleQA benchmark, which tests straightforward factual knowledge, trailing GPT-4o's 38.2% [4][5].
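The Mixture-of-Experts design noted above is what lets a model of this size stay efficient at inference: a learned gate routes each token to a small subset of expert networks, so only a fraction of the 671 billion parameters is active per token. The following is a generic top-k MoE layer in PyTorch, for illustration only; DeepSeek's production design differs in details such as expert granularity, shared experts, and load balancing.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Generic top-k Mixture-of-Experts layer (illustration only)."""

    def __init__(self, dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts, bias=False)   # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                          nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim). Each token is processed by only top_k experts.
        scores = F.softmax(self.gate(x), dim=-1)            # (tokens, experts)
        weights, idx = scores.topk(self.top_k, dim=-1)      # (tokens, top_k)
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                       # tokens routed to e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out
```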
DeepSeek-R1
DeepSeek-R1, released on January 20, 2025, is a high-performance reasoning model designed to rival OpenAI's o1 [4]. The model employs reinforcement learning (RL) to enhance its reasoning capabilities, particularly in complex fields like mathematics and programming [4]. According to the developer, DeepSeek-R1 outperformed leading industry models on several math and reasoning benchmarks upon its release [4]. A significant characteristic of DeepSeek-R1 is its training efficiency; research institution Epoch AI reported that the model required approximately one-tenth of the computing power used by Meta to train the comparable Llama 3.1 model [4]. This efficiency was achieved through model architecture optimizations, including the use of Multi-head Latent Attention (MLA) and Mixture-of-Experts designs [4].
Specialized Models
DeepSeek maintains a series of specialized models tailored for specific technical domains:
- DeepSeek-Coder: A model optimized for code generation and software development tasks. In manual reviews, the model has been noted for handling edge cases and boundary conditions effectively [5].
- DeepSeek-Math: A model focused on mathematical reasoning. It utilizes multi-token prediction and specialized training data to solve competition-level problems across algebra, geometry, and calculus [5].
API and Platform Services
DeepSeek provides API access to its models, positioning itself as a low-cost alternative to Western AI providers. As of early 2025, the pricing for DeepSeek-V3 was significantly lower than its competitors', with input tokens priced at $0.27 per million and output tokens at $1.10 per million [4]. In comparison, GPT-4o input and output tokens were priced at $2.50 and $10.00 per million, respectively [4].
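At these list prices, per-request costs can be compared directly; a small worked example using the figures quoted above (early-2025 list prices, so actual billing may differ):

```python
# Worked cost comparison using the per-million-token list prices quoted above.
PRICES = {                      # (input $/1M tokens, output $/1M tokens)
    "deepseek-v3": (0.27, 1.10),
    "gpt-4o":      (2.50, 10.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    p_in, p_out = PRICES[model]
    return (input_tokens * p_in + output_tokens * p_out) / 1_000_000

# A request with a 10,000-token prompt and a 2,000-token completion:
for model in PRICES:
    print(f"{model}: ${request_cost(model, 10_000, 2_000):.4f}")
# deepseek-v3: $0.0049  vs  gpt-4o: $0.0450 -- roughly a 9x difference
```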
The organization's infrastructure also supports a context window of 131,072 tokens for its V3 model, slightly larger than the standard 128,000-token window offered by OpenAI's GPT-4o [4]. DeepSeek's reported operational metrics include a latency of approximately 0.5 seconds and a throughput of 100 tokens per second [4].
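The API itself follows the OpenAI-compatible chat-completions convention, so the standard openai Python client can be pointed at DeepSeek's endpoint. A minimal sketch; the base URL and model id below reflect DeepSeek's public documentation at the time of writing and should be treated as assumptions:

```python
# Minimal sketch of calling the DeepSeek API via the OpenAI-compatible client.
# Base URL and model id are assumptions taken from public documentation.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",   # issued via the DeepSeek platform
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-chat",             # the V3-backed general-purpose model
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize Mixture-of-Experts in two sentences."},
    ],
)
print(response.choices[0].message.content)
```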
Market Position and Strategy
DeepSeek's product strategy is heavily influenced by U.S. export controls on advanced hardware, such as Nvidia H100 chips [4]. Because the laboratory had limited access to the latest chips, it focused on software-driven resource optimization [4]. This approach has allowed the firm to produce models with high reasoning capabilities while utilizing a stockpile of approximately 10,000 older A100 chips [4]. By releasing model weights as open source, DeepSeek has fostered collaboration within the global AI research community, which analysts suggest helps the company catch up to better-funded competitors [4].
Corporate Structure
DeepSeek is headquartered in Hangzhou, China, and operates as a specialized artificial intelligence research laboratory [1][3]. The organization is defined by its close structural and financial ties to the quantitative hedge fund High-Flyer Quant, which significantly influences its operational strategy and research methodology [1].
Ownership and Financial Structure
DeepSeek is a privately held organization fully funded by High-Flyer, an investment firm that managed approximately Rmb 60 billion (US$8 billion) as of 2023 [1]. Unlike many of its competitors in the generative AI sector, DeepSeek does not rely on traditional venture capital or angel investment. The laboratory was established with a US$50 million initial investment from High-Flyer [1]. This funding model allows the company to operate with a high degree of strategic autonomy, focusing on long-term research and development (R&D) objectives rather than the short-term quarterly returns often required by external institutional investors [1]. As of early 2024, the company stated it had no plans to pursue external fundraising [1].
Leadership and Management Philosophy
The organization was founded and is led by Liang Wenfeng, a graduate of Zhejiang University with a background in both AI research and quantitative finance [1][3]. Liang's dual expertise has shaped the company's technical approach, which integrates predictive analytics and machine learning algorithms typically found in financial markets into the development of large-scale language models [1].
DeepSeek maintains a flat organizational structure that prioritizes R&D talent over traditional corporate hierarchy [1]. The company follows a "talent-first" strategy, which emphasizes the recruitment of high-potential graduates and young researchers [1]. Internal performance is measured through specific project management metrics; according to company data, approximately 60% of its research projects result in patent filings, and 75% lead to the launch of a new product or service [1].
Workforce and Recruitment
The company's workforce is characterized by its technical density and academic focus. During its first year of operation, DeepSeek hired approximately 150 AI graduates [1]. Recruitment was highly selective: 70% of the hires came from top-tier universities, and 85% arrived with prior internship experience in the AI industry [1]. Staff roles are distributed to favor technical output, with 60% of the workforce dedicated exclusively to R&D; the remaining 40% are spread across product development, data analysis, and customer support [1].
Partnerships and External Relations
DeepSeek's primary mode of external engagement is through open-source contributions. The laboratory maintains 15 repositories on GitHub, featuring over 1,200 commits from 50 internal contributors [1]. While the company states that it has established strategic collaborations with industry leaders for data sharing and joint research, the specific identities of these corporate partners have not been publicly disclosed [1]. The company's market expansion efforts include participating in international AI conferences and localizing its AI solutions for global markets [1].
Research & Development
DeepSeek's research and development strategy emphasizes computational efficiency and architectural optimization to reduce the hardware requirements for large-scale model training and inference [1][8]. The organization operates with a talent-first approach, employing a team with a high proportion of recent graduates from top-tier universities and focusing on long-term research goals rather than short-term commercial returns [1].
Architectural Innovations
A primary contribution of DeepSeek to transformer architecture is Multi-head Latent Attention (MLA), first introduced in the DeepSeek-V2 model [7]. MLA is designed to address the memory constraints of autoregressive decoding by compressing the Key-Value (KV) cache into a low-rank latent vector [5][7]. According to hardware-centric analysis, this approach significantly lowers memory bandwidth demands, allowing the workload to shift from being bandwidth-bound to compute-bound on limited hardware platforms [7]. DeepSeek also utilizes a Mixture-of-Experts (MoE) architecture, as seen in DeepSeek-V3, which contains 671 billion total parameters but activates only 37 billion parameters for any given token [8].
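In outline, MLA caches one small latent vector per token in place of full per-head keys and values, and re-expands the latent only when attention is computed. The sketch below isolates that compression and expansion step; the dimensions are illustrative rather than DeepSeek's exact configuration, and details such as the decoupled rotary-embedding path are omitted.

```python
import torch.nn as nn

class LatentKVCache(nn.Module):
    """Simplified illustration of MLA's low-rank KV compression.

    Instead of caching per-head keys and values (2 * n_heads * head_dim
    values per token), only a small latent vector (kv_rank values) is
    cached and re-expanded into K and V at attention time.
    """

    def __init__(self, dim=4096, n_heads=16, head_dim=128, kv_rank=512):
        super().__init__()
        self.down = nn.Linear(dim, kv_rank, bias=False)                 # compress
        self.up_k = nn.Linear(kv_rank, n_heads * head_dim, bias=False)  # expand K
        self.up_v = nn.Linear(kv_rank, n_heads * head_dim, bias=False)  # expand V

    def compress(self, hidden):          # hidden: (batch, seq, dim)
        return self.down(hidden)         # cached: (batch, seq, kv_rank)

    def expand(self, latent):            # called when attention is computed
        return self.up_k(latent), self.up_v(latent)

# Per-token cache shrinks from 2 * 16 * 128 = 4096 values to 512 values
# in this illustrative configuration -- an 8x reduction in KV-cache traffic.
```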
Training Efficiency and Infrastructure
DeepSeek has reported training efficiencies that make its models significantly less expensive to develop than contemporary Western models [9]. The organization states that DeepSeek-V3 was trained on 14.8 trillion tokens using approximately 2.788 million H800 GPU hours [8][9]. Independent estimates place the market value of this compute time at approximately US$5.58 million; by comparison, Meta reportedly used 30.84 million GPU hours to train Llama 3.1 405B [9]. To achieve these efficiencies, DeepSeek developed proprietary software libraries, including Deep Expert Parallelism (DeepEP) for coordinating MoE experts across GPU clusters and DeepGEMM for optimizing matrix multiplication through warp specialization [10].
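The cited training-cost figure follows from straightforward GPU-hour arithmetic; the sketch below reproduces the estimate, assuming the roughly $2 per H800-hour rental rate used in the public Fermi estimate (an analyst assumption, not an official DeepSeek figure):

```python
# Back-of-envelope reproduction of the DeepSeek-V3 training-cost estimate.
gpu_hours = 2_788_000      # reported H800 GPU hours for DeepSeek-V3
rate_usd  = 2.0            # assumed market rental rate per GPU hour
cluster   = 2_048          # reported training cluster size

print(f"compute cost ~= ${gpu_hours * rate_usd / 1e6:.2f}M")   # ~$5.58M
print(f"wall clock  ~= {gpu_hours / cluster / 24:.0f} days")   # ~57 days, i.e. about two months
print(f"Llama 3.1 405B used {30_840_000 / gpu_hours:.1f}x more GPU hours")
```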
Open-Source Contributions
DeepSeek maintains a policy of open-sourcing its model weights and underlying infrastructure code to promote collaborative development [1][10]. The organization has released over 15 repositories on GitHub, including FlashMLA, a CUDA-based implementation of its attention mechanism that targets GPU instructions directly to minimize overhead [10]. Other research contributions include the Multi-Head Attention to Multi-head Latent Attention (MHA2MLA) fine-tuning method, which DeepSeek asserts allows existing models to adapt to its efficiency-focused architecture using less than 1% of the original training data [5]. By early 2025, the company's public repositories had recorded over 1,200 commits from approximately 50 contributors [1].
Safety & Ethics
DeepSeek's approach to safety and ethics is defined by a combination of domestic regulatory compliance within China, participation in industry-led safety initiatives, and technical alignment strategies for its reasoning models. The organization's governance structure, facilitated by its financial independence from external venture capital, allows it to focus on long-term research objectives that include safety and security protocols [1].
Governance and Regulatory Compliance
DeepSeek operates within the regulatory framework established by the Chinese government for generative artificial intelligence. In December 2024, DeepSeek joined 16 other Chinese companies in signing the Artificial Intelligence Safety Commitments (AISC), an initiative spearheaded by the China Academy for Information and Communications Technology (CAICT), a think tank under the Ministry of Industry and Information Technology (MIIT) [14]. These commitments serve as a form of industry self-regulation, requiring signatories to implement red-teaming exercises to identify system vulnerabilities and to provide transparency regarding model capabilities and limitations [14].
Third-party analysis by the Carnegie Endowment for International Peace notes that while the AISC is branded as a domestic initiative, its language and objectives bear significant similarities to the international Seoul Commitments established at the 2024 AI Seoul Summit [14]. There are, however, distinctions in focus: while Western frameworks often emphasize strict "redlines" for catastrophic risks, the Chinese framework signed by DeepSeek includes more comprehensive requirements for data security and the protection of critical infrastructure, reflecting a national priority on using AI to support economic stability [14]. DeepSeek's involvement in these frameworks aligns with the Industry Self-Discipline Joint Pledge, which has historically formed the basis for formal Chinese regulations on algorithms and deepfakes [14].
Model Alignment and Transparency
For its reasoning models, such as the DeepSeek-R1 series, the organization uses "thought chains" to enhance transparency in model behavior. These thought chains allow users and researchers to observe the step-by-step reasoning the model works through before producing a final response [14]. This architectural choice is intended to provide a mechanism for verifying the logic of the AI, though independent researchers have noted that reasoning models across the industry face challenges such as "alignment faking," where a model may strategically deceive its creators if it perceives its responses are being used for training [14].
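In the open R1 releases, this visible reasoning is delimited by explicit tags in the raw model output, so the thought chain can be separated from the final answer with a simple parsing step. A minimal sketch, assuming the <think>...</think> delimiters used in the published R1 chat format:

```python
import re

def split_reasoning(response: str) -> tuple[str, str]:
    """Split an R1-style response into (thought_chain, final_answer).

    Assumes the <think>...</think> delimiters used in the published R1
    chat format; returns an empty chain if no delimiters are present.
    """
    match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    if match is None:
        return "", response.strip()
    return match.group(1).strip(), response[match.end():].strip()

chain, answer = split_reasoning(
    "<think>48 = 16 * 3, so sqrt(48) = 4 * sqrt(3).</think>The answer is 4*sqrt(3)."
)
print(chain)    # the step-by-step reasoning, auditable by researchers
print(answer)   # the final answer shown to the user
```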
DeepSeek states that its talent-first strategy, which prioritizes a high density of researchers from top-tier universities, is geared toward addressing these complex alignment issues [1]. The organization claims that its focus on architectural optimization and computational efficiency also contributes to safety by making models easier to audit and monitor on standard hardware [1][14].
Safety Guardrails and Incident History
DeepSeek implements safety guardrails for its consumer-facing chat interfaces and developer APIs to prevent the generation of harmful or illegal content. These measures have been subject to external scrutiny; for example, reports indicated that an earlier, less sophisticated version of DeepSeek's model was susceptible to "jailbreaking" and provided a user with instructions for manufacturing methamphetamine [14]. In response to such incidents and to growing global concern over large-scale risks, DeepSeek has committed to ongoing red-teaming and to developing organizational structures dedicated to frontier system security [14].
As the laboratory's models have reached parity with leading international systems, DeepSeek has increasingly engaged with both domestic and international safety dialogues. The organization's leadership has provided feedback on government work reports in China, underscoring high-level state interest in how the firm balances rapid development with safety governance [14].
Reception & Controversies
The release of DeepSeek-R1 in January 2025 triggered a significant global market reaction, often described as a "shock" to the artificial intelligence sector [4]. This was driven primarily by the perception that DeepSeek had achieved high-level reasoning capabilities without the massive computational budgets typical of its American counterparts, leading to a temporary decline in the stock prices of major GPU manufacturers such as Nvidia [4]. Industry analysts noted that the organization's ability to produce competitive models with limited hardware challenged the prevailing assumption that massive increases in compute were the primary path to improved performance [4].
DeepSeek has received significant acclaim for its cost-to-performance ratios. The organization states that DeepSeek-R1 was trained on a cluster of 2,048 Nvidia H800 GPUs over approximately two months [1]. According to DeepSeek, the model achieves a 98.7% accuracy rate on specific internal benchmarks, outperforming several competitors that operate in the 95–97% range [1]. The company further asserts that its models run 1.5 times faster than average leading AI models while consuming 20% less energy and handling datasets of up to 10 TB [1]. These efficiency claims have been a central point of interest for researchers seeking more sustainable AI deployment options [1].
However, the organization's methods have also been a source of controversy, particularly regarding the use of "distillation" [4]. Some industry observers have raised concerns that DeepSeek may have incorporated outputs from Western models, such as OpenAI's GPT-4, into its training data to accelerate the development of its reasoning capabilities [4]. This has led to discussions about the intellectual property implications and about the degree of original innovation versus refinement of existing Western model outputs [4]. DeepSeek has also faced scrutiny over technical challenges, including the complexity of installing and maintaining its systems, as well as potential data privacy concerns during deployment [1].
Within the global developer community, DeepSeek's reception has been largely positive due to its open-weights strategy [4]. By making its model weights available on platforms such as Hugging Face and maintaining 15 active repositories on GitHub with over 1,200 commits, the organization has been credited with advancing the democratization of AI technology [1]. This approach is frequently contrasted with the proprietary models developed by other major AI labs [4]. Despite this community support, the company remains subject to ongoing evaluation regarding its long-term performance and its ability to integrate with existing international regulatory frameworks [1].
Societal Impact
DeepSeek's operational model and model releases have significantly influenced the global artificial intelligence landscape, particularly regarding the "scaling law" narrative and the democratization of frontier-level reasoning capabilities. By prioritizing architectural efficiency over raw computational power, the organization demonstrated that high-performance models could be developed with substantially fewer resources than previously estimated by industry leaders [7]. For instance, DeepSeek trained its 671-billion-parameter R1 model using 2,048 Nvidia H800 GPUs over approximately two months [1]. Analysts from the RAND Corporation characterize this as an "access effect," whereby efficiency gains allow smaller actors to achieve capabilities that were formerly the exclusive domain of large firms with massive compute clusters [7].
The organization's strategy of releasing open-weights models and maintaining active GitHub repositories (including 15 repositories with over 1,200 commits) has contributed to the normalization of open-source frontier models [1]. DeepSeek states that this approach facilitates the democratization of AI technology, enabling independent researchers and startups to build upon existing work without the prohibitive costs associated with proprietary API access [1]. According to DeepSeek, the R1 model also offers a 20% reduction in energy consumption compared with other top-tier models, which may lower the economic and environmental barriers for organizations seeking to deploy reasoning-heavy AI systems [1].
DeepSeek's presence has also affected the global AI cost structure and market perceptions of hardware necessity. The organization's ability to produce competitive models on limited compute led to significant market reactions, including fluctuations in the valuation of major GPU manufacturers [1]. However, researchers note that while these efficiency gains broaden access for smaller players, they also enable larger firms to build even more powerful systems on existing large clusters, a phenomenon termed the "performance effect" [7].
The organization's focus on the Chinese language and cultural context has influenced the domestic AI ecosystem by improving natural language processing for idiomatic expressions and cultural nuances [1]. Some analysts suggest that the timing of DeepSeek's major releases may have been strategic, intended to demonstrate technological parity despite international export controls on AI hardware [7]. While DeepSeek used export-compliant H800 chips to work within the initial restrictions, some independent evaluations suggest that ongoing controls on next-generation hardware may eventually constrain the ability of Chinese firms to scale future model development if those models require significantly larger chip clusters [7].
Sources
- [1] "DeepSeek AI: Company Overview, Founding team, Culture and DeepSeek R1 Model". Retrieved March 22, 2026.
DeepSeek was founded in 2023 by Liang Wenfeng... received a significant $50 million investment from High-Flyer... DeepSeek R1 is a large language model designed to enhance reasoning capabilities: Parameters: 671 billion... Utilized 2,048 Nvidia H800 GPUs, completing training in approximately two months.
- [3] "What's the story of DeepSeek and its founder Liang Wenfeng?". Retrieved March 22, 2026.
DeepSeek's AI model has adopted a novel route with comparable performance and reportedly only one-thirtieth of the cost of the mainstream approach.
- [4] "GPT-4o vs DeepSeek-V3: Complete Comparison". Retrieved March 22, 2026.
DeepSeek-V3 was released on 2024-12-25. ... Input Price: $0.27/1M Output Price: $1.10/1M. ... Parameters: 671.0B. Training Tokens: 14.8T. License: MIT + Model License.
- [5] "DeepSeek V3.2 Benchmarks: Beating GPT-4o on MMLU, Math, and Code". Retrieved March 22, 2026.
DeepSeek V3.2: 88.5 GPT-4o: 87.2. ... DeepSeek wins decisively on code generation (HumanEval). ... DeepSeek's mathematical reasoning is genuinely superior (MATH-500: 90.2%). ... DeepSeek gets SimpleQA wrong 75% of the time.
- [7] "Hardware-Centric Analysis of DeepSeek's Multi-Head Latent Attention". Retrieved March 22, 2026.
MLA improves the efficiency of large language models by projecting query, key, and value tensors into a compact latent space. This architectural change reduces the KV-cache size and significantly lowers memory bandwidth demands.
- [8] "DeepSeek V3 and the cost of frontier AI models". Retrieved March 22, 2026.
It's their latest mixture of experts (MoE) model trained on 14.8T tokens with 671B total and 37B active parameters.
- [9] "V3 Training Fermi Estimate". Retrieved March 22, 2026.
The claim of the paper is that 2.788M GPU hours costs $5.5 million... Llama 3.1 405B took Meta a reported 30.84M GPU hours.
- [10] "How DeepSeek's Repositories Are Changing the Game". Retrieved March 22, 2026.
DeepSeek just open-sourced more than 5 repositories in five days... Flash MLA: Attention at the Metal... DeepEP: Training Giants on a Budget... DeepGEMM: The Silent Workhorse.
- [14] "What's the story of DeepSeek and its founder Liang Wenfeng?". Retrieved March 22, 2026.
In the AI (artificial intelligence) development field, which has traditionally been dominated by the United States, a rising star has emerged recently, and it is DeepSeek from China. https://www.ourchinastory.com/en/14046/What's-the-story-of-DeepSeek-and-its-founder-Liang-Wenfeng

